SAS Data Integration Studio: Fast Track
Course Notes
SAS Data Integration Studio: Fast Track Course Notes was developed by Linda Jolley, Kari Richardson,
Eric Rossland, and Christine Vitron. Editing and production support was provided by the Curriculum
Development and Support Department.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product
names are trademarks of their respective companies.
Copyright © 2009 SAS Institute Inc., Cary, NC, USA. All rights reserved. Printed in the United States of
America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written
permission of the publisher, SAS Institute Inc.
Book code E1477, course code DIFT, prepared date 31Jul2009.
ISBN 978-1-60764-048-6
Table of Contents
Prerequisites ................................................................................................................................ ix
1.1 Exploring the Platform for SAS Business Analytics ....................................................... 1-3
Exercises.................................................................................................................. 3-82
Exercises.................................................................................................................. 5-68
7.2 Using Extract, Summary Statistics, and Loop Transformations ...................................... 7-7
Demonstration: Using the Extract and Summary Statistics Transformations ............. 7-9
Demonstration: Using the Loop Transformations ................................................... 7-34
Exercises.................................................................................................................. 7-54
7.5 Using Transpose, Sort, Append, and Rank Transformations ......................................... 7-95
Demonstration: Using the Transpose, Sort, Append, and Rank
Transformations............................................................................. 7-99
7.6 Basic Standardization with the Apply Lookup Standardization Transformation ......... 7-123
Demonstration: Using the Apply Lookup Standardization Transformation .......... 7-125
Exercises................................................................................................................ 7-138
Chapter 8 Working with Tables and the Table Loader Transformation ............. 8-1
8.3 Table Properties and Load Techniques of the Table Loader Transformation ................. 8-14
9.2 Using the SCD Type 2 Loader and Lookup Transformations ........................................ 9-15
Demonstration: Populate Star Schema Tables Using the SCD Type 2 Loader
with the Surrogate Key Method .................................................... 9-29
Course Description
This intensive training course provides accelerated learning for students who will register sources
and targets; create and deploy jobs; work with transformations; set up change management; work with
slowly changing dimensions; and understand status handling and change data capture. The course is for
individuals who are comfortable learning large amounts of information in a short period of time. The
&di1 and &di2 courses present the same information in a more detailed fashion over a longer period
of time.
To learn more
For information on other courses in the curriculum, contact the SAS Education
Division at 1-800-333-7660, or send e-mail to training@sas.com. You can also
find this information on the Web at support.sas.com/training/ as well as in the
Training Course Catalog.
For a list of other SAS books that relate to the topics covered in this
Course Notes, USA customers can contact our SAS Publishing Department at
1-800-727-3228 or send e-mail to sasbook@sas.com. Customers outside the
USA, please contact your local SAS office.
Also, see the Publications Catalog on the Web at support.sas.com/pubs for a
complete list of books and a convenient order form.
Prerequisites
Experience with SAS programming, SQL processing, and the SAS macro facility is required. This
experience can be gained by completing the SAS Programming 1: Essentials, SAS SQL 1: Essentials,
and SAS Macro Language 1: Essentials courses.
Chapter 1 Introduction
1.1 Exploring the Platform for SAS Business Analytics ................................................... 1-3
Objectives
Compare the two types of SAS installations.
Define the architecture of the platform for SAS
Business Analytics.
Describe the SAS platform applications used for data
integration, reporting, and analysis.
1.1 Exploring the Platform for SAS Business Analytics

The platform for SAS Business Analytics is also known as the SAS Enterprise Intelligence
Platform and the SAS Intelligence Platform. Its architecture is organized into tiers, including
a middle tier, a server tier, and a data tier.
SAS platform applications cannot execute SAS code on their own. They must request code
submission and other services from a SAS server.
SAS Add-In for Microsoft Office
   The SAS Add-In for Microsoft Office enables business users to transparently leverage the power
   of SAS analytics, reporting, and data access directly from Microsoft Office via integrated menus
   and toolbars.
SAS Data Integration Studio
   SAS Data Integration Studio enables a data warehouse developer to create and manage metadata
   objects that define sources, targets, and the sequence of steps for the extraction, transformation,
   and loading of data.
SAS Enterprise Guide
   SAS Enterprise Guide provides a guided mechanism to exploit the power of SAS and publish
   dynamic results throughout the organization. SAS Enterprise Guide can also be used for
   traditional SAS programming.
SAS Information Delivery Portal
   The SAS Information Delivery Portal is a Web application that can surface the different types of
   business analytic content such as information maps, stored processes, and reports.
SAS Information Map Studio
   SAS Information Map Studio is used to build information maps, which shield business users from
   the complexities of the underlying data by organizing and referencing data in business terms.
SAS Management Console
   SAS Management Console provides a single interface for managing the metadata of the SAS
   platform. Specific administrative tasks are supported by plug-ins to the SAS Management Console.
SAS OLAP Cube Studio
   SAS OLAP Cube Studio is used to create OLAP cubes, which are multidimensional structures of
   summarized data. The Cube Designer provides a point-and-click interface for cube creation.
SAS Visual BI (JMP)
   SAS Visual BI, powered by JMP software, provides dynamic business visualization, enabling
   business users to interactively explore ideas and information, investigate patterns, and discover
   previously hidden facts through visual queries.
SAS Web OLAP Viewer
   The SAS Web OLAP Viewer provides a Web interface for viewing and exploring OLAP data. It
   enables business users to look at data from multiple angles, view increasing levels of detail, and
   add linked graphs.
SAS Web Report Studio
   SAS Web Report Studio provides intuitive and efficient access to query and reporting capabilities
   on the Web.
dfPower Studio
   dfPower Studio from DataFlux (a SAS company) combines advanced data-profiling capabilities
   with proven data quality, integration, and augmentation tools for incorporating data quality into
   a data collection and management process.
Examples of metadata object types include channels, cubes, jobs, and libraries.
My Folder is a shortcut to the personal folder of the user who is currently logged on.
Products contains folders for individual SAS products. These folders contain content that is
installed along with the product. For example, some products have a set of initial
jobs, transformations, stored processes, or reports which users can modify for their
own purposes. Other products include sample content (for example, sample stored
processes) to demonstrate product capabilities. Where applicable, the content is
stored under the product's folder in subfolders that indicate the release number for
the product.
You can also create additional folders under SAS Folders in which to store
shared content.
Follow these best practices when interacting with SAS folders:
Use personal folders for personal content and shared folders for content that multiple users need to view.
Use folders instead of custom repositories to organize content.
Do not delete or rename the Users folder.
Do not delete or rename the home folder or personal folder of an active user.
Do not delete or rename the Products or System folders or their subfolders.
Use caution when renaming the Shared Data folder.
When you create new folders, the security administrator should set permissions.
Objectives
State the purpose of SAS Data Integration Studio.
State the purpose of dfPower Studio.
Explore the available interfaces.
1.2 Introduction to Data Integration Applications
(Figure: the SAS Data Integration Studio desktop, showing the menu bar, toolbar, and status bar)
The title bar shows the current version of SAS Data Integration Studio, as well as the name of the current
connection profile.
The menu bar provides access to the drop-down menus. The list of active options varies according to the
current work area and the kind of object that you select. Inactive options are disabled or hidden.
The toolbar provides access to shortcuts for items on the menu bar. The list of active options varies
according to the current work area and the kind of object that you select. Inactive options are disabled or
hidden.
The status bar displays the name of the currently selected object, the name of the default SAS Application
Server if one has been selected, the login ID and metadata identity of the current user, and the name of the
current SAS Metadata Server. To select a different SAS Application Server, double-click the name of that
server to display a dialog box. If the name of the SAS Metadata Server turns red, the connection is
broken. In that case, you can double-click the name of the metadata server to display a dialog box that
enables you to reconnect.
Tree View
The tree view provides access to the Basic Properties pane, Folders tree, Inventory tree, Transformations
tree, and Checkouts tree.
The Basic Properties pane displays the basic properties of an object selected in a tree view. To surface
this pane, select View → Basic Properties from the desktop.
The Folders tree organizes metadata into folders that are shared across a number of SAS applications.
The Inventory tree displays metadata for objects that are registered on the current metadata server, such as
tables and libraries. Metadata can be accessed in folders that group metadata by type, such as Table,
Library, and so on.
The Transformations tree displays transformations that can be dragged and dropped into SAS Data
Integration Studio jobs.
The Checkouts tree displays metadata that has been checked out for update, as well as any new metadata
that has not been checked in. The Checkouts tree is not displayed in the view of SAS Data Integration
Studio shown above; it automatically appears when you are working under change management.
Job Editor
The Job Editor window enables you to create, maintain, and troubleshoot SAS Data Integration Studio
jobs.
The Diagram tab is used to build and update the process flow for a job.
The Code tab is used to review or update code for a job.
The Log tab is used to review the log for a submitted job.
The Output tab is used to review the output of a submitted job.
The Details pane is used to monitor and debug a job in the Job Editor.
This demonstration illustrates logging on to SAS Data Integration Studio and investigating the interface by
using predefined metadata objects.
1. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Bruno's credentials.
a. Verify that the connection profile is My Server.
b. Click OK to close the Connection Profile window and to open the Log On window.
c. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
Some folders in the Folders tree are provided by default, such as My Folder, Products,
Shared Data, System, and Users.
Three folders (and subfolders) were added by an administrator: Chocolate Enterprises, Data
Mart Development, and Orion Star.
4. Click the plus sign in front of the Data Mart Development folder to expand the folder.
The DIFT Demo folder contains seven metadata objects: two library objects, four table objects, and
one job object.
Each metadata object has its own type of properties.
6. Single-click on the DIFT Test Table ORDER_ITEM table object. The Basic Properties pane
displays basic information for this table object.
7. Single-click on the DIFT Test Source Library library object. The Basic Properties pane displays
basic information for this library object.
8. Single-click on the DIFT Test Job OrderFact Table Plus job object. The Basic Properties pane
displays basic information for this job object.
The name of the metadata table object is shown on the General tab, as well as the metadata folder
location.
The Columns tab displays the column attributes of the physical table. Note that all columns are
numeric.
The Physical Storage tab displays the type of table, the library object name, and the name of the
physical table.
10. Right-click on DIFT Test Table ORDER_ITEM and select Open. The View Data window opens
and displays the data for this table.
The functions of the View Data window are controlled by the View Data toolbar:
Positions the data with the Go-to row as the first data line displayed.
Displays the Sort By Column tab in the View Data Options window.
11. To close the View Data window, select File → Close (or click the Close button).
The name of the metadata library object is shown on the General tab, as well as the metadata folder
location.
The Options tab displays the library reference (libref) and the physical path of this library.
13. Display the generated LIBNAME statement for this library object by right-clicking on
DIFT Test Source Library and selecting View Libname.
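Selecting View Libname displays the LIBNAME statement that SAS Data Integration Studio generates
from the library metadata. A minimal sketch of what such a statement looks like (the libref and path
here are illustrative assumptions, not the values registered in the demonstration metadata):

   /* Hypothetical generated LIBNAME statement: the libref, engine, */
   /* and physical path all come from the library metadata object.  */
   LIBNAME difttest BASE "S:\Workshop\dift\data";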
15. Access the Job Editor window to examine the properties of the job objects in more detail.
a. Right-click on DIFT Test Job OrderFact Table Plus and select Open.
This job joins two source tables and then loads the result into a target table. The target table is then
used as the source for the Rank transformation; the result of the ranking is loaded into a target table,
sorted, and then a report is generated based on the rankings.
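Behind the diagram, SAS Data Integration Studio generates ordinary SAS code for each node. A rough
hand-coded equivalent of this flow, assuming hypothetical librefs (diftsrc, difttgt) and columns
(Order_ID, Customer_ID, Order_Date, Total_Retail_Price):

   /* Join the two source tables and load the result into the fact table. */
   proc sql;
      create table difttgt.orderfact as
         select oi.*, o.Customer_ID, o.Order_Date
         from diftsrc.order_item as oi inner join
              diftsrc.orders as o
              on oi.Order_ID = o.Order_ID;
   quit;

   /* Rank the fact rows, then sort and report on the ranked table. */
   proc rank data=difttgt.orderfact out=difttgt.ranked_orderfact descending;
      var Total_Retail_Price;
      ranks Price_Rank;
   run;

   proc sort data=difttgt.ranked_orderfact;
      by Price_Rank;
   run;

   proc print data=difttgt.ranked_orderfact(obs=10);
   run;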
b. Click the DIFT Test Table ORDERS table object. Note that the Details area now has a
Columns tab.
The Columns tab in the Details area displays column attributes for the selected table object.
These attributes are fully editable in this location.
Similarly, selecting any of the table objects in the process flow diagram (DIFT Test Table
ORDERS, DIFT Test Table ORDER_ITEM, DIFT Test Target Order
Fact Table (which appears twice in the diagram), DIFT Test Target Ranked Order Fact)
displays a Columns tab for that table object.
c. Click the SQL Join transformation. Note that the Details area now has a Mappings tab.
The full functionality of the Mappings tab from the SQL Join Designer window is found on this
Mappings tab.
Similarly, selecting any of the transformations in the process flow diagram (SQL Join, Table
Loader, Rank, Sort, List Data) displays a Mappings tab for that transformation.
d. Run the job by clicking Run. As the transformations execute, they are highlighted to denote
which node is executing.
As each transformation finishes, the icon is decorated with a symbol to denote success or failure.
Those transformations that had errors are also outlined in red.
Also, the Status tab in the Details area provides the status for each part of the job that executed.
e. Double-click the word Error under Status for the Table Loader.
The Details area moves focus to the Warnings and Errors tab. The error indicates that the physical
location for the target library does not exist.
f. Select DIFT Test Target Library found in the Data Mart Development → DIFT Demo folder
on the Folders tab.
The Basic Properties pane displays a variety of information, including the physical path location.
h. Run the entire job again by clicking Run. The Details area shows that all but the List Data
transformation completed successfully.
i. Double-click the word Error under Status for the List Data transformation.
The Details area moves focus to the Warnings and Errors tab. The error indicates that the physical file
does not exist. However, because the file is to be created by the transformation, it is more likely
that the location for the file does not exist.
The Status tab of the Details pane shows that the transformation completed successfully.
l. Select File → Close to close the Job Editor window. If any changes were made while
viewing the job, a window opens asking whether to save them.
16. Investigate some of the options available for SAS Data Integration Studio by selecting Tools →
Options.
17. Examine the Show advanced property tabs option (this option is on the General tab of the Options
window).
a. If Show advanced property tabs is deselected,
then tabs such as Extended Attributes and Authorization do not appear in the
properties window for a specified object.
18. Examine the Enable row count on basic properties for tables option (this option is on the General
tab of the Options window).
a. If Enable row count on basic properties for tables is deselected,
then the Number of Rows field displays Row count is disabled for a selected
table object.
If it is selected, then the Number of Rows field displays the number of rows found for the selected
table object.
a. Click to establish and/or test the application server connection for SAS Data
Integration Studio. An information window opens verifying a successful connection:
The application server can also be set and tested via the status bar. If the application server
has not been defined, double-clicking that area of the status bar opens the Default Application
Server window, where a selection can be made and tested.
the resultant objects in the Diagram area are then drawn as the following:
b. Verify that the default selection in the layout area is Left To Right.
This results in process flow diagrams going horizontally, such as the following:
The options on this tab affect how data are displayed in the View Data window.
a. Verify the default selection for the Column headers area is Show column name in column
header.
If Show column description in column header is selected in the Column headers area
If both Show column name in column header and Show column description in column
header are selected in the Column headers area
a. Verify that the following fields are set appropriately in the Data Quality area:
Default Locale: ENUSA
DQ Setup Location: C:\Program Files\SAS\SASFoundation\9.2\dquality\sasmisc\dqsetup.txt
Scheme Repository: C:\Program Files\DataFlux\QltyKB\CI\2008A\scheme
b. Verify the path specified for the DataFlux Installation Folder field under the DataFlux
dfPower area.
23. Select the Tools menu. Note that there is an item, dfPower Tool, that provides direct access to many
of the DataFlux dfPower Studio applications.
1. From the SAS Data Integration Studio session, select Tools → dfPower Tool → dfPower Explorer.
2. Create a new project.
a. Select File → New Project.
If the DIFT Repository has not been created, then follow these steps to create and
initialize it:
From SAS Data Integration Studio, select Tools → dfPower Tool → dfPower Studio.
In the Navigation area, right-click on Repositories and select New Repository.
a. Click .
1) Type DIFT Orion Detail as the value for the Description field.
2) Click next to the Directory field. The Browse for Folder window opens.
3) Navigate to S:\Workshop\OrionStar\ordetail.
4) Click to close the Browse for Folder window. The Directory field displays
the selected path.
d. Click .
h. Click .
i. Type DIFT Orion Detail Project as the value for the Project name field.
j. Type DIFT Orion Detail Project as the value for the Description field.
k. Click .
The results are displayed in dfPower Explorer. Four tables were analyzed, and the four tables
contain thirty columns in total.
5. Click the ORDER_ITEM table in the Matching Tables area. Having both tables selected
displays the relationship between the two tables.
7. Click PRODUCT_LIST in the Matching Tables area. The Product_ID column could
potentially link these tables.
Before initiating any data warehousing project, it is important to first examine the data and identify any
potential issues that may exist.
1. From dfPower Explorer, right-click on the CUSTOMER table in the Database area and select
Add Table to Profile Task.
The table and all of its columns are added to the Profile Job Definition & Notes area.
2. From dfPower Explorer, right-click on the ORDER_ITEM table in the Database area and select
Add Table to Profile Task.
3. From dfPower Explorer, right-click on the PRODUCT_LIST table in the Database area and select
Add Table to Profile Task.
4. Collapse the listing of columns for each of the tables in the Profile Job Definition &
Notes area.
6. Type DIFT Orion Detail Information as the value for the Name field.
10. From the SAS Data Integration Studio session, select Tools → dfPower Tool →
dfPower Profile (Configurator).
13. Click .
If a dfPower Profile job is not available (for instance, one was not created using dfPower
Explorer), SAS data can be added by using the following steps:
Select Insert → SAS Data Set Directory.
Type DIFT Orion Detail Data as the value for the Description field.
Click next to the Directory field. The Browse for Folder window opens.
Navigate to S:\Workshop\OrionStar\ordetail.
Click to close the Browse for Folder window.
The Directory field displays the selected path.
Click to close the Insert SAS Data Set Directory window.
The link to the SAS Data Set Directory appears in the database listing.
14. Expand the DIFT Orion Detail data source. A list of available SAS tables is displayed. The ones
selected are the ones added from dfPower Explorer.
If you did not open an existing job in dfPower Profile (Configurator) and you attempt to run a
job, a warning window opens.
Clicking through the warning opens the Save As window. Typing a valid name and then saving
displays the Run Job window shown above.
24. Click to close the Run Job window. The Executor executes the job.
The columns from the CUSTOMER table are listed in the Tables area with a tabular view of each
column and its calculated statistics.
32. In the Metrics area, select only the Data Length, Maximum Length, and Minimum Length
statistics.
34. Click .
35. Select File → Exit to close the dfPower Profile (Viewer) window.
36. Select File → Exit to close the dfPower Profile (Configurator) window.
1.3 Introduction to Change Management
Objectives
Define the change management feature of
SAS Data Integration Studio.
Change Management
The Change Management facility in SAS Data Integration
Studio enables multiple SAS Data Integration Studio
users to work with the same metadata repository at the
same time without overwriting each other's changes.
Checkouts Tree
If you are authorized to work with a project repository, a
Checkouts tree is added to the desktop of SAS Data
Integration Studio.
The Checkouts tree displays metadata in your project
repository, which is an individual work area or playpen.
1. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Barbara's credentials to access her project repository.
a. Select Barbara's Work Repository as the connection profile.
b. Click OK to close the Connection Profile window and open the Log On window.
c. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
3. Double-click on the application server area of the status bar to open the Default Application Server
window.
4. Verify that SASApp is selected as the value for the Server field.
5. Click .
7. Click OK to close the Default Application Server window. The status bar updates to show the
selected server.
8. Verify that the tree view area now has a Checkouts tab.
This tab displays metadata objects checked out of the parent repository, as well as any new objects
that Barbara creates.
9. If necessary, click the Folders tab.
10. Expand the Data Mart Development → DIFT Demo folders.
11. Select the DIFT Test Job OrderFact Table Plus job, hold down the CTRL key, and select both
DIFT Test Source Library and DIFT Test Table ORDER_ITEM.
12. Right-click on one of the selected items and select Check Out.
The icons for the three objects are decorated with a check mark.
17. Right-click on DIFT Test Table ORDER_ITEM and select Check In (optionally, select
Check Outs → Check In with the table object selected).
The Check In Wizard opens.
18. Type Testing out Change Management as the value for the Title field.
20. Click .
22. Click .
24. Click .
1. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Ole's credentials to access his project repository.
a. Select Ole's Work Repository as the connection profile.
b. Click OK to close the Connection Profile window and open the Log On window.
c. Type Ole as the value for the User ID field and Student1 as the value for the Password
field.
4. Verify that the tree view area now has a Checkouts tab.
7. Right-clicking on DIFT Test Source Library (or on DIFT Test Job OrderFact Table Plus) shows
that the Check Out option is not available for this checked out object.
Ole can tell that Barbara has the object checked out.
9. Select File → Close to close the History window.
Ole can tell that Barbara had this object checked out and that it was checked back in. The title and
description information filled in by Barbara in the Check In Wizard can give Ole an idea of what
updates Barbara made to this metadata object.
11. Select File → Close to close the History window.
12. Right-click on DIFT Test Table ORDER_ITEM and select Check Out.
13. Click the Checkouts tab and verify that the table object is available for editing.
1. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Ahmed's credentials (an administrator) to access the Foundation repository.
a. Select My Server as the connection profile.
b. Click OK to close the Connection Profile window and open the Log On window.
c. Type Ahmed as the value for the User ID field and Student1 as the value for the
Password field.
Clearing a project repository unlocks checked out objects (any changes made to these checked out
objects will not be saved) and deletes any new objects that may have been created in the project
repository.
6. Select both repositories (that is, select Barbara's Work Repository, hold down the CTRL key, and
select Ole's Work Repository).
7. Click .
8. Verify that the checked out objects are no longer checked out.
10. Verify that the change Ole made to the Description field was not saved.
12. Select File → Exit to close Ahmed's SAS Data Integration Studio session.
14. Select View → Refresh. The Checkouts tab was active, so the metadata for the project repository is
refreshed.
17. Select File → Exit to close Ole's SAS Data Integration Studio session.
21. Select File → Exit to close Barbara's SAS Data Integration Studio session.
Chapter 2 Introduction to Course
Data and Course Scenario
Objectives
Define common job roles.
Define the classroom environment.
Explore the course scenario.
Business Analyst
Business User
Platform Administrator
Project Manager
2.1 Introduction to Classroom Environment and Course Data

Classroom Environment
During this course, you will use a classroom machine on
which the SAS platform has been installed and configured
in a single machine environment.
Course Data
The data used in the course is from a fictitious global
sports and outdoors retailer named Orion Star Sports &
Outdoors.
Course Data
The Orion Star data used in the course consists of the
following:
data ranging from 2003 through 2007
64 suppliers
Course Scenario
During this course, you will have the opportunity to learn
about SAS Data Integration Studio as a data integration
developer.
Objectives
Define the tasks for the course scenario.
Define the data model to be used for the data mart.
Course Tasks
There are several main steps you will accomplish during
this class.
Step 1: Register metadata for source tables.
Step 2: Register metadata for target tables.
Step 3: Create jobs to load tables.
Step 4: Investigate a variety of transformations.
Step 5: Investigate table relationships.
Step 6: Investigate slowly changing dimensions.
Step 7: Develop user-defined transformations.
Step 8: Deploy jobs.
2.2 Course Tasks
For this step, you will define metadata for several target tables.
(Figure: star schema with the Order Fact Table at the center, joined to the Organization,
Customer, Product, and Time Dimensions)
Exercises
Product Table
Place an X in the column to indicate whether the data item will be used to classify data or as an
analysis variable. Add any additional data items that you think are needed.
Product_Name
Product_Group
Product_Category
Product_Line
Supplier_ID
Supplier_Name
Supplier_Country
Discount
Total_Retail_Price
CostPrice_Per_Unit
The following table contains a data dictionary for the columns of the source tables.

Column               Table         Type  Length  Format      Label
Birth_Date           CUSTOMER      Num   4       DATE9.      Customer Birth Date
                     STAFF         Num   4       DATE9.      Employee Birth Date
City_ID              CITY          Num   8                   City ID
                     POSTAL_CODE   Num   8                   City ID
                     STREET_CODE   Num   8                   City ID
City_Name            CITY          Char  30                  City Name
                     POSTAL_CODE   Char  30                  City Name
                     STREET_CODE   Char  30                  City Name
Continent_ID         CONTINENT     Num   4                   Continent ID
                     COUNTRY       Num   4                   Numeric Rep. for Continent
Continent_Name       CONTINENT     Char  30                  Continent Name
CostPrice_Per_Unit   ORDER_ITEM    Num   8       DOLLAR13.2  Cost Price Per Unit
Count                STREET_CODE   Num   4                   Frequency
Country              CITY          Char  2       $COUNTRY.   Country
                     COUNTRY       Char  2                   Country Abbreviation
                     CUSTOMER      Char  2       $COUNTRY.   Customer Country
                     GEO_TYPE      Char  2                   Country Abbreviation
                     HOLIDAY       Char  2                   Country's Holidays
                     ORGANIZATION  Char  2       $COUNTRY.   Country Abbreviation
                     STATE         Char  2       $COUNTRY.   Abbreviated Country
                     STREET_CODE   Char  2       $COUNTRY.   Abbreviated Country
                     SUPPLIER      Char  2       $COUNTRY.   Country
Country_Former_Name  COUNTRY       Char  30                  Former Name of Country
Country_ID           COUNTRY       Num   4                   Country ID
a. Complete the table by listing the source tables and the columns in those tables that are involved in
determining the values that will be loaded in the ProdDim table.
Target Source Source Computed
Column Table Column Column? (X)
Product_ID
Product_Category
Product_Group
Product_Line
Product_Name
Supplier_Country
Supplier_ID
Supplier_Name
b. Sketch the diagram for the product dimension table. Show the input data source(s) as well as the
desired calculated columns, and the target table (product dimension table).
Diagram for the Product Dimension Table:
Target Column     Source Table     Source Column     Computed? (X)
Product_Category                                     X
Product_Group                                        X
Product_Line                                         X
Product_Name      PRODUCT_LIST     Product_Name
b.
Diagram for the Product Dimension Table:
Chapter 3 Creating Metadata for
Source Data
Objectives
Define some administrative tasks to be performed for
SAS Data Integration Studio.
Describe the New Library Wizard.
You will perform the first two tasks in this chapter. The last
three tasks will be discussed and demonstrated in the final
chapters of this course.
Custom Folders
The Folders tree is one of the tree views in the left panel
of the desktop. Like the Inventory tree, the Folders tree
displays metadata for objects that are registered on the
current metadata server, such as tables and libraries. The
Inventory tree, however, organizes metadata by type and
does not enable you to add custom folders. The Folders
tree enables you to add custom folders.
In general, an administrator sets up the custom folder
structure in the Folders tree and sets permissions on
those folders. Users simply save metadata to the
appropriate folders in that structure.
3.1 Setting Up the Environment
1. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Ahmed's credentials to access the Foundation repository.
a. Select My Server as the connection profile.
b. Click OK to close the Connection Profile window and open the Log On window.
c. Type Ahmed as the value for the User ID field and Student1 as the value for the Password
field.
4. Right-click on the Data Mart Development folder and select New Folder.
6. Right-click on the Data Mart Development folder and select New Folder.
7. Type Orion Target Data and then press ENTER.
8. Right-click on the Data Mart Development folder and select New Folder.
9. Type Orion Jobs and then press ENTER.
10. Right-click on the Data Mart Development folder and select New Folder.
11. Type Orion Reports and then press ENTER.
12. Right-click on the Data Mart Development folder and select New Folder.
13. Type Orion SCD and then press ENTER.
Libraries
In SAS software, a library is a collection of one or more
files that are recognized by SAS and that are referenced
and stored as a unit.
Libraries are critical to SAS Data Integration Studio.
Metadata for sources, targets, or jobs cannot be finalized
until the appropriate libraries have been registered in a
metadata repository.
Accordingly, one of the first tasks in a SAS Data
Integration Studio project is to specify metadata for the
libraries that contain or will contain sources, targets, or
other resources. At some sites, an administrator adds and
maintains most of the libraries that are needed, and the
administrator tells SAS Data Integration Studio users
which libraries to use.
A library definition includes, among other things, a library reference (libref).
This demonstration illustrates defining metadata for a SAS library, a location that contains some of the
SAS source tables to be used throughout the rest of the course.
1. If necessary, select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Barbara's credentials to access Barbara's Work Repository.
b. Click OK to close the Connection Profile window and open the Log On window.
c. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
d. Click OK to close the Log On window. SAS Data Integration Studio opens.
6. Click .
7. Type DIFT Orion Source Tables Library as the value for the Name field.
8.
9. Verify that the location is set to /Data Mart Development/Orion Source Data.
10. Click .
13. Click .
16. Click to move the selected path to the Selected items pane.
The final settings for the library options window are shown here.
If the desired path does not exist in the Available items pane, click . In the New
Path Specification window, click next to Paths. In the Browse window, navigate
to the desired path. Click to close the Browse window. Click to close
the New Path Specification window.
17. Click .
Exercises
For this set of exercises, use Barbara's project repository to create the library object(s).
1. Specifying Folder Structure
If you did not follow along with the steps of the demonstration, complete steps 1-13 starting on page
3-5.
2. Specifying Orion Source Tables Library
If you did not follow along with the steps of the demonstration, complete steps 1-17 starting on page
3-10.
3. Specifying a Library for Additional SAS Tables
There are additional SAS tables that are needed for the course workshops. Therefore, a new library
object must be registered to access these tables. The specifics for the library are shown below:
Name: DIFT SAS Library
Folder Location: \Data Mart Development\Orion Source Data
SAS Server: SASApp
Libref: DIFTSAS
Path Specification: S:\Workshop\dift\data
4. Checking in New Library Objects
Check in the new library objects. Specify the following for check-in information:
Title: Adding two library objects
Description: Checking in new library objects of DIFT Orion
Source Tables Library and DIFT SAS Library.
Objectives
Use the Register Tables wizard to register SAS source
data.
Use the Register Tables wizard to register metadata for a
Microsoft Access database table using ODBC.
Register metadata for a comma-delimited external file.
Source Data
Tables are the inputs and outputs of many SAS Data
Integration Studio jobs. The tables can be SAS tables or
tables created by the database management systems that
are supported by SAS/ACCESS software.
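A library for DBMS tables uses the corresponding SAS/ACCESS engine rather than the default Base
engine. A minimal sketch with placeholder connection values (the engine, libref, and credentials here
are illustrative assumptions; options vary by DBMS):

   /* SAS/ACCESS LIBNAME for a DBMS source; all values are placeholders. */
   libname dwhsrc oracle user=myuser password=mypass path=orapath schema=sales;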
In this class, you will use source data from three different
types of data sources:
- SAS tables
- a Microsoft Access database table (accessed via ODBC)
- external files
3.2 Registering Source Data Metadata
c. Click OK to close the Connection Profile window and open the Log On window.
d. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
When the Register Tables wizard opens, only those data formats that are licensed for your
site are available for use.
The procedure for registering a table typically begins with a page that asks you to "Select the
type of tables that you want to import information about". This window is skipped when you
register a table through a library.
5. Click next to the SAS Library field and then select DIFT Orion Source Tables Library.
6. Click . The Define Tables and Select Folder Location window opens.
11. Click .
The metadata object for the table is found in the Checkouts tree.
12. Right-click the PRODUCT_LIST metadata table object and select Properties.
13. Type DIFT at the beginning of the default name.
15. Click the Columns tab to view some of the defined information.
22. Right-click the DIFT PRODUCT_LIST metadata table object and select Open. The View Data
window opens.
3-32 Chapter 3 Creating Metadata for Source Data
26. Verify that Equals is set as the value for the Filter type field.
28. Click .
The data returned to the View Data window are filtered based on the query specified.
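The filter corresponds to a simple WHERE expression applied to the table. An equivalent expressed
directly in SAS code (the column and value are assumptions for illustration):

   /* "Equals" filter expressed as a WHERE clause; column and value are hypothetical. */
   proc print data=diftsrc.product_list;
      where Supplier_ID = 1303;
   run;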
On a Windows operating system, the Control Panel's Administrative Tools enable you to add, remove,
and configure Open Database Connectivity (ODBC) data sources and drivers.
This demonstration uses the Control Panel's Administrative Tools to access the ODBC Data Source
Administrator. A Microsoft Access database will be defined as an ODBC data source to the operating
system.
To register the desired tables from the Microsoft Access database via ODBC connection, a library object
(metadata object) is needed, and this library object requires a server definition. This server definition
points to the newly defined ODBC system resources. On this image, Barbara does not have the
appropriate authority to create metadata about a server. So Ahmed will create this server definition for her
using SAS Management Console.
Finally, Barbara can use the Register Tables wizard to complete the registration of the desired table.
3. In the Administrative Tools window, double-click Data Sources (ODBC) to open the ODBC Data
Source Administrator window.
4. In the ODBC Data Source Administrator window, click the System DSN tab.
5. Click .
8. Type DIFT Course Data as the value for the Data Source Name field.
The path and database name are now specified in the Database area as shown here:
The System DSN tab in the ODBC Data Source Administrator now has the newly defined ODBC data
source.
Metadata for an ODBC data source requires a library object that will use the ODBC engine, and the
library object requires a metadata server object that will point to the system ODBC data source. Barbara
does not have the appropriate authorizations to create this server. Ahmed is an administrator and can
create this server using SAS Management Console.
1. Access SAS Management Console using Ahmed's credentials.
a. Select Start → All Programs → SAS → SAS Management Console 4.2.
b. Select My Server as the connection profile.
c. Click OK to close the Connection Profile window and open the Log On window.
d. Type Ahmed as the value for the User ID field and Student1 as the value for the Password
field.
4. Click .
5. Type DIFT Course Microsoft Access Database Server as the value for the Name
field.
6. Click .
7. Select ODBC Microsoft Access as the value for the Data Source Type field.
8. Click .
9. Click Datasrc.
10. Type "DIFT Course Data" (the quotes are necessary since the ODBC data source name has
spaces).
11. Click .
12. Click .
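The library object that Barbara defines next will resolve to a LIBNAME statement that uses the ODBC
engine and this DSN. A minimal sketch (the libref is an illustrative assumption; the quotation marks
are required because the data source name contains spaces):

   /* ODBC LIBNAME pointing at the system DSN defined above. */
   libname diftodbc odbc datasrc="DIFT Course Data";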
With the ODBC server defined, Barbara can now define the metadata object referencing a table in the
Microsoft Access database.
1. If necessary, access SAS Data Integration Studio using Barbara's credentials.
a. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
b. Select Barbara's Work Repository as the connection profile.
c. Click OK to close the Connection Profile window and open the Log On window.
d. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
7. Select ODBC Microsoft Access as the type of table to import information about.
8. Click .
There are no library metadata objects defined with an ODBC engine, so none appear in the selection
list.
10. Type DIFT Course Microsoft Access Database as the value for the Name field.
11. Verify that the location is set to /Data Mart Development/Orion Source Data.
The final specifications for the name and location window should be as follows:
12. Click .
16. Click .
17. Verify that DIFT Course Microsoft Access Database Server is the value for the Database
Server field.
18. Click .
19. Click . This finishes the metadata definition for the library object.
20. Click .
22. Click .
23. Click .
The metadata objects for the ODBC data source and the newly defined library object are found
in the Checkouts tree.
24. Right-click the CustType metadata table object and select Properties.
25. Type DIFT Customer Types as the new value for the Name field.
26. Click the Columns tab to view some of the defined information.
29. Right-click the DIFT Customer Types metadata table object and select Open. The View Data
window opens.
c. Click OK to close the Connection Profile window and open the Log On window.
d. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
6. Type DIFT Supplier Information as the value for the Name field.
7. Verify that the location is set to /Data Mart Development/Orion Source Data.
13. Click .
Previewing the file shows that the first record contains column names and that the values are
comma-delimited rather than space-delimited.
The final settings for the External File Location window are shown here:
19. Click .
21. Type 2 (the number two) as the value for the Start record field.
22. Click to close the Auto Fill Columns window. The top portion of the Column Definitions
window populates with 6 columns: 3 numeric and 3 character.
24. Select Get the column names from column headings in this file.
25. Verify that 1 is set as the value for The column headings are in file record field.
26. Click . The Name field populates with all the column names.
29. Click the Data tab in the bottom part of the Column Definitions window.
30. Click .
31. Click .
32. Click .
The metadata object for the external file is found on the Checkouts tab.
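The registered external file ultimately generates a DATA step that reads the delimited records. A
hand-coded sketch of the equivalent, assuming a hypothetical file name and column names (the real
file defines six columns, three numeric and three character):

   /* Comma-delimited file; record 1 holds column headings, data begin at record 2. */
   /* The file name and column names here are illustrative assumptions.             */
   data work.supplier_information;
      infile "S:\Workshop\dift\data\supplier.csv" dlm=',' dsd firstobs=2;
      length Supplier_ID 8 Supplier_Name $ 40 Street_ID 8
             Street_Number 8 Country $ 2 City $ 30;
      input Supplier_ID Supplier_Name $ Street_ID Street_Number Country $ City $;
   run;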
5. Click .
7. Click .
8. Click .
Exercises
c. Register the five tables (SAS tables) found in the DIFT SAS Library. Change the default
metadata names to DIFT <SAS-table-name>.
e. Click .
f. Type DIFT SAS Library as the value for the Name field.
g. Verify that the location is set to \Data Mart Development\Orion Source Data.
h. Click .
k. Click .
m. The desired path does not exist in the Available items pane. Click .
s. Click .
t. Verify that the information is correct in the review window and then click .
The new library metadata object is found in the Checkouts tree.
d. Click .
3.3 Solutions to Exercises
f. Click .
h. Click .
4) Click next to SAS Library field and then select DIFT Orion Source Tables Library.
5) Click . The Define Tables and Select Folder Location window opens.
6) Select the STAFF table, hold down the CTRL key, and select ORDER_ITEM and ORDERS.
7) Verify that /Data Mart Development/Orion Source Data is the folder listed for the
Location field.
10) Right-click the STAFF metadata table object and select Properties.
11) Type DIFT at the beginning of the default name.
13) Right-click the ORDER_ITEM metadata table object and select Properties.
14) Type DIFT at the beginning of the default name.
17) Right-click the ORDERS metadata table object and select Properties.
18) Type DIFT at the beginning of the default name.
c. Register the five tables (SAS tables) found in the DIFT SAS Library. Change the default
metadata names to DIFT <SAS-table-name>.
4) Click next to SAS Library field and then select DIFT SAS Library.
5) Click . The Define Tables and Select Folder Location window opens.
6) Click .
7) Verify that /Data Mart Development/Orion Source Data is the folder listed for the
Location field.
The metadata objects for the tables are found in the Checkouts tree.
10) Right-click the CUSTOMER_TRANS metadata table object and select Properties.
11) Type DIFT at the beginning of the default name.
13) Right-click the CUSTOMER_TRANS_OCT metadata table object and select Properties.
16) Right-click the NEWORDERTRANS metadata table object and select Properties.
17) Type DIFT at the beginning of the default name.
20) Right-click the STAFF_PARTIAL metadata table object and select Properties.
21) Type DIFT at the beginning of the default name.
The metadata objects for the tables are found in the Checkouts tree.
7) Click . The ODBC window opens and displays the one ODBC library definition in
the SAS Library field.
8) Click .
9) Select Contacts, hold down the CTRL key and select NewProducts.
10) Click .
11) Click .
12) Right-click the Contacts metadata table object and select Properties.
13) Type DIFT at the beginning of the default name.
15) Right-click the NewProducts metadata table object and select Properties.
6) Verify that the location is set to /Data Mart Development/Orion Source Data.
9) Navigate to S:\Workshop\dift\data.
10) Select profit.txt.
11) Click .
17) Click to add a new column specification. Enter the following information:
18) Click to add a new column specification. Enter the following information:
19) Click to add a new column specification. Enter the following information:
20) Click to add a new column specification. Enter the following information:
21) Click to add a new column specification. Enter the following information:
22) Click to add a new column specification. Enter the following information:
23) Click the Data tab and then click . Verify that the values are read in correctly.
25) Click . The review window displays general information for the external file.
26) Click . The metadata object for the external file is found in the Checkouts tree.
b. After the external file metadata is defined in the project repository, be sure to check it in.
1) Select Check Outs → Check In All.
2) Type Adding metadata for profit information external file as the
value for the Title field.
5) Click . The external file object should no longer be in the Checkouts tree.
5) Click .
8) Type DIFT Workshop Data as the value for the Data Source Name field.
3) Click OK to close the Connection Profile window and access the Log On window.
4) Type Ahmed as the value for the User ID field and Student1 as the value for the
Password field.
3) Click .
4) Type DIFT Workshop Microsoft Access Database Server as the value for
the Name field.
5) Click .
6) Select ODBC Microsoft Access as the value for the Data Source Type field.
7) Click .
8) Select Datasrc.
9) Type "DIFT Workshop Data" (the quotes are necessary since the ODBC data source
name has spaces).
10) Click .
11) Click .
3) Click OK to close the Connection Profile window and access the Log On window.
4) Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
7) Click .
a) Type DIFT Workshop Microsoft Access Database as the value for the
Name field.
b) Verify that the location is set to /Data Mart Development/Orion Source Data.
c) Click .
g) Click .
h) Verify that DIFT Workshop Microsoft Access Database Server is the value for the
Database Server field.
i) Click .
j) Click . This finishes the metadata definition for the library object.
9) Click .
10) Select Catalog_Orders, hold down the CTRL key, and select PRODUCTS and then
Web_Orders.
11) Click .
12) Click . The metadata objects for the three tables, as well as the newly defined
library object, are found in the Checkouts tree.
g. Update the metadata for Catalog_Orders.
4) Click .
7) Click .
Chapter 4 Creating Metadata for
Target Data
Objectives
Review features of the New Tables wizard.
(Figure: the course star schema – the Order Fact Table joined to the Organization, Customer,
Product, and Time Dimensions – annotated by how each table is built in the course: as a
demonstration, an exercise, or a case study)
4.1 Registering Target Data Metadata
(Figure: the Product_List table and supplier information are the sources for the product dimension)
This demonstration defines a metadata object for a single target table. The target is to be a SAS data set
named ProdDim, stored in the DIFT Orion Target Tables Library (the library object must be created
as well).
1. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Barbara's credentials to access her project repository.
a. Select Barbara's Work Repository as the connection profile.
b. Click OK to close the Connection Profile window and open the Log On window.
c. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
7. Type DIFT Product Dimension as the value for the Name field.
8. Verify that the location is set to /Data Mart Development/Orion Target Data.
The final specifications for the name and location window should be as follows:
9. Click .
11. Click next to the Library field. The target tables library is not yet defined.
a. Type DIFT Orion Target Tables Library as the value for the Name field.
b. Verify that the location is set to /Data Mart Development/Orion Target Data.
The final specifications for the name and location window are as follows:
c. Click .
f. Click .
p. Verify that the newly specified path is found in the Selected items pane.
The final settings for the library options window are shown here:
q. Click .
The new library metadata object can be found in the Library field.
The final settings for the Table Storage Information window are shown below:
13. Click .
14. Expand the Data Mart Development → Orion Source Data folder on the Folders tab.
15. From the Orion Source Data folder, expand the DIFT PRODUCT_LIST table object.
16. Select the following columns from DIFT PRODUCT_LIST and click to move the columns to
the Selected pane:
Product_ID
Product_Name
Supplier_ID
19. Select the following columns from DIFT Supplier Information and click to move the columns
to the Selected pane:
Supplier_Name
Country
20. Click .
a. Click . Define two simple indexes: one for Product_ID and one for
Product_Group.
Neglecting to press ENTER results in the name of the index not being saved, which
produces an error when the table is generated because the name of the index and the
column being indexed do not match.
d. Select the Product_ID column and move it to the Indexes panel by clicking .
g. Select the Product_Group column and move it to the Indexes panel by clicking . The two
requested indexes are defined in the Define Indexes window.
h. Click to close the Define Indexes window and return to the Target Table Designer.
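For reference, index metadata like this typically becomes a PROC DATASETS step along the following lines when the table is created. This is a sketch only; the DIFTTGT libref and the ProdDim table name follow this demonstration's storage settings, and for a simple index the index name must match the column name (which is why the ENTER keystroke above matters):
   proc datasets library=difttgt nolist;   /* libref from this demonstration */
      modify ProdDim;
      /* simple indexes: each index name must equal its column name */
      index create Product_ID;
      index create Product_Group;
   quit;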
25. Click .
27. Click .
The new table object and new library object appear on the Checkouts tab.
e. Click .
The objects should appear in the Data Mart Development → Orion Target Data folders.
Exercises
Define metadata for the OrderFact table. Name the metadata object DIFT Order Fact.
Specify that the table should be created as a SAS table with the physical name of OrderFact.
Physically store the table in DIFT Orion Target Tables Library. Use the set of distinct columns
from DIFT ORDER_ITEM and DIFT ORDERS. Store the metadata object in the Data Mart
Development Orion Target Data folder. Check in the new table object.
2. Defining Additional Target Tables
Several additional tables must be defined for the demonstrations and exercises in subsequent sections.
Check in all of the metadata table objects.
Create a target table metadata object named DIFT Recent Orders that defines
column metadata for a SAS table that will be named Recent_Orders and stored in the
DIFT Orion Target Tables Library. The columns in Recent_Orders should be the same
columns that are defined in the OrderFact target table. Store the metadata object in the
Data Mart Development Orion Target Data folder.
Create a target table metadata object named DIFT Old Orders that defines
column metadata for a SAS table that will be named Old_Orders and stored in the
DIFT Orion Target Tables Library. The columns in Old_Orders should be the same
columns that are defined in the OrderFact target table. Store the metadata object in the
Data Mart Development Orion Target Data folder.
(Optional) Create a target table metadata object named DIFT US Suppliers that defines
column metadata for a SAS table that will be named US_Suppliers and stored in the
DIFT Orion Target Tables Library. The columns in US_Suppliers should be the same
columns that are defined in the DIFT Supplier Information external file object. Store the
metadata object in the Data Mart Development Orion Target Data folder.
Objectives
Discuss SAS packages.
Discuss importing and exporting of relational
metadata.
Types of Metadata
SAS Data Integration Studio enables you to import and
export metadata for individual objects or sets of related
objects. You can work with two kinds of metadata:
- SAS metadata in SAS Package format
Relational Metadata
By importing and exporting relational metadata in external
formats, you can reuse metadata from third-party
applications, and you can reuse SAS metadata in those
applications as well. For example, you can use third-party
data modeling software to specify a star schema for a set
of tables. The model can be exported in Common
Warehouse Metamodel (CWM) format. You can then use
a SAS Metadata Bridge to import that model into SAS
Data Integration Studio.
Relational Metadata
You can import and export relational metadata in any
format that is accessible with a SAS Metadata Bridge.
Relational metadata includes the metadata for the
following objects:
- data libraries
- tables
- columns
- indexes
This demonstration illustrates importing metadata that was exported in CWM format from Oracle
Designer.
1. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Barbara's credentials to access her project repository.
a. Select Barbara's Work Repository as the connection profile.
b. Click to close the Connection Profile window and access the Log On window.
c. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
6. Select File → Import Metadata. The Metadata Importer wizard initializes and displays the
window to enable the user to select an import format.
7. Select Oracle Designer.
8. Click .
9. Click next to the File name field to open the Select a file window.
12. Click .
14. Verify that the folder location is set to /Data Mart Development/Orion Target Data.
The final settings for the File Location window of the Metadata Importer wizard should be as shown:
15. Click .
17. Click .
25. Click .
26. The finish window displays the final settings. Review and accept the settings.
27. Click .
30. The Checkouts tree displays two new metadata table objects.
31. Right-click the Current Staff metadata table object and select Properties.
32. Type DIFT at the beginning of the default name.
33. Click the Columns tab to view some of the defined information.
34. Update the formats for each of the columns.
Start_Date Date9.
End_Date Date9.
Job_Title <none>
Salary Dollar12.
Gender $Gender6.
Birth_Date Date9.
Emp_Hire_Date Date9.
Emp_Term_Date Date9.
Manager_ID 12.
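These assignments correspond to what a PROC DATASETS step would apply to the physical table. A minimal sketch, assuming the DIFTTGT libref from this course setup and a placeholder physical table name of Current_Staff; $GENDER6. is a course-supplied format:
   proc datasets library=difttgt nolist;     /* libref assumed */
      modify Current_Staff;                  /* placeholder physical name */
      format Start_Date End_Date Birth_Date
             Emp_Hire_Date Emp_Term_Date date9.
             Salary dollar12.
             Gender $gender6.                /* course-supplied format */
             Manager_ID 12.;
   quit;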
36. Right-click the Terminated Staff metadata table object and select Properties.
37. Type DIFT at the beginning of the default name.
38. Click the Columns tab to view some of the defined information.
39. Update the formats for each of the columns.
Start_Date Date9.
End_Date Date9.
Job_Title <none>
Salary Dollar12.
Gender $Gender6.
Birth_Date Date9.
Emp_Hire_Date Date9.
Emp_Term_Date Date9.
Manager_ID 12.
e. Verify that the location is set to /Data Mart Development/Orion Target Data.
f. Click .
h. Select DIFT Orion Target Tables Library as the value for the Library field.
j. Click .
k. Expand the Data Mart Development → Orion Source Data folder on the Folders tab.
l. From the Orion Source Data folder, click the DIFT ORDER_ITEM table object.
m. Select all columns from DIFT ORDER_ITEM by clicking (all columns will be moved to
the Selected pane).
p. Select all columns from DIFT ORDERS by clicking (all columns will be moved to the
Selected pane). An Error window opens saying that Order_ID will not be added twice.
q. Click .
r. Click .
t. Review the metadata listed in the finish window and then click . The new table object
appears on the Checkouts tab.
u. Select Check Outs → Check In All.
v. Type Adding metadata for Order Fact table as the value for the Title field.
5) Verify that the location is set to /Data Mart Development/Orion Target Data.
6) Click .
8) Select DIFT Orion Target Tables Library as the value for the Library field.
10) Click .
11) Expand the Data Mart Development → Orion Target Data folder on the Folders tab.
12) From the Orion Target Data folder, locate the DIFT Order Fact table object.
13) Select the DIFT Order Fact table object and click to move all columns to the
Selected pane.
14) Click .
15) Accept the default attributes of the columns and then click .
16) Review the metadata listed in the finish window and then click . The new table
object appears on the Checkouts tab.
b. Define metadata for the DIFT Recent Orders table.
5) Verify that the location is set to /Data Mart Development/Orion Target Data.
6) Click .
8) Select DIFT Orion Target Tables Library as the value for the Library field.
10) Click .
11) Expand the Data Mart Development → Orion Target Data folder on the Folders tab.
12) From the Orion Target Data folder, locate the DIFT Order Fact table object.
13) Select the DIFT Order Fact table object and click to move all columns to the
Selected pane.
14) Click .
15) Accept the default attributes of the columns and then click .
16) Review the metadata listed in the finish window and then click . The new table
object appears on the Checkouts tab.
c. (Optional) Define metadata for the DIFT US Suppliers table.
5) Verify that the location is set to /Data Mart Development/Orion Target Data.
6) Click .
8) Select DIFT Orion Target Tables Library as the value for the Library field.
10) Click .
11) Expand the Data Mart Development → Orion Source Data folder on the Folders tab.
12) From the Orion Source Data folder, locate the DIFT Supplier Information table object.
13) Select the DIFT Supplier Information table object and click to move all columns to the
Selected pane.
14) Click .
15) Accept the default attributes of the columns and then click .
16) Review the metadata listed in the finish window and then click . The new table
object appears on the Checkouts tab.
d. Check in the newly created table objects.
1) Select Check Outs → Check In All.
2) Type Adding metadata for various target table objects as the value for
the Title field.
3) Type Adding metadata for Old and Recent Orders, and US Suppliers
as the value for the Description field.
The metadata in the Orion Target Data folder should now resemble the following:
Chapter 5 Creating Metadata for
Jobs
Objectives
Define a job object.
Discuss various features of jobs and the Job Editor
window.
Overview
At this point, metadata is defined for the following:
various types of source tables
What Is a Job?
A job is a collection of SAS tasks that creates output. SAS
Data Integration Studio uses the metadata for each job to
generate SAS code that reads sources and creates
targets in physical storage.
A Quick Example
Before you proceed to further discussions on jobs and
the Process Designer window, look at the creation of a
simple job. The job creates two SAS data sets, one
containing the current employees and the other
containing the terminated employees.
Splitter Transformation
The Splitter transformation can be used to create one or more subsets of a source.
This demonstration shows the building of a job that uses the Splitter transformation.
The final process flow diagram will look like the following:
c. Click to close the Connection Profile window and access the Log On window.
d. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
6. Type DIFT Populate Current and Terminated Staff Tables as the value for the
Name field.
8. Click .
When a job window is active, objects can be added to the diagram by right-clicking and
selecting Add to Diagram.
10. Select File → Save to save diagram and job metadata to this point.
12. Select File → Save to save diagram and job metadata to this point.
b. To connect the DIFT STAFF table object to the Splitter transformation, place your cursor over
the connection selector until a pencil icon appears.
14. Select File → Save to save diagram and job metadata to this point.
d. Drag the two objects to the Diagram tab of the Job Editor.
When a job window is active, objects can be added to the diagram by right-clicking and
selecting Add to Diagram.
16. Select File → Save to save diagram and job metadata to this point.
b. Right-click on the second temporary table object of the Splitter transformation and select Delete.
18. Connect the Splitter transformation to each of the target table objects.
a. Place your cursor over the Splitter transformation until a pencil icon appears.
b. When the pencil icon appears, click and drag the cursor to the first output table,
DIFT Current Staff.
c. Again, place your cursor over the Splitter transformation until a pencil icon appears, and click
and drag the cursor to the second output table, DIFT Terminated Staff.
19. Select File → Save to save diagram and job metadata to this point.
2) Select Row Selection Conditions as the value for the Row Selection Type field.
7) Click .
d. Specify the subsetting criteria for the DIFT Terminated Staff table object.
2) Select Row Selection Conditions as the value for the Row Selection Type field.
3) Click below the Selection Conditions area. The Expression window opens.
7) Click .
f. Verify that all Target Table columns have an arrow coming in to them (that is, all target columns
will receive data from a source column).
21. Select File → Save to save diagram and job metadata to this point.
26. Scroll to view the note about the creation of the DIFTTGT.TERM_STAFF table:
27. View the data for the DIFT Current Staff table object.
a. Right-click on the DIFT Current Staff table object and select Open.
b. When finished viewing the data, select File → Close to close the View Data window.
28. View the data for the DIFT Terminated Staff table object.
a. Right-click on the DIFT Terminated Staff table object and select Open.
b. When finished viewing the data, select File → Close to close the View Data window.
29. Select File → Close to close the Job Editor. If necessary, save changes to the job. The new job object
appears on the Checkouts tab.
30. Select Check Outs → Check In All.
a. Type Adding job that populates current & terminated staff tables as
the value for the Title field.
New Jobs
New jobs are initialized by the New Job wizard. The New
Job wizard specifies a name and a metadata location for the job.
(A description can optionally be specified.)
Selecting creates an empty job.
Job Editor
The Job Editor window enables you to create, maintain,
and troubleshoot SAS Data Integration Studio jobs. To
display this window for an existing job, right-click a job in
the tree view and select Open.
Pane Description
Details Used to monitor and debug a job.
To display, select View → Details from the desktop.
Runtime Manager Displays the run-time status of the current job, the last
time that the job was executed in the current session, and the
SAS Application Server that was used to execute the job.
To display, select View → Runtime Manager from the desktop.
Actions History Displays low-priority errors and warnings.
To display, select View → Actions History from the desktop.
Introduction to Transformations
A transformation is a metadata object that specifies how
to extract data, transform data, or load data into data
stores. Each transformation that you specify in a process
flow diagram generates or retrieves SAS code. You can
also specify user-written code in the metadata for any
transformation in a process flow diagram.
Transformations Tree
The Transformations tree organizes transformations into a
set of folders. You can drag a transformation from the
Transformations tree to the Job Editor, where you can
connect it to source and target tables and update its
default metadata. By updating a transformation with the
metadata for actual sources, targets, and transformations,
you can quickly create process flow
diagrams for common scenarios.
The display shows the standard
Transformations tree.
Objectives
Discuss components of the SQL Join transformation's Designer window.
[Figure: the SQL Join Designer window, with the Navigate pane, SQL Clauses pane, and Properties pane labeled]
The Tables pane displays when a table object or the Select keyword is selected in the Navigate
pane. The Tables pane might also display when other aspects of particular joins are requested
(for instance, the surfacing of Having, Group by, and Order by information). The Tables pane is
displayed in the same location as the SQL Clauses pane.
Product_Category
Product_Line
Calculating Product_Category
Product_Category values are calculated by:
- performing a grouping using Product_ID
Calculating Product_Line
Product_Line values are calculated by:
- performing a grouping using Product_ID
Product_Category:
put(int(product_id/100000000)*100000000,product.)
Product_Line:
put(int(product_id/10000000000)*10000000000,product.)
Product_Category:
put(int(product_id/1e8)*1e8,product.)
Product_Line:
put(int(product_id/1e10)*1e10,product.)
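As a quick check of what the truncation does, the DATA step below uses a made-up 12-digit Product_ID; the course-supplied PRODUCT. format then maps the truncated values to category and line names:
   data _null_;
      product_id  = 240600100001;              /* hypothetical 12-digit ID */
      category_id = int(product_id/1e8)*1e8;   /* 240600000000 (last 8 digits zeroed)  */
      line_id     = int(product_id/1e10)*1e10; /* 240000000000 (last 10 digits zeroed) */
      put product_id= category_id= line_id=;
   run;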
In this demonstration, you can take advantage of the SQL Join transformation to join the DIFT
Product_List and DIFT Supplier Information source tables to create the target table
DIFT Product Dimension.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
c. Click .
7. Select File → Save to save diagram and job metadata to this point.
9. Rename the temporary table object associated with the File Reader transformation.
a. Right-click on the green temporary table object and select Properties.
10. Select File → Save to save diagram and job metadata to this point.
11. Add the SQL Join transformation to the diagram.
a. In the tree view, click the Transformations tab.
b. Expand the Data grouping.
c. Select the SQL Join transformation.
12. Select File → Save to save diagram and job metadata to this point.
13. Add inputs to the SQL Join transformation.
a. Place your cursor over the SQL Join transformation in the diagram to reveal the two default
ports.
b. Connect the DIFT PRODUCT_LIST table object to one of the input ports for the SQL Join.
c. Connect the File Reader transformation (click on the temporary table icon, , associated with
the File Reader and drag) to the other port of the SQL Join transformation.
14. Select File → Save to save diagram and job metadata to this point.
15. Add the DIFT Product Dimension table object as the output for the SQL Join.
a. Right-click on the temporary table object associated with the SQL Join transformation and select
Replace.
b. In the Table Selector window, expand the Data Mart Development → Orion Target Data
folders.
c. Select DIFT Product Dimension table object.
d. Select .
16. Select File → Save to save diagram and job metadata to this point.
17. Review properties of the File Reader transformation.
a. Right-click on the File Reader transformation and select Properties.
b. Click the Mappings tab.
c. Verify that all target columns have a column mapping.
18. Select File → Save to save diagram and job metadata to this point.
b. Select the Join item on the Diagram tab. Verify that the Join is an Inner join from the Properties
pane.
The type of join can also be verified or changed by right-clicking on the Join item
in the process flow of the SQL clauses. A pop-up menu appears with the current
type of join checked, and it also enables selection of another type of join.
c. Select the Where keyword in the Navigate pane to surface the Where tab.
d. Verify that the Inner join will be executed based on the values of Supplier_ID columns from
the sources being equal.
1) Click in the top portion of the Where tab. A row is added with the logical AND
as the Boolean operator.
2) Select Choose column(s) from the drop-down list under the first Operand column.
5) Type 1 (numeral one) in the field for the second Operand column and press ENTER.
f. Select the Select keyword in the Navigate pane to surface the Select tab.
g. Map the Country column to Supplier_Country by clicking on the Country column and
dragging to the Supplier_Country.
The Expression field must be filled in for the three columns.
i. Re-order the columns so that Product_Group is first, then Product_Category, and then
Product_Line.
5) Click .
6) Click .
2) In the Expression column, select Advanced from the drop-down list. The Expression
window opens.
3) If necessary, access the HelperFile.txt file in S:\Workshop\dift.
4) Copy the expression for Product_Category:
put(int(product_list.product_id/1e8)*1e8,product.)
6) Click .
7) Click .
2) In the Expression column, select Advanced from the drop-down list. The Expression
window is opened.
3) If necessary, access the HelperFile.txt file in S:\Workshop\dift.
4) Copy the expression for Product_Line:
Put(int(product_list.product_id/1e10)*1e10,product.)
6) Click .
7) Click .
m. Click to fold the target table info back to the right side.
n. Select File → Save to save changes to the SQL Join transformation.
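The code that the SQL Join generates is roughly equivalent to the PROC SQL step below. This is a sketch only: the WORK table names are placeholders (the supplier data actually arrives through the File Reader transformation), and the Product_Group expression is omitted:
   proc sql;
      create table difttgt.proddim as
      select p.Product_ID,
             p.Product_Name,
             put(int(p.Product_ID/1e8)*1e8, product.)   as Product_Category,
             put(int(p.Product_ID/1e10)*1e10, product.) as Product_Line,
             s.Supplier_Name,
             s.Country as Supplier_Country
         from work.product_list as p
              inner join
              work.supplier_info as s                   /* placeholder names */
              on p.Supplier_ID = s.Supplier_ID;
   quit;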
A warning occurred in the execution of the SQL Join. You see a change in the coloring of the
transformation in the process flow and the symbol overlay.
c. Double-click the Warning for the SQL Join. The Warnings and Errors tab is moved forward with
the warning message.
21. Edit the SQL Join transformation to fix the column mappings.
a. Right-click on the SQL Join transformation and select Open.
b. Click the Select keyword on the Navigate pane to surface the Select tab. Note the warning
symbol, , associated with each of the three calculated columns.
c. Map the Product_ID column to the Product_Group column (click on Product_ID in the
Source table side and drag to Product_Group in the Target table side).
22. Run the job by right-clicking in the background of the job and selecting Run.
24. View the log for the executed job by selecting the Log tab.
25. Scroll to view the note about the creation of the DIFTTGT.PRODDIM table:
Exercises
Some specifics for creating the job to load the OrderFact table are shown below:
Name the job DIFT Populate Order Fact Table.
Two tables should be joined together, DIFT ORDER_ITEM and DIFT ORDERS.
The SQL Join transformation will be used for the inner join based on Order_ID from the input
tables.
No additional processing is necessary beyond the SQL Join; therefore, the targets can be loaded
directly from the SQL Join transformation.
After verifying that the table is created successfully (OrderFact should have 951,669
observations and 12 variables), check in the job object.
2. Loading Recent and Old Orders Tables
Some specifics for creating the job to load the Old Orders and Recent Orders tables are
shown below:
Name the job DIFT Populate Old and Recent Orders Tables.
Use the SAS Splitter transformation to break apart the observations from the OrderFact table.
Old orders are defined to be orders placed before 2005. An expression that can be used
to find the observations for this data is the following:
Objectives
Investigate mapping and propagation.
Investigate chaining of jobs.
Work with performance statistics.
Generate reports on metadata for tables and jobs.
Automatic Mappings
By default, SAS Data Integration Studio automatically
creates a mapping when a source column and a target
column have the same column name, data type, and
length.
Events that trigger automatic mapping include:
- connecting a source and a target to the transformation
on the Diagram tab
- clicking Propagate on the toolbar or in the pop-up
menu in the Job Editor window
- clicking Propagate on the Mappings tab toolbar and
selecting a propagation option
- clicking Map all columns on the Mappings tab toolbar
Automatic Propagation
Automatic propagation sends column changes to tables
when process flows are created. If you disable automatic
propagation and refrain from using manual propagation,
you can propagate column changes on the Mappings tab
for a transformation; these changes are restricted to the
target tables for that transformation. Automatic propagation
can be controlled at various levels:
- Global
- Job
- Process flow
- Transformation
This demonstration investigates automatic and manual propagation and mappings. The propagation is
investigated from sources to targets only. Propagation can also be done from targets to sources.
1. If necessary, access SAS Data Integration Studio using Bruno's credentials.
a. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
b. Verify that the connection profile is My Server.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
Note that four columns are character and two are numeric.
5. Create a target table with the same attributes as the DIFT PRODUCTS (Copy) table.
a. Right-click the DIFT PRODUCTS (Copy) table object and select Copy.
b. Right-click the DIFT Additional Examples folder and select Paste.
c. Right-click the Copy of DIFT PRODUCTS (Copy) table object and select Properties.
d. Type DIFT PRODUCTS Information as the value for the Name field (on the General tab).
k. Verify that the DBMS field appropriately updated to SAS with this new library selection.
The new table object appears under the DIFT Additional Examples folder.
6. Create a target table with different attributes from the DIFT PRODUCTS (Copy) table.
a. Right-click the DIFT PRODUCTS (Copy) table object and select Copy.
b. Right-click the DIFT Additional Examples folder and select Paste.
c. Right-click the Copy of DIFT PRODUCTS (Copy) table object and select Properties.
d. Type DIFT PRODUCTS Profit Information as the value for the Name field (on the
General tab).
m. Verify that the DBMS field appropriately updated to SAS with this new library selection.
n. Type PRODUCTSProfitInfo as the value for the Name field.
The new table object appears under the DIFT Additional Examples folder.
c. Click .
d. Select DIFT Orion Target Tables Library as the value for the Library field.
f. Click .
The new table object appears under the DIFT Additional Examples folder.
c. Verify that /Data Mart Development/DIFT Additional Examples is the value for the
Location field.
d. Click .
10. Add table objects and transformations to the Diagram tab of the Job Editor.
a. Click and drag the DIFT PRODUCTS (Copy) table object to the Diagram tab of the Job Editor.
b. Click and drag the DIFT PRODUCTS Information table object to the Diagram tab of the Job
Editor.
c. Click the Transformations tab.
d. Expand the Data grouping of transformations.
e. Click and drag the Extract transformation to the Diagram tab of the Job Editor.
f. Expand the Access grouping of transformations.
g. Click and drag the Table Loader transformation to the Diagram tab of the Job Editor.
A similar message regarding no mappings defined can be found for the Table Loader
transformation.
11. Connect the objects in the process flow diagram and investigate the mappings.
a. Connect the DIFT PRODUCTS (Copy) table object to the Extract transformation.
b. Connect the Extract transformation to the Table Loader transformation.
c. Connect the Table Loader transformation to the DIFT PRODUCTS Information table object.
The process flow diagram updates to the following:
Mappings can be investigated by opening the Properties window for each transformation.
Alternatively, the Details section displays defined mappings for the selected transformation.
d. If necessary, select View → Details to display the Details area within the Job Editor window.
(Optionally, the Details section can be displayed by clicking the tool in the Job Editor tools.)
Automatic mappings occur between a source column and a target column when these columns
have the same column name, data type, and length.
g. Click the Table Loader transformation on the Diagram tab.
h. View the mappings in the Details section.
All mappings were automatically established between the source columns and the target columns.
12. Add additional table objects and transformations to the Diagram tab of the Job Editor.
a. Click and drag the DIFT PRODUCTS (Copy) table object to the Diagram tab of the Job Editor.
b. Click and drag the DIFT PRODUCTS Profit Information table object to the Diagram tab of
the Job Editor.
c. Click the Transformations tab.
d. Expand the Data grouping of transformations.
e. Click and drag the Extract transformation to the Diagram tab of the Job Editor.
f. Expand the Access grouping of transformations.
g. Click and drag the Table Loader transformation to the Diagram tab of the Job Editor.
13. Connect the new objects in the process flow diagram and investigate the mappings.
a. Connect the DIFT PRODUCTS (Copy) table object to the Extract transformation.
b. Connect the Extract transformation to the Table Loader transformation.
c. Connect the Table Loader transformation to the DIFT PRODUCTS Profit Information table
object.
The process flow diagram updates to the following:
a. If necessary, select View → Details to display the Details area within the Job Editor window.
f. Manually map the TYPE columns. A Warning will appear. An expression could be written to
avoid this warning; otherwise, the Log for the job will contain a WARNING message as well.
i. Scroll in the target table side of the Mappings tab to view the automatic expression for the Sex
column.
15. Turn off automatic mappings for the job and rebuild the connections for the first part of this process
flow.
a. Right-click in the background of the Job Editor and select Settings → Automatically Map
Columns.
b. Click on each of the connections for the first flow, right-click and select Delete.
c. Re-connect the DIFT PRODUCTS (Copy) table object to the Extract transformation.
h. Click on the Mappings tab tool set. All columns are mapped.
i. Click on the Mappings tab tool set. All columns are no longer mapped.
l. Click on the Mappings tab tool set, and verify that Include Selected Columns in
Mapping is selected.
m. Click on the Mappings tab tool set. The selected column is manually mapped.
b. Click .
c. Verify that /Data Mart Development/DIFT Additional Examples is the value for the
Location field.
d. Click .
19. Add table objects and transformations to the Diagram tab of the Job Editor.
a. Click and drag the DIFT PRODUCTS (Copy) table object to the Diagram tab of the Job Editor.
b. Click and drag the DIFT PRODUCTS Profit Information (2) table object to the Diagram tab
of the Job Editor.
c. Click the Transformations tab.
d. Expand the Data grouping of transformations.
e. Click and drag the Extract transformation to the Diagram tab of the Job Editor.
f. Click and drag the Sort transformation to the Diagram tab of the Job Editor.
g. Expand the Access grouping of transformations.
h. Click and drag the Table Loader transformation to the Diagram tab of the Job Editor.
20. Connect the source to the first transformation and investigate the propagation.
a. Connect the DIFT PRODUCTS (Copy) table object to the Extract transformation.
b. Click the Extract transformation on the Diagram tab.
c. Click the Mappings tab in the Details section.
All columns from the source are propagated and mapped to the target.
21. Turn off automatic propagation and mappings for the job.
a. Right-click in the background of the job and select Settings → Automatically Propagate
Columns.
b. Right-click in the background of the job and select Settings → Automatically Map Columns.
These selections can also be made from the Job Editor's tool set.
22. Break the connection between the DIFT PRODUCTS (Copy) table object and the Extract
transformation (click on the connection, right-click, and select Delete).
23. Remove the already propagated columns from the output table of the Extract transformation.
a. Right-click on the output table object associated with the Extract transformation and select
Properties.
e. Click to close the properties window for the Extract output table.
24. Reconnect the source to the Extract transformation and investigate the propagation.
a. Connect the DIFT PRODUCTS (Copy) table object to the Extract transformation.
b. Click the Extract transformation on the Diagram tab.
c. Click the Mappings tab in the Details section.
All columns from the source are NOT propagated and mapped to the target.
25. Manually propagate all columns.
b. Click on the Mappings tab tool set. All columns are removed from the target table side.
Chaining Jobs
Existing jobs can be added to the Diagram tab of the Job
Editor window. These jobs are added to the control flow in
the order that they are added to the job. This sequence is
useful for jobs that are closely related - however, the jobs
do not have to be related. You can always change the
order of execution for the added jobs in the Control Flow
tab of the Details pane.
Chaining Jobs
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
6. Return to the Folders tab, and expand the folders to Data Mart Development →
Orion Jobs.
7. Select the DIFT Populate Old and Recent Orders Tables job and drag it to the Diagram tab of the
Job Editor window.
8. Select the DIFT Populate Order Fact Table job and drag it to the Diagram tab of the Job Editor
window.
The first job connects automatically to the second job.
The OrderFact table, created in the DIFT Populate Order Fact Table job, is the source table for
the DIFT Populate Old and Recent Orders Tables job. Therefore, the DIFT Populate Order Fact
Table job should run first and then the DIFT Populate Old and Recent Orders Tables job.
12. Select View → Layout → Left to Right. The diagram updates with the correct ordering and in the
horizontal view.
15. Click the Status tab in the Details pane and verify that both jobs ran successfully.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
The Collect Table Statistics choice populates the Records field; otherwise, a zero is listed.
10. Scroll to the right in the table of statistics.
11. Click on the Statistics tab toolbar. The Save window is displayed.
14. Click .
15. Access a Windows Explorer window by selecting Start → All Programs → Accessories →
Windows Explorer.
16. Navigate to S:\Workshop\dift\reports.
17. Double-click DIFTTestJobStatsRun1.csv. Microsoft Excel opens and displays the saved statistics.
19. Click on the Statistics tab toolbar. The table view changes to a graphical view, which is a line
graph by default.
20. Click between Line Graph and Bar Chart. All of the reported statistics are selected by
default.
22. Click . The line graph updates to the following view of the requested statistics:
25. Click . The line graph updates to the following view of the requested statistics:
28. Click . The line graph updates to the following view of the requested statistics:
29. Click Bar Chart on the Statistics tab toolbar. The table view changes to a bar chart view.
The bar chart quickly tells us that the Table Loader transformation took almost three times as long as
the Rank transformation. The SQL Join transformation ran very quickly compared to the other
transformations.
30. Place your cursor over the bar for the SQL Join transformation. Tooltip text appears with summarized
information about the processing of the SQL Join transformation.
31. Place your cursor over the bar for the Table Loader transformation. Tooltip text appears with
summarized information about the processing of the Table Loader transformation.
The bar chart updates to a single bar for just the Table Loader transformation. The scaling for times is
easier to read for this selected transformation.
34. Click on the Statistics tab toolbar. The Print window opens as displayed.
The graph can be written to a file, and then printed from the file.
35. Click .
36. Click on the Statistics tab toolbar. The Save to File window opens as displayed.
39. Click .
About Reports
The reports featured in SAS Data Integration Studio
enable metadata for tables and jobs to be reviewed in a
convenient format.
Reports enable you to:
- find information about a table or job quickly
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
4. Click from the Reports window's toolbar. The Report Options window is displayed.
Verify that the default report format is set to HTML. A valid CSS file can be specified, as well as
additional ODS HTML statement options.
a. Click next to the Default Location field. The Select a directory window is
opened.
b. Navigate to S:\Workshop\dift\reports.
d. Type JobsReport as the name of the new folder and press ENTER.
12. When done viewing the report, select File → Close from the browser window.
13. To create a document object, click Job Documentation and then select .
16. Click .
f. Select File → Save to save diagram and job metadata to this point.
g. Add the SQL Join transformation to the diagram.
1) In the tree view, select the Transformations tab.
2) Expand the Data grouping.
3) Select the SQL Join transformation.
4) Drag the SQL Join transformation to the diagram.
5) Center the SQL Join so that it is in the middle of the DIFT ORDER_ITEM table object and
the DIFT ORDERS table object.
5) Click .
k. Select File → Save to save diagram and job metadata to this point.
l. Review the properties of the SQL Join transformation.
1) Right-click on the SQL Join transformation and select Open. The Designer window opens.
2) Select the Join item on the Diagram tab. Verify that the Join is an Inner Join from the
Properties pane.
3) Verify that the Inner join will be executed based on the values of Order_ID columns from
the sources being equal.
4) Select the Select keyword on the Navigate pane to surface the Select tab.
5) Verify that all target columns are mapped.
5) Click .
6) Right-click on the other temporary output table for the Splitter and select Replace.
7) Verify that the Folders tab is selected.
8) Expand the Data Mart Development → Orion Target Data folder.
9) Select DIFT Recent Orders.
10) Click .
11) If necessary, separate the two target table objects. The process flow diagram should
resemble the following:
l. Select File → Save to save diagram and job metadata to this point.
b) Select Row Selection Conditions as the value for the Row Selection Type field.
g) Click .
i) Type '01jan2005'd.
j) Click .
k) Click .
The Selection Conditions area on the Row Selection tab updates to the following:
4) Specify the subsetting criteria for the DIFT Old Orders table object.
b) Select Row Selection Conditions as the value for the Row Selection Type field.
g) Click .
i) Type '01jan2005'd.
j) Click .
k) Click .
The Selection Conditions area on the Row Selection tab updates to the following:
n. Select File → Save to save the diagram and job metadata to this point.
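The Splitter's generated code is roughly equivalent to a DATA step with two conditional OUTPUT statements. A minimal sketch, assuming an Order_Date column carries the order date (the column name is an assumption):
   data difttgt.recent_orders difttgt.old_orders;
      set difttgt.orderfact;
      if Order_Date >= '01jan2005'd then output difttgt.recent_orders;
      else output difttgt.old_orders;       /* orders placed before 2005 */
   run;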
6) View the data for the DIFT Recent Orders table object.
a) Right-click on the DIFT Recent Orders table object and select Open.
b) When finished viewing the data, select File → Close to close the View Data window.
7) View the data for the DIFT Old Orders table object.
a) Right-click on the DIFT Old Orders table object and select Open.
b) When finished viewing the data, select File → Close to close the View Data window.
p. Select File → Close to close the Job Editor. The new job object appears on the Checkouts tab.
Chapter 6 Orion Star Case Study
6.1 Exercises
[Figure: star schema with the Order Fact Table at the center, joined to the Organization, Customer, Product, and Time Dimension tables]
[Figure: the ORGANIZATION and STAFF source tables]
In this exercise set you will define and load the three remaining dimension tables. For each of these
target tables, you will need to define:
metadata for the source tables
metadata for the target table
metadata for the process flow to move the data from the source(s) to the target
In addition, metadata objects for data sources needed for showing features of the software will be defined.
Column Expression
Customer_Age Floor(Yrdif(customer.birth_date, today(), 'actual'))
The metadata for the columns can be imported from the DIFT Customer Dimension
metadata object and the text for the expressions can be found in HelperFile.txt.
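As a quick check of the Customer_Age expression: YRDIF with the 'ACTUAL' basis returns the exact number of years between two dates, and FLOOR truncates that to a completed age. The birth date below is made up:
   data _null_;
      birth_date   = '15mar1978'd;   /* hypothetical value */
      Customer_Age = floor(yrdif(birth_date, today(), 'actual'));
      put Customer_Age=;
   run;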
4. Check-In the Metadata Objects for the Customer Dimension
After verifying that the job ran successfully, check in all the objects from the project repository.
The calculations for the desired computed columns are shown below:
Name Expression
Group put(calculated _Group, org.)
The metadata for the columns can be imported from the DIFT Organization Dimension
metadata object and the text for the expressions can be found in HelperFile.txt.
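The CALCULATED keyword in these expressions refers to a column that is computed earlier in the same SELECT list. A minimal PROC SQL sketch; the input table and the stand-in _Group expression are placeholders, and ORG. is a course-supplied format:
   proc sql;
      create table work.orgdim_sketch as
      select Employee_ID,
             Employee_ID as _Group,                 /* stand-in for the real _Group expression */
             put(calculated _Group, org.) as Group  /* reuses the column computed above */
         from work.organization;                    /* placeholder input table */
   quit;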
Verify that all columns for the Table Loader's target table have a defined mapping.
The warning message regarding a compressed data set occurs because the DIFT Organization
table has a COMPRESS=YES property set. Edit the table properties to set this to NO.
For Method 1:
Use the New Table wizard to create a metadata table object named DIFT Time Dimension.
Store the metadata object in the /Data Mart Development/Orion Target Data folder.
The following columns must be entered manually:
NAME LENGTH TYPE FORMAT
Date_ID 4 Numeric Date9.
WeekDay_Num 8 Numeric
WeekDay_Name 9 Character
Month_Num 8 Numeric
Year_ID 4 Character
Month_Name 9 Character
Quarter 6 Character
Holiday_US 26 Character
Fiscal_Year 4 Character
Fiscal_Month_Num 8 Numeric
Fiscal_Quarter 6 Character
Name the physical table, a SAS table, TimeDim, and store it in the DIFT Orion Target Tables
Library.
For Method 2:
In SAS Data Integration Studio, select Tools → Code Editor.
In the Enhanced Editor window, include the TimeDim.sas program from the
S:\Workshop\dift\SASCode directory.
Submit the program and verify that no errors were generated in the Log window.
In SAS Data Integration Studio, invoke the Register Tables wizard.
Store the metadata object in the /Data Mart Development/Orion Target Data folder.
Select SAS as the source type, and select DIFT Orion Target Tables Library.
The TimeDim table should be available.
Set the name of the metadata table object to DIFT Time Dimension.
Verify (update if necessary) that the length of Date_ID is 4.
In the Code Editor window, uncomment the PROC DATASETS step and run just that step. Verify
that the TimeDim table is deleted (check the Log). You will re-create it via a SAS Data Integration
Studio job. (A sketch of this cleanup step appears after this list.)
Close the Code Editor window. (Select File → Close.) Do not save any changes.
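A minimal sketch of what the commented-out PROC DATASETS cleanup step might look like; the libref is assumed:
   proc datasets library=difttgt nolist;   /* libref assumed */
      delete TimeDim;
   quit;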
10. Loading the Time Dimension Target Table
Some specifics for creating the job to load the DIFT Time Dimension table are shown below:
Name the job DIFT Populate Time Dimension Table.
Store the metadata object in the /Data Mart Development/Orion Jobs folder.
Use the User Written Code transformation to specify the code to load this table (a sketch of
such a program follows this list).
Add the Table Loader transformation to the process flow for visual effect but specify to exclude
this transformation from running.
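The actual program is supplied in TimeDim.sas. Purely as an illustration of the kind of code a user-written load can contain, here is a sketch that generates one row per date; the date range is assumed, and the Holiday_US and fiscal columns are omitted:
   data difttgt.timedim;
      length WeekDay_Name $ 9 Month_Name $ 9 Quarter $ 6 Year_ID $ 4;
      format Date_ID date9.;
      do Date_ID = '01jan2003'd to '31dec2007'd;    /* assumed range */
         WeekDay_Num  = weekday(Date_ID);
         WeekDay_Name = strip(put(Date_ID, downame9.));
         Month_Num    = month(Date_ID);
         Month_Name   = strip(put(Date_ID, monname9.));
         Quarter      = cats(year(Date_ID), 'Q', qtr(Date_ID));
         Year_ID      = put(year(Date_ID), 4.);
         output;
      end;
   run;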
d. Select next to the SAS Library field and then click DIFT Orion Source Tables Library.
e. Click . The Define Tables and Select Folder Location window displays.
g. Verify that /Data Mart Development/Orion Source Data is the folder listed for the Location
field.
e. Verify that the location is set to /Data Mart Development/Orion Target Data.
f. Click .
h. Select DIFT Orion Target Tables Library as the value for the Library field.
j. Click .
k. Expand the Data Mart Development → Orion Source Data folder on the Folders tab.
l. From the Orion Source Data folder, expand the DIFT Customer Types table object.
m. Select the Customer_Type and Customer_Group columns from DIFT Customer Types and
click to move the columns to the Selected pane.
p. Select the following columns from DIFT CUSTOMER and click to move the columns to the
Selected pane:
Customer_ID
Country
Gender
Customer_Name
Customer_FirstName
Customer_LastName
Birth_Date
q. Click .
1) Click .
4) Select the Customer_ID column and move it to the Indexes panel by clicking .
5) Click .
u. Click .
v. Review the metadata listed in the finish window and then click . The new table object
appears on the Checkouts tab.
3. Loading the Customer Dimension Target Table
a. Select the Folders tab.
b. Expand Data Mart Development → Orion Jobs.
c. Verify that the Orion Jobs folder is selected.
d. Select File → New Job. The New Job window opens.
1) Type DIFT Populate Customer Dimension Table as the value for the Name
field.
2) Verify that the Location is set to /Data Mart Development/Orion Jobs.
4) Click .
k. Select File → Save to save diagram and job metadata to this point.
l. Review the properties of the SQL Join transformation.
1) Right-click on the SQL Join transformation and select Open. The Designer window opens.
2) Select the Join item on the Diagram tab.
3) In the Properties pane, set the Join to Left.
4) Establish the join criteria of the Customer_Type_ID columns from the sources being
equal.
a) Double-click the Left icon in the process flow diagram for SQL clauses.
b) Click .
c) Under the first Operand field, click and then Choose column(s).
d) Expand DIFT Customer Types and select Customer_Type_ID.
e) Click .
g) Under the second Operand field, click and then Choose column(s).
h) Expand DIFT Customer and select Customer_Type_ID.
i) Click .
a) Double-click the Where keyword on the Navigate pane to surface the Where tab.
b) Click .
c) Under the first Operand field, click and then Choose column(s).
d) Expand DIFT Customer Types and select Customer_Type_ID.
e) Click .
11) Manually map the Birth_Date source column to the Customer_Age column.
d. Select next to the SAS Library field and then click DIFT Orion Source Tables Library.
e. Click . The Define Tables and Select Folder Location window displays.
g. Verify that /Data Mart Development/Orion Source Data is the folder listed for the Location
field.
e. Verify that the location is set to /Data Mart Development/Orion Target Data.
f. Click .
h. Select DIFT Orion Target Tables Library as the value for the Library field.
j. Click .
k. Expand the Data Mart Development → Orion Source Data folder on the Folders tab.
l. From the Orion Source Data folder, expand the DIFT STAFF table object.
m. Select the following columns from DIFT STAFF and click to move the columns to the
Selected pane.
Job_Title
Salary
Gender
Birth_Date
Emp_Hire_Date
Emp_Term_Date
n. Select the Checkouts tab.
o. Expand DIFT ORGANIZATION table object.
p. Select the following columns from DIFT ORGANIZATION and click to move the columns
to the Selected pane.
Employee_ID
Org_Name
Country
q. Click .
1) Click .
4) Select the Employee_ID column and move it to the Indexes panel by clicking .
5) Click .
u. Click .
v. Review the metadata listed in the finish window and then click . The new table object
appears on the Checkouts tab.
3) Click .
4) To connect the DIFT ORGANIZATION table object to the SQL Join, place your cursor over
the connection selector until a pencil icon appears.
5) Click on this connection selector and drag to one of the input ports for the SQL Join
transformation.
h. Select File → Save to save diagram and job metadata to this point.
i. Add the Table Loader transformation to the diagram.
1) In the tree view, select the Transformations tab.
2) Expand the Access grouping.
3) Select the Table Loader transformation.
4) Drag the Table Loader transformation to the diagram.
5) Center the Table Loader so that it is to the right of the SQL Join transformation.
j. Connect the SQL Join transformation (click on the temporary table icon, , associated with the
SQL Join and drag) to the Table Loader transformation.
k. Select File → Save to save diagram and job metadata to this point.
l. Add the DIFT Organization Dimension table object to the process flow.
p. Select File → Save to save diagram and job metadata to this point.
q. Specify the properties of the SQL Join transformation.
1) Right-click on the SQL Join transformation and select Open. The Designer window opens.
2) Right-click the Join item on the Diagram tab and change the join to a Left join.
3) Double-click the Where keyword on the SQL Clauses pane.
4) Select the Where keyword on the Navigate pane to surface the Where tab.
a) Click .
b) Select Choose column(s) from the drop-down list under the first Operand column.
d) Click .
5) Select the Select keyword on the Navigate pane to surface the Select tab.
a) Remove all target columns by right-clicking over the target table side and choosing
Select All.
i) Map columns.
(1) Right-click in the panel between source and target columns, and select Map All.
Three of the target columns map.
(2) Map the Country column to Employee_Country by clicking on the Country
column and dragging to the Employee_Country.
(3) Map the Gender column to Employee_Gender by clicking on the Gender
column and dragging to the Employee_Gender.
(4) Map the Org_Name column to Employee_Name by clicking on the Org_Name
column and dragging to the Employee_Name.
(5) Map the Birth_Date column to Employee_Birth_Date by clicking on the
Birth_Date column and dragging to the Employee_Birth_Date.
(6) Map the Emp_Hire_Date column to Employee_Hire_Date by clicking on the
Emp_Hire_Date column and dragging to the Employee_Hire_Date.
j) Click to fold the target table over the source table. This provides more room to work on
the calculated expressions.
(1) Select Employee_Country (or the last column before the four that need expressions)
and then click on the toolbar.
(14) Locate the _Section column. In the Expression column, select Advanced
from the drop-down list. The Expression window opens as displayed.
(22) Locate the _Company column. In the Expression column, select Advanced
from the drop-down list. The Expression window opens as displayed:
(23) Copy the expression for _Company from HelperFile.txt.
Input(Put(calculated _department,orgdim.),12.)
(26) Locate the Group column. In the Expression column, select Advanced from
the drop-down list. The Expression window opens as displayed.
(27) Copy the expression for Group from HelperFile.txt.
Put(calculated _group,org.)
(30) Locate the Section column. In the Expression column, select Advanced
from the drop-down list. The Expression window opens as displayed:
(31) Copy the expression for Section from HelperFile.txt.
Put(calculated _section,org.)
(38) Locate the Company column. In the Expression column, select Advanced
from the drop-down list. The Expression window opens as displayed.
(39) Copy the expression for Company from HelperFile.txt.
Put(calculated _company,org.)
k) Click to fold the target table info back to the right side.
l) Map the Employee_ID column from the ORGANIZATION table to _Group.
s. Run the job by right-clicking in the background of the job and selecting Run. The job runs
without errors or warnings.
u. View the Log for the executed job by selecting the Log tab.
v. Scroll to view the note about the creation of the DIFTTGT.ORGDIM table:
e. Verify that the location is set to /Data Mart Development/Orion Target Data.
f. Click .
h. Select DIFT Orion Target Tables Library as the value for the Library field.
j. Click .
k. Click .
3) Select the Date_ID column and move it to the Indexes panel by clicking .
4) Click .
n. Click .
o. Click .
3) Click .
f. Select File → Save to save diagram and job metadata to this point.
g. Add the Table Loader transformation to the diagram.
1) In the tree view, select the Transformations tab.
2) Expand the Access grouping.
3) Select the Table Loader transformation.
4) Drag the Table Loader transformation to the diagram.
5) Center the Table Loader so that it is to the right of the User Written Code transformation.
h. Connect the User Written Code transformation to the Table Loader transformation.
i. Select File → Save to save diagram and job metadata to this point.
j. Add the target table to the diagram.
1) Select the Checkouts tab.
2) Select the DIFT Time Dimension table object.
3) Drag the DIFT Time Dimension table object to the diagram.
4) Connect the Table Loader transformation to the DIFT Time Dimension table object.
k. Select File → Save to save diagram and job metadata to this point.
l. Specify properties for the User Written Code transformation.
1) Right-click on the User Written Code transformation and select Properties.
2) Select the Code tab.
3) Select All user written as the value for the Code generation mode field.
5) Click .
6) Navigate to S:\Workshop\dift\SASCode.
7) Select TimeDim.sas.
8) Click . The path and filename of the code file are now listed in the Open
window.
3) Click (propagate from target to sources tool) from the tool set on the Mappings tab.
4) Verify that all columns are mapped. If not, right-click in the panel between the source
columns and the target columns and select Map All. The mappings will be updated.
5) Click the Code tab.
6) Click Exclude transformation from run.
n. Run the job by right-clicking in the background of the job and selecting Run. The job runs
without errors or warnings.
o. View the Log for the executed job by selecting the Log tab.
p. Select File → Save to save diagram and job metadata to this point.
q. Select File → Close to close the Job Editor.
11. Check-In the Metadata Objects for the Time Dimension
a. Select Check Outs → Check In All.
b. Type Adding target table & job for Time Dimension as the value for the
Title field.
Chapter 7 Working with Transformations
7.1 Introduction
Objectives
List transformations that are discussed in this chapter.
Transformations Tree
The Transformations tree organizes transformations into a
set of folders. You can drag a transformation from the
Transformations tree to the Job Editor, where you can
connect it to source and target tables and update its
default metadata. By updating a transformation with the
metadata for actual sources, targets,
and transformations, you can quickly
create process flow diagrams for
common scenarios.
Transformation Examples
This chapter has examples that use a number of
transformations available in SAS Data Integration Studio.
Here is a partial listing of those transformations:
Sort
Rank
Transpose
Data Validation
Extract
Append
Summary Statistics
One-Way Frequency
This demonstration creates a series of subfolders under the Orion Reports folder. The new folders will be
used to organize the various metadata objects created and used in the subsequent sections of this chapter.
1. If necessary, access SAS Data Integration Studio using Bruno's credentials.
a. Select Start ⇒ All Programs ⇒ SAS ⇒ SAS Data Integration Studio 4.2.
b. Verify that the connection profile is My Server.
c. Click to close the Connection Profile window and open the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the Password
field.
6. Press ENTER.
7. Right-click on the Orion Reports folder and select New Folder.
8. Type Loop Transforms as the name for the new folder.
9. Press ENTER.
10. Right-click on the Orion Reports folder and select New Folder.
11. Type Data Validation as the name for the new folder.
19. Right-click on the Orion Reports folder and select New Folder.
20. Type Status Handling as the name for the new folder.
7.2 Using Extract, Summary Statistics, and Loop Transformations
Objectives
Discuss and use the Extract and Summary Statistics
transformation.
Discuss and use the Loop transformations.
Extract Transformation
The Extract transformation is
typically used to create a subset
from a source. It can also be used to
create columns in a target that are
derived from columns in a source.
This demonstration creates a report on customer order information for customers from the United States
who placed orders in 2007. The customer dimension information first needs to be joined to the order fact
table and then subset, which is done in a separate job. A second job is created that will extract the desired
rows, and then a summary statistics report will be created from this extracted data.
1. If necessary, access SAS Data Integration Studio using Bruno's credentials.
a. Select Start ⇒ All Programs ⇒ SAS ⇒ SAS Data Integration Studio 4.2.
c. Click to close the Connection Profile window and open the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the Password
field.
e. Type DIFT Populate Customer Order Information Table as the value for the
Name field.
3. Add source table metadata to the diagram for the process flow.
a. Select the Data Mart Development ⇒ Orion Target Data folder.
b. Drag the DIFT Customer Dimension table object to the Diagram tab of the Job Editor.
c. Drag the DIFT Order Fact table object to the Diagram tab of the Job Editor.
d. Connect the DIFT Customer Dimension table object to one input port for the SQL Join
transformation.
e. Connect the DIFT Order Fact table object to the second input port for the SQL Join
transformation.
h. Click .
j. Click .
6. Select File ⇒ Save to save diagram and job metadata to this point.
1) Click .
2) Under the first Operand field, click and then Advanced. The Expression Builder
window opens.
3) On the Functions tab, click the Date and Time folder under the Categories list.
5) Click .
9) Click .
10) Click .
12) In the second Operand field, click and type 2007 and then press ENTER.
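The completed filter should resemble the following expression (a sketch; Order_Date is assumed to be the date column supplied by the order fact table):
   YEAR(Order_Date) = 2007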
g. Verify that all 22 target columns will be mapped one-to-one using a source column.
h. Select .
8. Select File ⇒ Save to save diagram and job metadata to this point.
9. Run the job.
a. Right-click in background of the job and select Run.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
d. View the Log for the executed Job. Scroll to view the note about the creation of the
DIFTTGT.CUSTOMERORDERINFO:
4) Click .
g. Right-click on the DIFT Customer Order Information table and select Open.
10. When you are finished viewing the DIFT Customer Order Information table, close the View Data window by selecting File ⇒ Close.
11. Save and close the Job Editor window.
b. Select File ⇒ Save to save diagram and job metadata to this point.
e. Type DIFT Create Report for US Customer Order Information as the value
for the Name field.
2. Add source table metadata to the diagram for the process flow.
b. If necessary, expand Data Mart Development ⇒ Orion Reports ⇒ Extract and Summary.
c. Drag the DIFT Customer Order Information table object to the Diagram tab of the Job Editor.
3. Add the Extract transformation to the process flow.
b. Expand the Data folder and locate the Extract transformation template.
c. Drag the Extract transformation to the Diagram tab of the Job Editor. Place the transformation
next to the table object.
d. Connect the DIFT Customer Order Information table object to the Extract transformation.
b. Expand the Analysis folder and locate the Summary Statistics transformation template.
c. Drag the Summary Statistics transformation to the Diagram tab of the Job Editor. Place the
transformation next to the table object.
5. Select File ⇒ Save to save diagram and job metadata to this point.
6. Specify properties for the Extract transformation.
c. In the bottom portion of the Where tab, click the Data Sources tab.
e. Select Customer_Country.
7. Select File ⇒ Save to save diagram and job metadata to this point.
8. Specify properties for the Summary Statistics transformation.
2) Select Total Retail Price, hold down the CTRL key and select Quantity, and then click .
3) Click to close the Select Data Source Items window. The Select analysis
columns area updates as displayed:
4) Click in the Select columns to subgroup data area to open the Select Data
Source Items window.
5) Select Customer Gender, hold down the CTRL key and select Customer Age Group, and
then click .
6) Click to close the Select Data Source Items window. The Select columns
to subgroup data area updates as displayed:
3) Navigate to S:\Workshop\dift\reports.
4) Type UnitedStatesCustomerInfo.html in the Name field.
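The code that the Summary Statistics transformation generates is roughly equivalent to a PROC MEANS step wrapped in ODS HTML statements. A minimal sketch, assuming the extracted rows land in a work table named EXTRACT_OUT and that the default statistics are requested:
   ods html path="S:\Workshop\dift\reports" (url=none)
            file="UnitedStatesCustomerInfo.html";
   proc means data=work.extract_out n mean min max;
      class Customer_Gender Customer_Age_Group;   /* subgroup columns */
      var Total_Retail_Price Quantity;            /* analysis columns */
   run;
   ods html close;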
9. Select File ⇒ Save to save diagram and job metadata to this point.
10. Run the job.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
c. Expand to S:\Workshop\dift\reports.
e. Click .
g. When done viewing the report, select File ⇒ Close to close Internet Explorer.
12. Select File ⇒ Save to save diagram and job metadata to this point.
13. Select File ⇒ Close to close the Job Editor window. The Extract and Summary folder displays the two jobs and a target table:
Control Table
The control table can be any table that contains rows of
data that can be fed into an iteration. The creation of this
table can be an independent job, or it can be part of the job flow containing the Loop transformations.
This demonstration uses the Loop transformations to iterate through the distinct customer country values and create a separate summary report for each of the countries. Three basic steps will be accomplished:
Step 1: Create the control table.
Step 2: Create the parameterized job.
Step 3: Create the iterative job.
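Conceptually, the Loop transformation turns the inner job into a macro and calls it once for each control table row, with that row's values supplied as the parameter values. The following is a simplified sketch of the pattern, not the code that the transformation actually generates (it also assumes the data values contain no commas):
   %macro country_report(CtryName, CCtryName, CtryValue);
      /* the parameterized job's logic runs here, with &CtryValue   */
      /* resolving inside the Extract transformation's WHERE clause */
      %put NOTE: Creating report for &CtryName (&CtryValue);
   %mend country_report;

   data _null_;
      set difttgt.distinctcountries;
      call execute(cats('%nrstr(%country_report)(',
                        CountryName, ',', CCountryName, ',',
                        CountryValue, ')'));
   run;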
g. Click .
i. Select DIFT Orion Target Tables Library as the value for the Library field.
k. Click .
1) Click .
6) Click .
11) Click .
13) Type 2-Character Country Value as the Description of the new column.
n. Click .
d. Connect the DIFT Customer Order Information table object to the SQL Join transformation.
By default, the SQL Join expects at least two input tables. However, for this instance, we need
just one input.
e. Click the status indicator on the SQL Join transformation to discover that a source table is missing.
g. Right-click on the SQL Join transformation and select Ports Delete Input Port. The status
indicator now shows no errors.
Again, the status indicator for the SQL Join shows that there is a problem.
e. Click the status indicator on the SQL Join transformation to discover that mappings are needed.
6. Select File Save to save diagram and job metadata to this point.
d. On the Select tab, specify the following Expression information for the three target columns.
CCountryName compress(put(customer_country,$country.))
CountryValue put(customer_country,$2.)
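The equivalent PROC SQL step resembles the following sketch. (Only two of the three expressions are listed above; the CountryName expression shown here is an assumption.)
   proc sql;
      create table difttgt.distinctcountries as
      select distinct
             put(customer_country, $country.)           as CountryName,
             compress(put(customer_country, $country.)) as CCountryName,
             put(customer_country, $2.)                 as CountryValue
        from difttgt.customerorderinfo;
   quit;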
f. Click to return to the Job Editor. Note that the status indicator associated with the SQL
Join transformation now shows no errors.
8. Select File ⇒ Save to save diagram and job metadata to this point.
9. Run the job to generate the control table.
a. Right-click in background of the job and select Run.
b. Verify that the job runs successfully.
c. Click the Log tab and verify that DIFTTGT.DISTINCTCOUNTRIES is created with 45 observations and 3 variables.
c. Right-click DIFT Create Report for US Customer Order Information and select Copy.
e. Right-click DIFT Create Report for US Customer Order Information (the copied job located
in the Loop Transforms folder) and select Properties.
f. Type DIFT Parameterized Job for Country Reports as the value for the Name
field.
2. Double-click the job DIFT Parameterized Job for Country Reports and it opens in the Job Editor
window.
3. Edit the Extract transformation.
c. Type &CtryValue in place of US (in the Expression Text area); be sure that double quotation marks are used.
Be sure to type the period that separates the parameter name from the rest of the text.
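After these edits, the relevant fragments should resemble the following sketch (the file name expression is an assumption based on the parameter names defined later):
   /* Extract WHERE clause: double quotation marks are required */
   /* so that the macro variable resolves.                      */
   where customer_country = "&CtryValue"

   /* Report file name: the trailing period marks the end of    */
   /* the macro variable name.                                  */
   &CCountryName.CustomerInfo.html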
5. Select File ⇒ Save to save diagram and job metadata to this point.
c. Click .
2) Type Country Name as the value for the Displayed text field.
d. Click .
2) Type Compressed Country Name as the value for the Displayed text field.
e. Click .
2) Type Country Value as the value for the Displayed text field.
h. Click to close the DIFT Parameterized Job for Country Reports Properties window.
7. Select File ⇒ Save to save diagram and job metadata to this point.
The icon for the job object in the Loop Transforms folder is now decorated with an ampersand to
denote that the job is parameterized.
Parameterized jobs can be tested only if all parameters are supplied with default values.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
e. Scroll toward the top of the Log and note that the parameters are all defined with default values.
f. Scroll toward the end of the Summary Statistics code area and verify that the correct HTML file name is being generated, as well as the correct title2 text.
c. Expand to S:\Workshop\dift\reports.
d. Verify that UnitedStatesCustomerInfo.html exists (you can also check the date-time stamp to
verify that this HTML file was created with this job).
2. Add control table metadata to the diagram for the process flow.
a. Click the Folders tab.
b. If necessary, expand Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms.
c. Drag the DIFT Control Table - Countries table object to the Diagram tab of the Job Editor.
3. Add the Loop transformation to the process flow.
a. Click the Transformations tab.
b. Expand the Control folder and locate the Loop transformation template.
c. Drag the Loop transformation to the Diagram tab of the Job Editor.
d. Connect the DIFT Control Table - Countries table object as input to the Loop transformation.
c. For the Country Name parameter, select CountryName as the value for the Mapped Source
Column.
d. For the Compressed Country Name parameter, select CCountryName as the value for the
Mapped Source Column.
e. For the Country Value parameter, select CountryValue as the value for the Mapped Source
Column.
7. Select File ⇒ Save to save diagram and job metadata to this point.
8. Run the job.
a. Right-click in background of the job and select Run.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
and so on.
For each of these parameter sets, the inner job is executed and this execution results in an
HTML file.
Exercises
The Marketing Department has been asked to examine buying habits of various age groups across the
genders. The same kind of marketing analysis will be applied to each distinct gender/age group
combination. To make this task easier, a request has been made to create a separate SAS table for each of
the distinct gender/age group combinations. You first use the Extract transformation to create one of the
needed tables. This job can then be parameterized and used with the Loop transformations to create the
series of desired tables.
1. Using the Extract Transformation to Create Table for Female Customers Aged 15-30 Years
Create a job that uses the Customer Dimension table to load a new table to contain just the female
customers aged 15-30 years.
Place the job in the Data Mart Development ⇒ Orion Reports ⇒ Extract and Summary folder.
Name the job DIFT Populate Female15To30Years Table.
Use the Customer Dimension table as the source table for the job (the metadata for this table can be found in Data Mart Development ⇒ Orion Target Data).
Add the Extract transformation to the job and build the following WHERE clause:
Customer_Gender = "F" and Customer_Age_Group = "15-30 years"
Register the output table from the Extract transformation with the following attributes:
Name: Female15To30Years
Run the job and verify that the new table has 12,465 observations and 11 variables.
The final job flow should resemble the following:
Use the SQL Join transformation to populate the control table. The control table needs to
contain the distinct combinations of all gender and age group values from the DIFT Customer
Dimension.
Use the following table to help define the calculations for the columns:
Column Expression
GenVal put(customer_gender,$1.)
Run the job; the control table should have 8 observations and 3 columns. Verify that the data looks appropriate.
Create a table template to be used in the parameterized job.
Place the table in the Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms folder.
Name the table object DIFT Table Template for Gender Age Group Table.
Name the physical table &GdrAgeGrp._Customers and have the table created in the DIFT
Orion Target Tables Library as a SAS table.
The table template needs to have the same column specifications as the DIFT Customer
Dimension table.
Table Name                     Number of Observations   Number of Columns
Female15To30Years_Customers    12465                    11
Female31To45Years_Customers    9263                     11
Female46To60Years_Customers    9295                     11
Female61To75Years_Customers    9266                     11
Male15To30Years_Customers      15261                    11
Male31To45Years_Customers      11434                    11
Male46To60Years_Customers      11502                    11
Male61To75Years_Customers      11468                    11
7.3 Establishing Status Handling
Objectives
Discuss return codes and how to capture a return
code in a SAS Data Integration Studio job.
Investigate where status handling is available.
Return Codes
When a job is executed in SAS Data Integration Studio,
a return code for each transformation in the job is
captured in a macro variable. The return code for the job
is set according to the least successful transformation in
the job.
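In generated job code this bookkeeping is done with macro logic. A hedged sketch of the idea follows (the macro variable name trans_rc matches what SAS Data Integration Studio commonly generates, but treat the details as an assumption):
   %macro check_step_rc;
      %if &trans_rc > 4 %then %do;
         %put ERROR: A step failed with return code &trans_rc..;
         %abort cancel;   /* one possible action: terminate the job */
      %end;
   %mend check_step_rc;
   %check_step_rc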
Example Actions
The return code can be associated with an action that
performs one or more of these tasks:
terminate the job or transformation
This demonstration illustrates establishing Status Handling for an SQL Join transformation and for a job.
Also illustrated is the use of the Return Code Check transformation.
1. If necessary, access SAS Data Integration Studio using Bruno's credentials.
a. Select Start ⇒ All Programs ⇒ SAS ⇒ SAS Data Integration Studio 4.2.
b. Verify that the connection profile is My Server.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
5. Establish two successful status handling conditions for the SQL Join transformation.
a. Click the Diagram tab.
b. Right-click on the SQL Join transformation and select Properties.
c. Click the Status Handling tab.
There are two other conditions that can be tested for: Warnings and Errors. The conditions of Warnings and Errors produce the same list of actions. The Errors condition has one additional option associated with it: Abort.
g. Specify S:\Workshop\dift\reports\SHforSQLJoin.txt for the File Name field.
h. Specify Successful running of SQL Join in Job &jobid for the Message
field.
o. Specify Successful run for SQL Join in Job &jobid as the Message field.
6. Select File ⇒ Save to save diagram and job metadata to this point.
7. Re-run the job.
a. Right-click in background of the job and select Run.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
The notes pertaining to the text file for a successful condition should resemble:
The notes pertaining to the data set for a successful condition should resemble the following:
c. Double-click SHforSQLJoin.txt.
The SAS data set received a new observation each time the job was run.
c. Select File ⇒ Exit to close SAS Enterprise Guide and do not save any changes.
11. Establish two successful status handling conditions for the job.
a. Right-click in the background of the job and select Properties.
b. On the General tab, change the name to DIFT Pop Cust Dim Table (SH).
e. Click in the Condition area, next to Successful, and select Send Job Status.
f. Select Send Job Status in the Action area. The Action Options window opens.
j. Click to close the DIFT Pop Cust Dim Table (SH) Properties window.
12. Select File ⇒ Save to save diagram and job metadata to this point.
13. Re-run the job.
a. Right-click in background of the job and select Run.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
d. View the Log for the executed Job. The new job status data set is created with one observation.
The SAS data set received a new observation each time the job was run.
The SAS data set can be used to gather some total time processing statistics for this job.
c. Select File ⇒ Exit to close SAS Enterprise Guide and do not save any changes.
17. Close the DIFT Pop Cust Dim Table (SH) job and save any changes.
This demonstration shows the use of the Return Code Check transformation for transformations such as
the Extract transformation that do not have a Status Handling tab.
1. Locate and open the Report on US Customer Order Information job.
a. Click the Folders tab.
b. Expand Data Mart Development ⇒ Orion Reports ⇒ Extract and Summary.
c. Right-click DIFT Create Report for US Customer Order Information and select Copy.
d. Expand Data Mart Development ⇒ Orion Reports ⇒ Status Handling.
e. Right-click the Status Handling folder and select Paste.
f. Right-click the pasted job, DIFT Create Report for US Customer Order Information, and select Properties.
g. On the General tab, change the name of the job to DIFT Create Report on US Cust
Info (SH).
i. Right-click DIFT Create Report on US Cust Info (SH) and select Open.
2. Right-click on the Extract transformation and select Properties. Note that the properties for this
transformation do not have a Status Handling tab.
The Return Code Check transformation can be used to take advantage of the status handling features
for those transformations that have no Status Handling tab. The Return Code Check transformation
captures the status of the previous transformation in the process flow, in this case, the Extract
transformation.
4. Add the Return Code Check transformation to the process flow diagram.
a. Click the Transformations tab.
b. Expand the Control group.
c. Locate the Return Code Check transformation.
d. Drag the Return Code Check transformation in to the process flow diagram.
h. Click to close the Action Options window. The Return Code Check Properties window
shows this one condition.
7.4 Using the Data Validation Transformation
Objectives
Discuss and use the Data Validation transformation.
a. Select Start ⇒ All Programs ⇒ SAS ⇒ SAS Data Integration Studio 4.2.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
e. Type DIFT Valid Products as the value for the Name field.
g. Click .
i. Select DIFT Orion Target Tables Library as the value for the Library field.
k. Click .
l. Expand the Data Mart Development ⇒ Orion Source Data folder on the Folders tab.
m. From the Orion Source Data folder, select DIFT NEWORDERTRANS table object.
o. Click .
e. Type DIFT Invalid Products as the value for the Name field.
g. Click .
i. Select DIFT Orion Target Tables Library as the value for the Library field.
k. Click .
l. Expand the Data Mart Development ⇒ Orion Source Data folder on the Folders tab.
m. From the Orion Source Data folder, select DIFT NEWORDERTRANS table object.
o. Click .
e. Type DIFT Populate Valid and Invalid Product Tables as the value for the
Name field.
5. Add source table metadata to the diagram for the process flow.
c. Drag the DIFT NEWORDERTRANS table to the Diagram tab of the Job Editor.
6. Add the Data Validation transformation to the process flow.
b. Expand the Data folder and locate the Data Validation transformation template.
c. Drag the Data Validation transformation to the Diagram tab of the Job Editor.
d. Connect the DIFT NEWORDERTRANS table object to the Data Validation transformation.
g. Click .
a. Right-click on the green temporary table object associated with the Data Validation
transformation and select Replace.
d. Click .
14) In the Action Options window, type difttgt as the value for the Libref field.
m. Keep the default Action if invalid value, Move row to error table.
n. Click to close the Invalid Values window. The Invalid Values tab shows the following:
r. Verify that Move row to error table is set as the value for Action if missing.
s. Click to close the Missing Values window. The Missing Values tab shows the
following:
9. Add a sticky note to the job. (A sticky note is a way to visually document within a job.)
b. Drag the sticky note and place it under the Data Validation transformation.
10. Select File ⇒ Save to save diagram and job metadata to this point.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
e. Scroll to view the note about the creation of the DIFTTGT.VALID_PRODUCTS table and the
creation of the DIFTTGT.INVALID_PRODUCTS table:
b. Right-click the DIFT Valid Products table object and select Open.
c. When you are finished viewing the DIFT Valid Products data set, close the View Data window by selecting File ⇒ Close.
13. View the DIFT Invalid Products table.
c. Right-click the DIFT Invalid Products table object and select Open.
d. When you are finished viewing the Invalid Products data set, close the View Data window by selecting File ⇒ Close.
c. When you are finished viewing the file, select File ⇒ Exit to close it.
15. View ValidInvalidProdEntryToFile.txt.
c. When you are finished viewing the file, select File ⇒ Exit to close it.
16. View the data set difttgt.proddataexcept.
a. Copy the LIBNAME statement for the DIFT Orion Target Tables Library.
1) Click the Folders tab.
2) Expand Data Mart Development ⇒ Orion Target Data.
3) Right-click on the DIFT Orion Target Tables Library object and select View Libname.
4) Right-click in the background of the Display Libname window and select Select All.
5) Right-click in the background of the Display Libname window and select Copy.
4) Click the Output tab to view the information in the data set.
Exercises
Create metadata for the target table named DIFT Valid Customers.
The target table should be physically stored in the DIFT Orion Target Tables Library with a
name of Valid_Customers.
The table object should contain the exact same columns as the DIFT Customer Dimension table object found in the Data Mart Development ⇒ Orion Target Data folder.
The target table object should end up in the Data Mart Development ⇒ Orion Reports ⇒ Data Validation folder.
Create metadata for the target table named DIFT Invalid Customers.
The target table should be physically stored in the DIFT Orion Target Tables Library with a
name of Invalid_Customers.
The table object should contain the exact same columns as the DIFT Customer Dimension table object found in the Data Mart Development ⇒ Orion Target Data folder.
The target table object should end up in the Data Mart Development ⇒ Orion Reports ⇒ Data Validation folder.
Create the job that will load Valid Customers from the DIFT Customer Dimension table using the
Data Validation transformation.
How many rows were moved to the error table because the value for Customer_Type was invalid?
Were there any duplicate values found for Customer_Name and Customer_Birth_Date?
7.5 Using Transpose, Sort, Append, and Rank Transformations
Objectives
Discuss and use the Rank, Transpose, Append, List,
and Sort transformations.
Transpose Transformation
The Transpose transformation
creates an output data set by
restructuring the values in a SAS
data set, and transposing selected
variables into observations.
The Transpose transformation is an
interface to the TRANSPOSE
procedure.
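A minimal PROC TRANSPOSE sketch follows; the work table names and the ColValue column are assumptions (only ColNames and Job_Title appear in the demonstration later in this section):
   proc transpose data=work.addlstaff_sorted out=work.addlstaff_wide;
      by Job_Title;     /* groups of records to transpose         */
      id ColNames;      /* values become the output column names  */
      var ColValue;     /* the column whose values are transposed */
   run;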
Sort Transformation
The Sort transformation provides an
interface for the SORT procedure.
The transformation can be used to
read data from a source, sort it, and
write the sorted data to a target in a
SAS Data Integration Studio job.
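A minimal PROC SORT sketch, with assumed data set names:
   proc sort data=work.addlstaff_raw out=work.addlstaff_sorted;
      by Job_Title;
   run;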
Append Transformation
The Append transformation can be
used to create a single target by
appending or concatenating two or
more sources.
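For example, concatenating two sources into a single target can be expressed as a DATA step with a SET statement (the physical table names here are assumptions):
   data difttgt.full_staff;
      set difttgt.staff_partial   /* first source               */
          work.addlstaff_wide;    /* second (transposed) source */
   run;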
Rank Transformation
The Rank transformation uses the
RANK procedure to rank one or
more numeric variables in the source
and store the ranks in the target.
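A minimal PROC RANK sketch that matches the salary ranking built later in this section (the data set names are assumptions):
   proc rank data=difttgt.full_staff out=work.full_staff_ranked;
      var Salary;          /* numeric column to rank         */
      ranks RankedSalary;  /* column that receives the ranks */
   run;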
a. Select Start ⇒ All Programs ⇒ SAS ⇒ SAS Data Integration Studio 4.2.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
2. Verify that the DIFT STAFF_PARTIAL metadata table object exists and has data loaded.
d. Select File ⇒ New ⇒ External File ⇒ Delimited. The New Delimited External File wizard opens:
1) Type DIFT Additional Staff Information as the value for the Name field.
1) Navigate to S:\Workshop\dift\data.
3) Select AddlStaff.csv.
4) Click .
Previewing the file shows that the first record contains column names and that the values are comma-delimited, not space-delimited.
1) Clear Blank.
2) Click Comma.
j. Click .
2) Type 2 (the number two) as the value for the Start record field.
3) Click to close the Auto Fill Columns window. The top portion of the Column
Definitions window populates with three columns, one of them numeric and two of them
character.
5) Select Get the column names from column headings in this file.
6) Verify that 1 is set as the value for The column headings are in the file
record field.
7) Click . The Name field populates with all the column names.
k. Change the length, informat, and format for the Job_Title column.
1) In the top portion of the Column Definitions window, locate the Job_Title column.
m. Click .
n. Click . The metadata object for the external file is found on the Checkouts tab.
4. Create a target table object that is a duplicate of metadata from DIFT STAFF_PARTIAL.
f. Click . The copied object is now in the Transpose and Rank folder.
a. Right-click the Copy of DIFT STAFF_PARTIAL table object and select Properties.
b. On the General tab, change the name of the metadata object to DIFT Full Staff.
d. Click next to the Library field. The Select a library window is displayed.
4) Click .
e. Type DIFT Populate Full Staff & Create Rank Report as the value for the
Name field.
7. Add source table metadata to the diagram for the process flow.
c. Drag the DIFT STAFF_PARTIAL table object to the Diagram tab of the Job Editor.
f. Drag the DIFT Additional Staff Information external file object to the Diagram tab of the Job
Editor.
b. Expand the Access folder and locate the File Reader transformation template.
c. Drag the File Reader transformation to the Diagram tab of the Job Editor.
9. Connect the DIFT Additional Staff Information external file object to the File Reader
transformation.
10. Add the Sort transformation to the process flow.
c. Expand the Data folder and locate the Sort transformation template.
d. Drag the Sort transformation to the Diagram tab of the Job Editor.
11. Connect the File Reader transformation to the Sort transformation.
The process flow diagram at this point should resemble the following:
b. Expand the Data folder and locate the Transpose transformation template.
c. Drag the Transpose transformation to the Diagram tab of the Job Editor.
13. Connect the Sort transformation to the Transpose transformation.
14. Add the Append transformation to the process flow.
b. Expand the Data folder and locate the Append transformation template.
c. Drag the Append transformation to the Diagram tab of the Job Editor.
15. Connect the Transpose transformation to one of the ports for the Append transformation.
16. Connect DIFT STAFF_PARTIAL table object to the other default port for the Append
transformation.
a. Right-click on the temporary output table for the Append and select Replace. The Table
Selector window is displayed.
d. Click .
18. Select File ⇒ Save to save diagram and job metadata to this point.
19. Add the Rank transformation to the process flow.
b. Expand the Data folder and locate the Rank transformation template.
c. Drag the Rank transformation to the Diagram tab of the Job Editor.
20. Connect the DIFT Full Staff table object to the Rank transformation.
21. Select File ⇒ Save to save diagram and job metadata to this point.
The process flow diagram should resemble the following:
d. Click .
23. Change the name of the work table that is output for the File Reader transformation.
a. Right-click on the green temporary table object associated with the File Reader transformation
and select Properties.
d. Click .
e. Click .
25. Change the name of the work table that is output for the first Sort transformation.
a. Right-click on the green temporary table object associated with the Sort transformation and select
Properties.
d. Click .
26. Change the name of the work table that is output for the Transpose transformation.
a. Right-click on the green temporary table object associated with the Transpose transformation and
select Properties.
e. Click .
a) Right-click in the background of the Target table area and select Select All.
d) Select to move all columns from the DIFT Full Staff table to the Selected area.
e) Click .
3) Map Job_Title from Source table area to Job_Title in the Target table
area.
a) Click in the Select columns to transpose area to open the Select Data
Source Items window.
c) Click to close the Select Data Source Items window. The Select
analysis columns area updates as displayed:
3) Establish ColNames for the Select a column for output column names area.
c) Click to close the Select a Data Source Item window. The Select a
column for output column names area updates as displayed:
4) Establish Job_Title for the Select columns whose values define groups
of records to transpose area.
c) Click to close the Select Data Source Items window. The Select
columns whose values define groups of records to transpose
area updates as displayed:
d. Click .
29. Change the name of the work table that is output for the Rank transformation.
a. Right-click on the green temporary table object associated with the Rank transformation and
select Properties.
d. Click .
4) Select the following columns (click on one, then hold down the CTRL key and click on each
subsequent column):
Start_Date
End_Date
Birth_Date
Emp_Hire_Date
Emp_Term_Date
Manager_ID
6) Re-order the columns so that Salary and RankedSalary are the last two columns (click
on the row number and drag a column to desired ordering).
7) Verify that all columns are mapped properly (the RankedSalary column will not have a one-
to-one column mapping).
8) Right-click on the RankedSalary column and select Propagate From Targets To End.
1) Select Salary in the Available source columns area, and then click to move
Salary to the Selected source columns area.
2) Select RankedSalary in the Available target columns area, and then click to move RankedSalary to the Selected target columns area.
31. Right-click in the background of the job and select Settings ⇒ Automatically Propagate Columns (this effectively disables automatic propagation for this job from this point on).
32. Add the Sort transformation to the process flow.
b. Expand the Data folder and locate the Sort transformation template.
c. Drag the Sort transformation to the Diagram tab of the Job Editor.
b. Expand the Output folder and locate the List Data transformation template.
c. Drag the List Data transformation to the Diagram tab of the Job Editor.
35. Connect the Sort transformation to the List Data transformation.
36. Select File Save to save diagram and job metadata to this point.
37. Change the name of the work table that is output for the second Sort transformation.
a. Right-click on the green temporary table object associated with the second Sort transformation
and select Properties.
d. Click .
f. Click .
2) Select Use column labels as column headings (LABEL) and then click .
5) Type NOOBS as the value for the Additional PROC PRINT options area.
6) Type format salary dollar12.; as the value for the Additional PROC PRINT
statements area.
a) Click in the Select other columns to print area to open the Select Data
Source Items window.
b) Select Job_Title, hold down the CTRL key and select Gender, hold down the CTRL key
and select Salary, and then click .
c) Click to close the Select Data Source Items window. The Select other
columns to print area updates as displayed:
3) Click Postcode.
4) Type options obs=MAX;.
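Taken together, these settings correspond roughly to the following PROC PRINT sketch (the input table name is an assumption):
   options obs=MAX;
   proc print data=work.full_staff_sorted label noobs;
      var Job_Title Gender Salary;
      format salary dollar12.;
   run;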
41. Select File ⇒ Save to save diagram and job metadata to this point.
42. Run the job.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
7.6 Basic Standardization with the Apply Lookup Standardization Transformation
Objectives
Discuss and use the Apply Lookup Standardization
transformation.
Discuss and use the One-Way Frequency
transformation.
A table of potential customers was defined in metadata (DIFT Contacts). This demonstration creates a job that initially reports on this data by creating one-way frequency reports for two of the columns.
The job is then updated by adding the Apply Lookup Standardization transformation to apply two
predefined standardization schemes. The final step is reporting on the newly transformed data. The final
process flow should resemble the following:
a. Select Start ⇒ All Programs ⇒ SAS ⇒ SAS Data Integration Studio 4.2.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
2. Verify that the DIFT Contacts metadata table object exists and has data available.
e. Type DIFT Standardize and Report on Contacts Table as the value for the
Name field.
4. Add source table metadata to the diagram for the process flow.
c. Drag the DIFT Contacts table object to the Diagram tab of the Job Editor.
5. Add the One-Way Frequency transformation to the process flow.
b. Expand the Analysis folder and locate the One-Way Frequency transformation template.
c. Drag the One-Way Frequency transformation to the Diagram tab of the Job Editor.
6. Connect the DIFT Contacts table object to the One-Way Frequency transformation.
7. Select File ⇒ Save to save diagram and job metadata to this point.
b) Select OS, hold down the CTRL key and select DATABASE, and then click .
c) Click to close the Select Data Source Items window. The Select
columns to perform a one-way frequency distribution on area
updates as displayed:
4) Type nocum nopercent as the value for Specify other options for TABLES
statement area.
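The One-Way Frequency transformation generates a PROC FREQ step; with these settings it resembles the following sketch (the physical table name is an assumption):
   proc freq data=difttgt.contacts;
      tables OS Database / nocum nopercent;
   run;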
9. Select File ⇒ Save to save diagram and job metadata to this point.
10. Run the job.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
A quick glance verifies the initial suspicion that the OS and Database columns have not had
any standards imposed for data values.
11. Select File ⇒ Close to close the Job Editor.
Two schemes have been pre-built for these types of column data. The next steps will
establish the necessary options to access these schemes
add the Apply Lookup Standardization transformation
re-run the One-Way Frequency task against the standardized table.
1. Select Tools ⇒ Options.
2. Select the Data Quality tab.
3. Verify that the following fields are set appropriately in the Data Quality area:
c. Double-click on the job DIFT Standardize and Report on Contacts Table. The job opens in the Job Editor window.
6. Break the connection between the DIFT Contacts table object and the One-Way Frequency
transformation.
a. Select the connection line between the table and the transformation.
b. Expand the Data Quality folder and locate the Apply Lookup Standardization transformation
template.
c. Drag the Apply Lookup Standardization transformation to the Diagram tab of the Job Editor.
8. Connect the DIFT Contacts table object to the Apply Lookup Standardization transformation.
9. Connect the Apply Lookup Standardization transformation to the One-Way Frequency
transformation.
c. On the Diagram tab, click to auto-arrange the elements in the process flow.
11. Select File ⇒ Save to save diagram and job metadata to this point.
c. For the DATABASE column, select DIFT Database Scheme.sch.qkb as the value for the
Scheme field.
d. For the OS column, select DIFT OS Scheme.sch.qkb as the value for the Scheme field.
e. For the DATABASE column, select Phrase as the value for the Apply Mode field.
f. For the OS column, select Phrase as the value for the Apply Mode field.
13. Select File ⇒ Save to save diagram and job metadata to this point.
b) Select OS, hold down the CTRL key and select DATABASE, and then click .
c) Click to close the Select Data Source Items window. The Select
columns to perform a one-way frequency distribution on area
updates as displayed:
15. Select File ⇒ Save to save diagram and job metadata to this point.
16. Run the job.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
Exercises
Partial Output:
b. Add source table metadata to the diagram for the process flow.
1) Select the Data Mart Development ⇒ Orion Target Data folder.
2) Drag the DIFT Customer Dimension table object to the Diagram tab of the Job Editor.
c. Add the Extract transformation to the process flow.
1) Click the Transformations tab.
2) Expand the Data folder and locate the Extract transformation template.
3) Drag the Extract transformation to the Diagram tab of the Job Editor. Place it next to the
DIFT Customer Dimension table object.
4) Connect the DIFT Customer Dimension table object to the Extract transformation.
d. Add the target table to the process flow.
1) Right-click on the green temporary table object associated with the Extract transformation
and select Register Table.
2) Type DIFT Customers - Females 15-30 Years as the value for the Name field.
e. Select File ⇒ Save to save diagram and job metadata to this point.
f. Specify properties for the Extract transformation.
1) Right-click on the Extract transformation and select Properties.
2) Click the Where tab.
3) Construct the following expression:
7) Click .
9) Select DIFT Orion Target Tables Library as the value for the Library field.
11) Click .
12) No column metadata will be selected from existing metadata objects. Click .
a) Click .
f) Click .
k) Click .
14) Click .
15) Review the metadata listed in the finish window and click .
c. Add source table metadata to the diagram for the process flow.
1) Expand Data Mart Development ⇒ Orion Target Data.
2) Drag the DIFT Customer Dimension table object to the Diagram tab of the Job Editor.
d. Add the SQL Join transformation to the process flow.
1) Click the Transformations tab.
2) Expand the Data folder and locate the SQL Join transformation template.
3) Drag the SQL Join transformation to the Diagram tab of the Job Editor. Place it next to the
DIFT Customer Dimension table object.
4) Connect the DIFT Customer Dimension table object to the SQL Join transformation.
5) Right-click on the SQL Join transformation and select Ports Delete Input Port. The status
indicator now shows no errors.
e. Add the target table to the process flow.
1) Right-click on the green icon (output table icon) for the SQL Join transformation and select
Replace.
2) Expand Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms.
3) Click the DIFT Control Table Gender Age Groups table object.
4) Click .
GdrAgeGrp compress(put(customer_gender,$gender.)||
tranwrd(customer_age_group,"-","To"))
7) Click to return to the Job Editor. Note that the status indicator associated with the
SQL Join transformation now shows no errors.
g. Select File ⇒ Save to save diagram and job metadata to this point.
h. Run the job to generate the control table.
1) Right-click in background of the job and select Run.
2) Verify that the job runs successfully.
3) Click the Log tab and verify that the control table is created with 8 observations and 3 variables.
i. Create a table object template that will be used to generate the individual gender-age group tables.
1) Click the Folders tab.
2) Expand Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms.
3) Verify that the Loop Transforms folder is selected.
4) Select File ⇒ New ⇒ Table.
5) Type DIFT Table Template for Gender Age Group Table as the value for the
Name field.
7) Click .
9) Select DIFT Orion Target Tables Library as the value for the Library field.
11) Click .
15) Click .
16) Click .
17) Review the metadata listed in the finish window and click .
j. Define the parameterized job metadata object to load the holding table.
1) Click the Folders tab.
2) Expand Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms.
3) Verify that the Loop Transforms folder is selected.
4) Select File ⇒ New ⇒ Job. The New Job window opens.
5) Type DIFT Parameterized Job for Gender Age Group Tables as the value for the Name field.
8) Add source table metadata to the diagram for the process flow.
a) Expand Data Mart Development ⇒ Orion Target Data.
b) Drag the DIFT Customer Dimension table object to the Diagram tab of the Job Editor.
9) Add the Extract transformation to the process flow.
a) Click the Transformations tab.
b) Expand the Data folder and locate the Extract transformation template.
c) Drag the Extract transformation to the Diagram tab of the Job Editor. Place it next to the
DIFT Customer Dimension table object.
d) Connect the DIFT Customer Dimension table object to the Extract transformation.
10) Add the target table to the process flow.
a) Expand Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms.
b) Drag DIFT Table Template for Gender Age Group Tables table object to the Diagram
tab of the Job Editor.
c) Right-click on the output table object (green icon) for the Extract transformation and
select Delete.
d) Connect the Extract transformation to the DIFT Table Template for Gender Age Group
Tables table object.
11) Select File ⇒ Save to save diagram and job metadata to this point.
12) Specify properties for the Extract transformation.
a) Right-click on the Extract transformation and select Properties.
b) Select Where tab.
c) In the bottom portion of the Where tab, click the Data Sources tab.
d) Expand CustDim table.
e) Select Customer_Gender.
f) Click .
g) In Expression Text area, type ="&genval" AND (that is, an equals sign, the text
&genval, a space, the text AND, and another space).
h) On the Data Sources tab, double-click Customer_Age_Group to add this to the
Expression Text area.
13) Select File ⇒ Save to save diagram and job metadata to this point.
14) Define job parameters.
a) Right-click in the background of the job and select Properties.
b) Click the Parameters tab.
c) Click .
e) Type Gender Value as the value for the Displayed text field.
i) Click .
k) Type Age Group Value as the value for the Displayed text field.
o) Click .
q) Type Gender Age Group Value as the value for the Displayed text field.
15) Select File ⇒ Save to save diagram and job metadata to this point.
16) Run the job.
a) Right-click in background of the job and select Run.
b) Click the Status tab in the Details area. Note that all processes completed successfully.
d) View the Log for the executed Job. Specifically, locate the note about the
FEMALE15TO30YEARS_CUSTOMERS table.
l. Add control table metadata to the diagram for the process flow.
1) Click the Folders tab.
2) Expand Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms.
3) Drag the DIFT Control Table Gender Age Groups table object to the Diagram tab of the
Job Editor.
m. Add the Loop transformation to the process flow.
1) Click the Transformations tab.
2) Expand the Control folder and locate the Loop transformation template.
3) Drag the Loop transformation to the Diagram tab of the Job Editor.
4) Connect the DIFT Control Table - Gender-Age Groups table object as input to the Loop
transformation.
n. Add the parameterized job to the process flow.
1) Click the Folders tab.
2) Expand Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms.
3) Drag the DIFT Parameterized Job for Gender Age Group Tables job to the Diagram tab of
the Job Editor.
3) For the Gender Value parameter, select GenVal as the value for the Mapped Source
Column.
4) For the Age Group Value parameter, select AgeGroup as the value for the Mapped
Source Column.
5) For the Gender Age Group Value parameter, select GdrAgeGrp as the value for the
Mapped Source Column.
q. Select File ⇒ Save to save diagram and job metadata to this point.
r. Run the job.
1) Right-click in background of the job and select Run.
2) Click the Status tab in the Details area. Note that all processes completed successfully.
7) Click .
9) Select DIFT Orion Target Tables Library as the value for the Library field.
11) Click .
12) Expand the Data Mart Development ⇒ Orion Source Data folder on the Folders tab.
13) From the Orion Source Data folder, select DIFT Customer Dimension table object.
15) Click .
17) Review the metadata listed in the finish window and click .
7) Click .
9) Select DIFT Orion Target Tables Library as the value for the Library field.
11) Click .
12) Expand the Data Mart Development ⇒ Orion Source Data folder on the Folders tab.
13) From the Orion Source Data folder, select DIFT Customer Dimension table object.
15) Click .
17) Review the metadata listed in the finish window and click .
d. Add source table metadata to the diagram for the process flow.
1) Click the Data Mart Development ⇒ Orion Target Data folder on the Folders tab.
2) Drag the DIFT Customer Dimension table to the Diagram tab of the Job Editor.
e. Add the Data Validation transformation to the process flow.
1) Click the Transformations tab.
2) Expand the Data folder and locate the Data Validation transformation template.
3) Drag the Data Validation transformation to the Diagram tab of the Job Editor.
4) Connect the DIFT Customer Dimension table object to the Data Validation transformation.
4) Click .
d) Expand the DIFT Customer Types table in the Data Mart Development
Data folder and select Customer_Type.
f) Select Change value to as the value for the Action if invalid field.
g) Type Unknown Customer Type as the value for the New Value field.
c) Verify that Abort job is set as the value for Action if missing.
h. Add a sticky note to the job. (A sticky note is a way to visually document within a job.)
2) Drag the sticky note and place it under the Data Validation transformation.
3) Double-click the sticky note to expand to add some text.
4) Type The Invalid_Customers table is populated through the
execution of the Data Validation transformation. as the text for the
sticky note.
2) Click the Status tab in the Details area. Note that all processes completed successfully.
3) When you are finished viewing the file, select File ⇒ Exit to close it.
b. Add source table metadata to the diagram for the process flow.
1) Click the Folders tab.
2) Navigate to the Data Mart Development Orion Source Data folder.
3) Drag the DIFT Catalog_Orders table object to the Diagram tab of the Job Editor.
c. Add the One-Way Frequency transformation to the process flow.
1) Click the Transformations tab.
2) Expand the Analysis folder and locate the One-Way Frequency transformation template.
3) Drag the One-Way Frequency transformation to the Diagram tab of the Job Editor.
d. Connect the DIFT Catalog_Orders table object to the One-Way Frequency transformation.
e. Select File ⇒ Save to save diagram and job metadata to this point.
f. Specify properties for the One-Way Frequency transformation.
1) Right-click on the One-Way Frequency transformation and select Properties.
2) Click the Options tab.
3) Verify that Assign columns is selected in the selection pane.
4) Establish CATALOG in the Select columns to perform a one-way frequency
distribution on area.
g. Select File ⇒ Save to save diagram and job metadata to this point.
h. Run the job.
1) Right-click in background of the job and select Run.
2) Click the Status tab in the Details area. Note that all processes completed successfully.
n. Select File ⇒ Save to save diagram and job metadata to this point.
o. Specify properties for the Apply Lookup Standardization transformation.
1) Right-click on the Apply Lookup Standardization transformation and select Properties.
2) Click the Standardizations tab.
3) For the CATALOG column, select DIFT Catalog Orders.sch.qkb as the value for the
Scheme field.
4) For the CATALOG column, select Phrase as the value for the Apply Mode field.
5) For the OS column, select Phrase as the value for the Apply Mode field.
p. Select File ⇒ Save to save diagram and job metadata to this point.
q. Specify properties for the One-Way Frequency transformation.
1) Right-click on the One-Way Frequency transformation and select Properties.
2) Click the Options tab.
3) Verify that Assign columns is selected in the selection pane.
r. Select File ⇒ Save to save diagram and job metadata to this point.
s. Run the job.
1) Right-click in background of the job and select Run.
2) Click the Status tab in the Details area. Note that all processes completed successfully.
Chapter 8 Working with Tables and the Table Loader Transformation
8.1 Basics of the Table Loader Transformation
Objectives
Discuss available table loader techniques.
Discuss reasons to use the Table Loader
transformation.
Loader Transformations
SAS Data Integration Studio provides
three specific transformations to load data.
These Loader transformations are
designed to output to permanent,
registered tables (that is, tables
that are available in the Folder or
Inventory Tree).
Loaders can do the following:
create and replace tables
maintain indexes
Table Loader
No Table Loader?
SAS Data Integration Studio data transformations can perform a simple load of the transformation's output table. The transformation will drop and then replace the table.
8.2 Load Styles of the Table Loader Transformation
Objectives
Discuss various load styles provided by the Table
Loader transformation.
Important Step
An important step in an ETL process usually involves
loading data into a permanent physical table that is
structured to match your data model. The designer or
builder of an ETL process flow must identify the type of
load that the process requires in order to:
append all source data to any previously loaded data
replace all previously loaded data with the source data
use the source data to update and add to the previously loaded data based on specific key column(s)
Load Style
In SAS Data Integration Studio, the Table Loader
transformation can be used to perform any of the three
load types (the Load style field on the Load Technique
tab).
The APPEND procedure with the FORCE option is the default. If the source is a large table and the target
is in a database that supports bulk load, PROC APPEND can take advantage of the bulk-load feature.
Consider bulk loading the data into database tables by using the optimized SAS/ACCESS engine bulk
loaders. It is recommended that you use native SAS/ACCESS engine libraries instead of ODBC libraries
or OLE DB libraries for relational database data. SAS/ACCESS engines have native access to the
databases and have superior bulk-loading capabilities.
PROC SQL with the INSERT statement performs well when the source table is small (because the overhead needed to set up bulk loading is not incurred). PROC SQL with the INSERT statement adds one row at a time to the database.
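Minimal sketches of the two techniques, with assumed table names:
   /* Default: append with the FORCE option */
   proc append base=difttgt.target_table data=work.source_table force;
   run;

   /* Alternative for small source tables: row-at-a-time insert */
   proc sql;
      insert into difttgt.target_table
      select * from work.source_table;
   quit;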
Replace                    Description
All rows using delete      Uses PROC SQL with DELETE * to remove all rows.
Entire table               Replaces the entire table using PROC DATASETS.
Simulating truncate        Uses a DATA step with SET and STOP statements to remove all rows (available only for SAS tables).
All rows using truncate    Uses PROC SQL with TRUNCATE to remove all rows (only available for some databases).
17
When Entire table is selected, the table is removed and disk space is freed. Then the table is re-created
with 0 rows. Consider using this option unless your security requirements restrict table deletion
permissions (a restriction that is commonly imposed by a database administrator on database tables).
Also, avoid this method if the table has any indexes or constraints that SAS Data Integration Studio
cannot re-create from metadata (for example, check constraints).
If available, consider using All rows using truncate. Both All rows using selections enable you to
keep all indexes and constraints intact during the load. By design, using TRUNCATE is the quickest way
to remove all rows. The DELETE * syntax also removes all rows; however, based on the database and
table settings, this choice can incur overhead that will degrade performance. The database administrator
or database documentation should be consulted for a comparison of the two techniques.
Caution: When using All rows using delete repeatedly to clear a SAS table, the size of that table should be monitored over time. All rows using delete performs only logical deletes for SAS tables; therefore, a table's physical size will grow, and the increased size can negatively affect performance.
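As a sketch of what each replace technique amounts to in code (library and table names are illustrative):

/* All rows using delete: remove every row but keep the table, */
/* its indexes, and its constraints intact.                    */
proc sql;
   delete from orion.sales_history;
quit;

/* Entire table: delete and re-create the table, which frees   */
/* the disk space.                                             */
proc datasets library=orion nolist;
   delete sales_history;
quit;

/* Simulating truncate (SAS tables only): SET copies the       */
/* column definitions, and STOP ends the step before any row   */
/* is output, leaving a zero-row table.                        */
data orion.sales_history;
   set orion.sales_history;
   stop;
run;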
When Modify by Column(s) is selected, the Match by Column(s) group box, which enables you to select columns, is enabled.
When Modify Using Index is selected, the Modify Using Index group box, which enables you to select an index, is enabled. The Modify Using Index group box also enables a check box, Return to the top of the index for duplicate values coming from the input data.
The options Modify by Column(s) and Modify Using Index have the added benefit of being able to take unmatched records and add them to the target table during the same single pass through the source table.
Of these three choices, the DATA step MODIFY with KEY= method often outperforms the other update methods in tests conducted on loading SAS tables. The DATA step MODIFY with KEY= method can also perform adequately for database tables when indexes are used.
When the SQL procedure with the WHERE or SET statements is used, performance varies. Neither of
these statements in PROC SQL requires data to be indexed or sorted, but indexing on the key column(s)
can greatly improve performance. Both of these statements use WHERE processing to match each row of
the source table with a row in the target table.
The update technique chosen should depend on the percentage of rows being updated. If the majority of
target records are being updated, the DATA step with MERGE (or UPDATE) might perform better than
the DATA step with MODIFY BY or MODIFY KEY= or PROC SQL because MERGE makes full use of
record buffers.
Performance results can be hardware and operating environment dependent, so you should consider
testing more than one technique.
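A minimal sketch of the MODIFY with KEY= technique; the master table, transaction table, and index name are illustrative, and an index named Customer_ID is assumed to exist on the master table:

data orion.customer_dim;
   set work.customer_updates;                 /* transaction rows  */
   modify orion.customer_dim key=Customer_ID; /* indexed lookup    */
   if _iorc_ = 0 then
      replace;         /* key found: update the master row         */
   else do;
      _error_ = 0;     /* clear the key-not-found condition        */
      output;          /* key not found: append the row in the     */
                       /* same pass through the source             */
   end;
run;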
Objectives
Discuss various types of keys and how to define them in SAS Data Integration Studio.
Discuss indexes and how to define them in SAS Data Integration Studio.
Discuss Table Loader options for keys and indexes.
Keys
Several transformations available in SAS Data Integration Studio, including the Table Loader transformation, can take advantage of different types of keys that can be defined for tables:
Foreign Keys
Unique Keys
Surrogate Keys
Integrity constraints preserve the consistency and correctness of stored data. They restrict the data values
that can be updated or inserted into a table. Integrity constraints can be specified at table creation time or
after data already exists in the table. In the latter situation, all data are checked to verify that they satisfy
the candidate constraint before the constraint is added to the table. Integrity constraints are enforced
automatically by the SAS System for each add, update, and delete of data to the table containing the
constraint(s). Specifying constraints is the user's responsibility.
There are five basic types of integrity constraints:
Not Null (Required Data)
Check (Validity Checking)
Unique (Uniqueness)
Primary Key (Unique and Not Null)
Foreign Key (Referential)
The first four types of constraints are referred to as "general constraints" in this document. Foreign keys
and primary keys that are referenced by one or more foreign keys are referred to as "referential
constraints". Note that a primary key alone is insufficient for referential integrity. Referential integrity
requires a primary key and a foreign key.
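A minimal sketch of adding some of these constraints to an existing SAS table with PROC DATASETS; the library, table, and column names are illustrative:

proc datasets library=orion nolist;
   modify customer_dim;
   /* Primary Key: unique and not null */
   ic create pk_cust = primary key (Customer_ID);
   /* Not Null: required data */
   ic create nn_name = not null (Customer_Name);
   /* Check: validity checking */
   ic create ck_gender =
      check (where=(Customer_Gender in ('F','M')));
quit;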
Indexes
An index is an optional file that you can create for a SAS data file that does the following:
points to observations based on the values of one or more key variables
provides direct access to specific observations
Business Scenario
The SAS data set orion.sales_history is often queried with a WHERE statement.
Partial listing of orion.sales_history (columns include Customer_ID, Order_ID, Order_Type, Product_ID, and Product_Group).
Business Scenario
You need to create three indexes on the most frequently used subsetting columns.
Index Name      Index Variables
Customer_ID     Customer_ID
Product_Group   Product_Group
SaleID          Order_ID, Product_ID
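A sketch of how these three indexes could be defined with PROC DATASETS (in the demonstrations, indexes are defined through table metadata; this code is illustrative):

proc datasets library=orion nolist;
   modify sales_history;
   index create Customer_ID;                    /* simple index    */
   index create Product_Group;                  /* simple index    */
   index create SaleID = (Order_ID Product_ID); /* composite index */
quit;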
data customer14958;
set orion.sales_history;
where Customer_ID=14958;
run;
Diagram: Without an index, the WHERE statement selects observations by reading the data sequentially; all data pages are loaded into the input SAS buffers before each observation passes through the PDV to the output buffers.
data customer14958;
set orion.sales_history;
where Customer_ID=14958;
run;
Diagram: With an index, the WHERE statement selects observations by using direct access; only the necessary data pages are loaded into the input SAS buffers.
Condition Settings
The Constraint Condition and Index Condition options that are available will depend on the load technique specified (for example, Take off and Leave as is).
General Rule
Consider removing and re-creating indexes if more than 10% of the data in the table will be reloaded.
Chapter 9 Working with Slowly Changing Dimensions
9.2 Using the SCD Type 2 Loader and Lookup Transformations ................................... 9-15
Demonstration: Populate Star Schema Tables Using the SCD Type 2 Loader with the
Surrogate Key Method................................................................................ 9-29
Objectives
Explain slowly changing dimensions.
Define keys.
List benefits of slowly changing dimensions.
Define SCD types 1, 2, and 3.
Referential integrity ensures the following:
foreign keys (in the fact table) can only have values that exist in the primary key
a primary key value cannot be deleted if it exists in a foreign key
Business Keys
Often the business key in a dimension table can function as a primary key in that table:
Customer_ID in the Customer Dimension
Product_ID in the Product Dimension
and so on
Objectives
List the functions of the SCD Type 2 transformation.
Define business keys.
Define surrogate and retained keys.
Detect and track changes.
List the functions of the Lookup transformation.
The SCD Type 2 Loader transformation tracks changes.
Business Key
The business key consists of one or more columns that identify a business entity, like a customer, a product, or an employee.
The Business Key tab is used to specify one or more columns in a target dimension table that represent the business key.
Change Detection
The business key is used as the basis for change detection. The business keys in source rows are compared to the business keys in the target.
The Detect Changes tab is used to specify one or more columns in a dimension table that are monitored for changes.
Change Detection
By default, all columns are included in change detection, except the columns that are specified on the Change Tracking, Business Key, and Generated Key tabs.
Change Tracking
The SCD Type 2 Loader provides three methods for tracking historical records:
Beginning and End Date (or datetime) values
Version number
Current record indicator
Change Tracking
The Change Tracking tab is used to specify one or more methods and associated columns in a target dimension table to be used for tracking historical records. Multiple methods can be selected.
Generated Key
The SCD Type 2 Loader generates values for a Generated Key column in the target dimension table.
The Generated Key tab is used to specify a column for the generated key values as well as a method for generating the key values.
Generated Key
The generated key:
eliminates dependencies on the source data, as the business key may be subject to redefinition, reuse, or recycling
can be used as a primary key or as part of a composite primary key
is generated at run time for each new row that is added to the target
Generated Key
The generated key can be specified as a surrogate key or a retained key.
Cust Id is the business key in this example. The surrogate key method generates a new key value for
each added row.
The retained key method with new column generates a new retained key value if the added row
represents a new business key.
If the added row represents an existing business key:
the same retained key is assigned
the old row is closed out (end date assigned or current record turned off)
the new row is opened (begin date assigned or current record turned on)
The retained key method with existing column does not generate key values.
If the added row represents an existing business key:
the old row is closed out (end date assigned or current record turned off)
the new row is opened (begin date assigned or current record turned on)
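The two methods can be contrasted with a small hypothetical entry (the key and date values here are invented for illustration). Cust Id 101 changes once; Cust Id 102 never changes:

Cust Id   Surrogate Key   Retained Key   Begin Date   End Date
101       1               1              01JAN2007    14MAY2008
101       2               1              15MAY2008    31DEC5999
102       3               2              01JAN2007    31DEC5999

The surrogate key method assigns a new value to every added row; the retained key method assigns one value per business key, so the change-tracking columns distinguish the current row from its history.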
An entry consists of all the rows (the current row and the historical rows) for a business entity represented
by a business key value.
If a source row has a business key that does not exist in the target, then that row represents a new entry.
The new row is added to the target with appropriate change-tracking values.
If a source row has the same business key as a current row in the target, and a value in a column identified
as a Change Detection column differs, then that row represents an update to an existing entry. The source
row is added to the target. The source row becomes the new current row for that entry; it receives
appropriate change-tracking values. The superseded target row can also receive new change-tracking
values (closed out).
If a source row has the same business key and content as a current row in the target, it might indicate that
the entry is being closed out. The entry is closed out if change tracking is implemented with begin and
end datetime values, and if the end datetime value in the source is older than the same value in the target.
When this is the case, the new end date is written into the target to close out the entry.
If a source row has the same business key and the same content as a current row in the target, then that
source row is ignored.
The digest column contains a concatenated encryption of values from selected columns other than the key columns. It is a character column with a length of 32, named DIGEST_VALUE. The encrypted concatenation uses the MD5 algorithm.
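A minimal sketch of how such a digest can be computed in a DATA step; the table and the monitored columns are illustrative, and the SCD Type 2 Loader generates its own version of this logic:

/* Concatenate the monitored columns with a delimiter and     */
/* store the MD5 hash as a 32-character hexadecimal string.   */
data work.cust_digest;
   set orion.customer_dim;
   length digest_value $32;
   digest_value = put(md5(catx('|', Customer_Name,
                               Customer_Country, Customer_Type)),
                      $hex32.);
run;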
Lookup Transformation
The Lookup transformation can be used to load a target table with columns taken from a source and from a number of lookup tables.
When a job containing a Lookup transformation is run, each source row is processed as follows:
The key columns in the source row are compared to the key columns in the specified lookup tables.
If matches are found, specified lookup values and source values are added to the target row in the transformation's temporary output table.
The temporary output table rows are then loaded into the target.
Lookups Tab
The Lookups tab in the Lookup transformation is used to specify lookup properties:
Source to Lookup columns
Lookup to Target columns
Where expression
Exceptions
Error Tables
The Errors tab in the properties of the Lookup transformation is used to specify the following:
Errors Table: includes a generated column (source row number) and any column from the source data.
Exception Table: includes four generated columns (exception information) and any column from the source data.
a. Select Start All Programs SAS SAS Data Integration Studio 4.2.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
1. Register the Source Data Template data set. The Source Data Template data set is a SAS data set that
was prepared with 29 column definitions for use in this workshop. The data set has no rows, so it
stores no data. It serves only as a repository for column definitions.
a. Select the Folders tab.
b. Expand Data Mart Development Orion SCD.
c. Verify that the Orion SCD folder is selected.
d. Select File Register Tables. The Register Tables wizard starts.
e. Select SAS as the type of table and click . The Select a SAS Library window is
displayed.
f. Select the DIFT SAS Library from the SAS Library drop-down list and click .
j. Type DIFT SCD Source Data Template in the Name field and click .
e. Type DIFT SCD Source Data as the value for the Name field.
2) Navigate to S:\Workshop\dift\data.
4) Select OrionSourceDataM01.csv.
The file has 7 records. Note that the first record has column names and that the data fields are
comma-delimited.
l. Click . The Column Definitions window is displayed. Increase the size of the window
by dragging the corners.
n. A dialog window indicates that there are only 7 records. Click to close the Warning
dialog window.
o. Import column definitions from a template data set.
2) Select Get the column definitions from other existing tables or external files.
6) Select the DIFT SCD Source Data Template table and click .
7) 29 columns from the DIFT SCD Source Data Template table are selected.
p. Definitions for 29 columns are imported. These column definitions will be forward propagated
through the process flow to the target tables.
r. Click .
e. Type DIFT SCD Customer Dimension as the value for the Name field.
f. Verify that the location is set to /Data Mart Development/ Orion SCD.
g. Click .
1) Click next to the Library field. The New Library Wizard opens.
2) Type DIFT SCD Target Tables Library as the value for the Name field.
3) Verify that the location is set to /Data Mart Development/ Orion SCD.
4) Click .
7) Click .
10) Click .
j. Change the default name of the new table. Type SCDCustDim in the Name field.
l. Do not select columns at this time. You propagate columns from sources to the targets in a later
step. Click .
n. Click .
p. Click .
f. Verify that the location is set to /Data Mart Development/ Orion SCD.
g. Click .
i. Select DIFT SCD Target Tables Library as the value for the Library field.
l. Do not select columns at this time. You propagate columns from the source to the targets in a later
step.
o. Click .
q. Click .
f. Verify that the location is set to /Data Mart Development/ Orion SCD.
g. Click .
i. Select DIFT SCD Target Tables Library as the value for the Library field.
l. Do not select columns at this time. You propagate columns from the source to the targets in a later
step.
o. Click .
q. Click .
1. Orient the process flow from top to bottom and turn off automatic column propagation.
e. Type Populate Star Schema with SCD-RK Processing as the value for the Name
field.
4. Add the source table to the diagram for the process flow.
b. Drag the DIFT SCD Source Data external file to the Diagram tab of the Job Editor.
5. Add a File Reader transformation to the process flow.
b. Expand the Access group and locate the File Reader transformation template.
c. Drag the File Reader transformation to the Diagram tab of the Job Editor. Place it under the DIFT SCD Source Data external file object.
d. Connect the DIFT SCD Source Data external file object to the File Reader transformation.
b. Expand the Data group and locate the Splitter transformation template.
c. Drag the Splitter transformation to the Diagram tab of the Job Editor. Place it under the File
Reader transformation.
b. Expand the Data group and locate the Sort transformation template.
c. Drag the Sort transformation to the Diagram tab of the Job Editor.
d. Connect one of the temporary table outputs from the Splitter to one of the Sort transformations.
e. Drag a second Sort transformation to the Diagram tab of the Job Editor.
f. Connect the second temporary table output from the Splitter to the other Sort transformation.
8. Select File Save to save diagram and job metadata to this point.
9. Add two SCD Type 2 Loader transformations to the process flow.
b. Expand the Data group and locate the SCD Type 2 Loader transformation template.
c. Drag the SCD Type 2 Loader transformation to the Diagram tab of the Job Editor.
d. Connect the temporary table output from one Sort transformation to one of the SCD Type 2
Loader transformations.
e. Drag a second SCD Type 2 Loader transformation to the Diagram tab of the Job Editor.
f. Connect the temporary table output from the second Sort transformation to the other SCD Type 2
Loader transformation.
b. Drag the DIFT SCD Customer Dimension table object to the Diagram tab of the Job Editor,
placing it under one of the SCD Type 2 Loader transformations.
c. Connect the SCD Type 2 Loader transformation to the DIFT SCD Customer Dimension table
object.
d. Drag the DIFT SCD Product Dimension table object (from the Checkouts tab) to the Diagram tab of the Job Editor, placing it under the other SCD Type 2 Loader transformation.
e. Connect the SCD Type 2 Loader transformation to the DIFT SCD Product Dimension table
object.
11. Select File Save to save diagram and job metadata to this point.
12. Add a third output table to the Splitter by right-clicking on the Splitter transformation and selecting
Add Work Table.
b. Expand the Data group and locate the Lookup transformation template.
c. Drag the Lookup transformation to the Diagram tab of the Job Editor.
d. Connect the third temporary output table from the Splitter transformation to the Lookup
transformation.
e. Next connect the DIFT SCD Customer Dimension table object to the Lookup transformation.
f. Add a third input port to the Lookup transformation by right-clicking on the Lookup
transformation and selecting Ports Add Input Port.
g. Connect the DIFT SCD Product Dimension table object to the Lookup transformation.
b. Expand the Access group and locate the Table Loader transformation template.
c. Drag the Table Loader transformation to the Diagram tab of the Job Editor.
15. Add the DIFT SCD Order Fact table as the final output for this process flow.
b. Drag the DIFT SCD Order Fact table object to the Diagram tab of the Job Editor, placing it under the Table Loader transformation.
c. Connect the Table Loader transformation to the DIFT SCD Order Fact table object.
1. There are currently no columns defined in the temporary output tables or the target tables. You
manually propagate columns forward starting at the source table.
2. Define columns for the temporary output table from the File Reader.
a. Right-click on the File Reader and select Properties.
b. Select the Mappings tab. The 29 columns in the DIFT SCD Source Data table are listed in
the Source table pane on the left. Propagate all 29 columns to the Target table pane.
All 29 columns are propagated forward from the DIFT SCD Source Data table to the
temporary output table of the File Reader transformation.
e. Right-click on the second temporary output table (leading to the DIFT SCD Product
Dimension table) from the Splitter and select Properties.
h. Right-click on the first temporary output table (leading to the Lookup transformation and the
DIFT SCD Order Fact table) from the Splitter and select Properties.
i. Select the Physical Storage tab.
4. Define columns for the temporary output tables from the Splitter.
a. Right-click on the Splitter and select Properties.
b. Select the General tab.
d. This is a reminder that the splitter is used to direct different columns from the data source to the
three target tables.
e. Select the Row Selection tab.
f. Verify that All Rows are selected for each of the three output tables.
g. Select the Mappings tab. The 29 columns in the temporary output table from the File Reader are
listed in the Source table pane on the left. Propagate only the necessary columns to each output
table.
h. Select the Splitter 0 (TempForCustDim) table from the Target table drop-down list.
i. Select columns 1 through 11 in the Source table pane. Use the Shift key to make the selection.
The selected columns should include only:
Customer_ID
Customer_Country
Customer_Gender
Customer_Name
Customer_FirstName
Customer_LastName
Customer_Birth_Date
Customer_Type
Customer_Group
Customer_Age
Customer_Age_Group
l. Select columns 12 through 19 in the Source table pane. Use the Shift key to make the selection.
The following 8 columns should be selected.
Product_ID
Product_Name
Supplier_ID
Supplier_Name
Supplier_Country
Product_Group
Product_Category
Product_Line
o. Select columns 20 through 29 in the Source table pane. Use the Shift key to make the selection. The 10 selected columns are these:
Order_ID
Order_Item_Num
Quantity
Total_Retail_Price
CostPrice_Per_Unit
Discount
Order_Type
Employee_ID
Order_Date
Delivery_Date
q. Select Customer_Id (column 1) and while holding the CTRL key, select Product_Id (column 12).
The two selected columns are these:
Customer_ID
Product_ID
5. Define columns for the temporary output tables from the Sort transformations.
a. Right-click on the first Sort transformation and select Properties.
b. Select the General tab.
This is a reminder that the sort transformation will be used to remove records with duplicate
Customer Id values.
This is a reminder that this sort transformation will be used to remove records with duplicate
Customer Id values.
e. Right-click on the DIFT SCD Customer Dimension table and select Properties.
f. Select the Columns tab and verify that it has the 11 propagated columns.
g. Add 4 new columns.
h. Select the Customer_ID column.
n. Right-click on the DIFT SCD Product Dimension table and select Properties.
o. Select the Columns tab and verify that it has the 8 propagated columns.
7. Add metadata for 4 new columns.
a. Select the Product_ID column.
8. Define columns for the temporary output table from the Lookup transformation.
a. Right-click on the Lookup transformation and select Properties.
b. Select the Mappings tab.
16 columns are defined for the temporary output table from the Lookup transformation.
f. Select the Columns tab and verify that it has the 16 propagated columns.
e. In the Remove duplicate records area, select Remove rows with duplicate keys
(NODUPKEY).
l. In the Remove duplicate records area select Remove rows with duplicate keys (NODUPKEY).
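A sketch of the kind of code this setting produces; the data set names are illustrative:

proc sort data=work.TempForCustDim
          out=work.CustDimSorted nodupkey;
   by Customer_ID;   /* keep only the first row per key value */
run;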
4. Update the properties of the first SCD Type 2 Loader transformation that will populate DIFT SCD
Customer Dimension. Apply the Retained Key method.
a. Right-click on the SCD Type 2 transformation (for DIFT SCD Customer Dimension
table) and select Properties.
b. Click the Change Tracking tab. Use beginning and end dates to track changes for each
Customer ID and use a current record indicator to keep track of the current record for each
Customer ID.
SAS Data Integration Studio provides default expressions for the Change Tracking columns. The DATETIME function is used to generate the beginning datetime value. A datetime constant with a future date is used to specify an open-ended value for the ending datetime. You can click in the Expression field to specify a custom expression.
Use Version Number and Use Current Indicator are provided as alternative methods to
track the current record.
c. Click the Business Key tab. Specify Customer_ID as the business key in the Customer
Dimension.
2) Select Customer_ID.
d. Click the Generated Key tab. Specify GenKeyCust as the retained key column.
1) Select GenKeyCust as the column to contain the generated key values.
2) Check Generate retained key to implement the retained key method.
A default expression is provided to generate retained key values in the New record
field. To specify a custom expression, click .
To implement the surrogate key method, uncheck Generate retained key. The
surrogate key method provides default expressions for new and changed records.
e. Click the Detect Changes tab. Specify Customer_Name as the column on which changes are
based.
If no columns are selected on the Detect Changes tab, then all columns are used to
detect changes, except those used for Change Tracking, Business Key, Generated
Key, and Type 1 columns.
h. Check Postcode.
5. Update the properties for the second SCD Type 2 Loader transformation that will populate DIFT
SCD Product Dimension.
a. Right-click on the SCD Type 2 Loader transformation (for DIFT SCD Product
Dimension table) and select Properties.
2) Select Product_ID.
h. Check Postcode.
6. Select File Save to save diagram and job metadata to this point.
7. Specify properties for the Lookup transformation that will populate DIFT SCD Order Fact.
b. Select the Lookups tab. Specify lookup mappings to the DIFT Customer Dimension table.
1) Select the row for DIFT SCD Customer Dimension and click Lookup Properties.
2) In the Lookup Properties - DIFT SCD Customer Dimension window, select the Source to
Lookup Mapping tab.
3) Click on the Customer_ID column in the Source table pane to select it and click on the
Customer_ID column in the Lookup table pane.
5) In the Lookup Properties - DIFT SCD Customer Dimension window, select the Lookup to
Target Mapping tab.
6) Click on the GenKeyCust column in the Lookup table pane to select it, and click on the GenKeyCust column in the Target table pane.
7) Click to map the selected columns.
8) Click on the BeginDateTimeCust column in the Lookup table pane to select it, and click on the BeginDateTimeCust column in the Target table pane.
The Lookup will retrieve the GenKeyCust and BeginDateTimeCust values from the
Customer Dimension table and assign them to the GenKeyCust and
BeginDateTimeCust columns in the target table. This links the transaction in the target
table to the current record for the customer ID in the Customer Dimension table.
10) In the Lookup Properties - DIFT SCD Customer Dimension window, select the Where tab.
11) Click the Data Sources tab.
12) Expand the SCDCustDim table.
13) Select the CurrRecCust column and click to insert the column reference in the
Expression Text pane.
15) In the Lookup Properties - DIFT SCD Customer Dimension window, select the Exceptions
tab.
16) Click to close the Lookup Properties - DIFT SCD Customer Dimension
window.
1) In the Lookup Properties window, select the row for DIFT SCD Product Dimension and
click Lookup Properties.
2) In the Lookup Properties - DIFT SCD Product Dimension window, select the Source to
Lookup Mapping tab.
3) Click on the Product_ID column in the Source table pane to select it, and click on the
Product_ID column in the Lookup table pane.
The Lookup transformation will use the Product_Id value in an incoming transaction to
do a lookup into the Product Dimension table to find a matching product id value.
5) In the Lookup Properties - DIFT SCD Product Dimension window, select the Lookup to
Target Mapping tab.
6) Click on the GenKeyProd column in the Lookup table pane to select it, and click on the GenKeyProd column in the Target table pane.
7) Click to map the selected columns.
8) Click on the BeginDateTimeProd column in the Lookup table pane to select it, and click on the BeginDateTimeProd column in the Target table pane.
The lookup will retrieve the GenKeyProd and BeginDateTimeProd values from the Product Dimension table and assign them to the GenKeyProd and BeginDateTimeProd columns in the target table. This links the transaction in the target table to the current record for the product ID in the Product Dimension table.
10) In the Lookup Properties - DIFT SCD Product Dimension window, select the Where tab.
11) Click the Data Sources tab.
12) Expand the SCDProdDim table.
13) Select the CurrRecProd column and click to insert the column reference in the
Expression Text pane.
15) In the Lookup Properties - DIFT SCD Product Dimension window, select the Exceptions tab.
16) Click to close the Lookup Properties - DIFT SCD Product Dimension window.
2) All source columns and a generated column are selected by default. Remove all columns
except Source Row Number, Order_ID, Customer_ID, and Product_ID from the
Selected columns pane.
5) Four generated columns and no source columns are selected by default. Accept the default
column selection.
These mappings for the Lookup transformation were established in an earlier step.
8. Select File Save to save diagram and job metadata to this point.
9. Specify properties for the Table Loader transformation that will populate DIFT SCD Order
Fact.
10. Select File Save to save diagram and job metadata to this point.
Update the Processing Order, Run the Job, and View the Results
1. If necessary, select View Detail to open the Details panel. It opens below the Diagram Editor.
2. In the Details panel, select the Control Flow tab.
a. Use the and arrows to arrange the transformations in the following order:
b. Verify the order of processing on the Diagram tab in the Job Editor window.
The number in the upper-left corner of the transformation indicates the order of processing.
3. Run the job.
a. Select the Status tab in the Details pane to monitor the execution of the job.
b. Click to run the job.
e. Select the Status tab in the Details pane to monitor the execution of the job.
f. Click to run the job.
a. Right-click on the DIFT SCD Order Fact table and select Open. The data is displayed in the
View Table window. Scroll to see the right-most columns:
e. Select the Status tab in the Details pane to monitor the execution of the job.
f. Click to run the job.
b. Right-click on the DIFT SCD Order Fact table and select Open. The data is displayed in the View Table window. Scroll to see the right-most columns:
Objectives
Define change data capture (CDC).
List the types of CDC transformations.
List functions of the CDC transformations.
CDC Transformations
SAS Data Integration Studio provides four CDC transformations:
Attunity CDC
DB2 CDC
Oracle CDC
General CDC
The separately licensed Attunity software enables you to generate source change tables from a variety of
relational databases running in a variety of operational environments.
Prerequisites (continued)
Oracle CDC: The Oracle CDC transformation has been validated on Oracle 10G with asynchronous CDC. The transformation requires that you license SAS/ACCESS to Oracle.
DB2 CDC: The DB2 CDC transformation has been validated on DB2/UDB, release 8.1, fixpak 3. The transformation requires that you license SAS/ACCESS to DB2.
General CDC: The General CDC transformation has no prerequisites.
Chapter 10 Defining Generated Transformations
Objectives
Define SAS code transformation templates.
Explain the prompting framework.
Describe the different types of prompts that make
up the prompting framework.
Transformation Templates
The Process Library tree contains two kinds of transformation templates:
Java Plug-In Transformation Templates: created with the Java programming language
SAS Code Transformation Templates: created with the Transformation Generator wizard
The pop-up menu for a Java plug-in transformation includes Analyze and Export items.
SAS code transformation templates can be used, for example, to transform data and create reports.
options &options;
title "&title";
proc gchart data=&syslast;
   vbar &classvar1 /
      sumvar=&analysisvar
      group=&classvar2;
run;
quit;

The generated code uses the macro variables &classvar1, &classvar2, &analysisvar, and &title, which are resolved from the values supplied in the transformation's prompts; &syslast resolves to the most recently created data set.
%let syslast=yy.xx;
%let options=;
%let classvar1=Customer_Age_Group;
%let classvar2=Customer_Gender;
%let analysisvar=Quantity;
%let title=Sum of Quantity across Gender and Age Group;
Options Window
The New Transformation wizard's Options window is the facility for creating the options to be used for the new transformation.
Objectives
Create a custom transformation.
This demonstration creates a report on customer order information. The HTML report must have a text-based output with summary statistics as well as a bar chart graphic. A transformation with this type of output does not currently exist. Hence, a new SAS code transformation is created and then used in a job.
1. If necessary, access SAS Data Integration Studio using Barbara's credentials.
a. Select Start All Programs SAS SAS Data Integration Studio 4.2.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
e. Type Summary Table and Vertical Bar Chart as the value for the Name field.
g. Type User Defined as the value for the Transformation Category field.
The General information for the new transformation should resemble the following:
h. Click .
j. Click .
k. Click .
The Options window is available to define the options to be used in this transformation.
1) Click .
2) Type Data Items as the value for the Displayed text field.
3) Click .
4) Click .
6) Click .
7) Click .
8) Type Other Options as the value for the Displayed text field.
9) Click .
b) Click .
d) Type Column to Chart for Vertical Bar Chart as the value for the
Displayed text field.
e) Type The column selected for this option will be the charting
column for GCHART and a classification column in the row
dimension for TABULATE. as the value for the Description field.
h) Select Data source column as the value for the Prompt type field.
i) Verify that Select from source is selected in the Columns to select from area.
n) Click .
b) Click .
d) Type Column for Grouping Charting Variable as the value for the
Displayed text field.
e) Type The column selected for this option will be the grouping
column for GCHART and a classification column in the row
dimension for TABULATE. as the value for the Description field.
h) Select Data source column as the value for the Prompt type field.
i) Verify that Select from source is selected in the Columns to select from area.
n) Click .
b) Click .
d) Type Column to Analyze for Vertical Bar Chart as the value for the
Displayed text field.
e) Type The column selected for this option will determine the
heights of the bars for GCHART and an analysis column for
TABULATE. as the value for the Description field.
h) Select Data source column as the value for the Prompt type field.
i) Verify that Select from source is selected in the Columns to select from area.
n) Click .
The three options in the Data Items group should resemble the following:
b) Click .
d) Type Title for Table Report as the value for the Displayed text field.
e) Type Specify some text that will be used as the title for the
TABULATE output. as the value for the Description field.
g) Verify that Text is specified as the value for the Prompt type field.
i) Click .
b) Click .
d) Type Title for Graph Report as the value for the Displayed text field.
e) Type Specify some text that will be used as the title for the
GCHART output. as the value for the Description field.
g) Verify that Text is specified as the value for the Prompt type field.
i) Click .
The two options in the Titles group should resemble the following:
b) Click .
d) Type Specify SAS system options as the value for the Displayed text
field.
e) Type Specify a space separated list of global SAS system
options. as the value for the Description field.
g) Verify that Text is specified as the value for the Prompt type field.
i) Click .
b) Click .
d) Type Name of HTML file to be created as the value for the Displayed
text field.
e) Type Enter the name of the HTML file that will contain the
reports generated by this transformation. Do NOT enter the
HTML file extension! as the value for the Description field.
g) Verify that Text is specified as the value for the Prompt type field.
i) Click .
The two options in the Other Options group should resemble the following:
Verify that the three items in the Data Items group are all required (note the *).
The descriptions entered for each of the parameters are displayed.
Clicking opens a dialog window to navigate the SAS Folders to a data source
from which a column can be selected.
2) Click Titles in the selection pane. The two options in the Titles group are displayed:
3) Click Other Options in the selection pane. The two options in the Other Options group are
displayed:
q. Click .
The Inputs group box values add a specified number of inputs to the transformation when
it is used in a job. If you later update the transformation to increase this minimum number
of inputs value, any jobs that have been submitted and saved use the original value. The
increased minimum number of inputs is enforced only for subsequent jobs. Therefore, you
can increase the minimum number of inputs without breaking existing jobs. The Maximum
number of inputs field is used to allow you to connect additional inputs into the input port.
For example, a setting of 3 allows you to have up to three inputs. The rules for inputs also
apply to outputs.
s. Click .
t. Click .
e. Type Report and Graphic for Customer Orders as the value for the Name field.
3. Add source table metadata to the diagram for the process flow.
c. Drag the DIFT Customer Order Information table object to the Diagram tab of the Job Editor.
4. Add the Summary Table and Vertical Bar Chart transformation to the process flow.
b. Drag the Summary Table and Vertical Bar Chart transformation to the Diagram tab of the Job
Editor.
5. Connect the DIFT Customer Order Information table object to the Summary Table and Vertical
Bar Chart transformation.
6. Select File Save to save diagram and job metadata to this point.
7. Specify properties for the Summary Table and Vertical Bar Chart transformation.
a. Right-click on the Summary Table and Vertical Bar Chart transformation and select
Properties.
1) Click for the Column to Chart for Vertical Bar Chart option.
2) Select Customer Age Group in the Select a Data Source Item window.
3) Click .
6) Click .
9) Click .
1) Type NODATE NONUMBER LS=80 as the value for the Specify SAS system
options field.
f. Click .
8. Select File Save to save diagram and job metadata to this point.
d. Click .
e. In the Microsoft Internet Explorer window, click to close the information bar.
Exercises
Name: Idvariable
Displayed text: ID Variable
Description: The column used to identify obs in input and output data sets. Its values are interpreted and extrapolated according to the values of the INTERVAL= option.
Required: Yes
Prompt type: Data Source Column
Other information: Do not allow character columns; only allow 1 column as a selection.

Name: Fcastvariable
Displayed text: Column to Forecast
Description: The column from the input data set that is to be forecasted.
Required: Yes
Prompt type: Data Source Column
Other information: Do not allow character columns; only allow one column as a selection.
Name: Alpha
Displayed text: Significance Level
Description: Specify significance level for confidence intervals (default is .05).
Required: No
Prompt type: Numeric
Other information: Allow values other than integers; provide a default value of .05.

Name: Lead
Displayed text: Number of Periods to Forecast
Description: Specify the number of periods ahead to forecast (default is 6).
Required: No
Prompt type: Numeric
Other information: Provide a default value of 6.

Name: Method
Displayed text: Method to Model the Series
Description: Specify the method to use to model the series and generate the forecasts. (Default is STEPAR)
Required: No
Prompt type: Text
Other information: Provide a list of values of STEPAR, EXPO, WINTERS, ADDWINTERS. Set STEPAR as the default.
Add the following options for the Titles and Other Options group:
Name: Title
Displayed text: Title for Forecast Graphic
Description: Specify text that will be used as the title for the FORECAST output.
Required: No
Prompt type: Text

Name: Options
Displayed text: Specify SAS system options
Description: Specify a space separated list of global SAS system options.
Required: No
Prompt type: Text

Name: File
Displayed text: Name of HTML file to be created
Description: Enter the name of the HTML file that will contain the report generated by this transformation. Do NOT enter the HTML file extension!
Required: No
Prompt type: Text
Use the YYMM column as the ID column and the Profit column as the column to forecast.
View the HTML file. The output should resemble the following:
3. Check In Objects
Check in the transformation and job objects.
The inner job should be a copy of the job from Exercise 3, but modified with parameters.
4) Click .
8) Click .
%macro ForeCastGraph;
   options mprint;
   /* Submit the user-supplied SAS system options only if the */
   /* Options prompt was given a value.                       */
   %if (%quote(&options) ne) %then
      %do;
         options &options;
      %end;
%mend ForeCastGraph;
%ForeCastGraph;
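A usage sketch: when the Options prompt supplies a value, the generated %LET statement makes the conditional OPTIONS statement execute (values illustrative):

%let options=NODATE NONUMBER LS=80;
%ForeCastGraph;   /* submits: options NODATE NONUMBER LS=80; */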
g. Click .
1) Click .
2) Type Data Items as the value for the Displayed text field.
3) Click .
4) Click .
5) Type Forecast Options as the value for the Displayed text field.
6) Click .
7) Click .
8) Type Titles and Other Options as the value for the Displayed text field.
9) Click .
b) Click .
e) Type The column used to identify obs in input & output data
sets. Its values are interpreted and extrapolated according
to the values of the INTERVAL= option. as the value for the
Description field.
h) Select Data source column as the value for the Prompt type field.
i) Verify that Select from source is selected in the Columns to select from area.
n) Click .
b) Click .
d) Type Column to Forecast as the value for the Displayed text field.
h) Select Data source column as the value for the Prompt type field.
i) Verify that Select from source is selected in the Columns to select from area.
n) Click .
b) Click .
d) Type Significance Level as the value for the Displayed text field.
g) Select Numeric as the value for the Prompt type field.
j) Click .
b) Click .
d) Type Number of Periods to Forecast as the value for the Displayed text
field.
e) Type Specify the number of periods ahead to forecast (default
is 6). as the value for the Description field.
g) Select Numeric as the value for the Prompt type field.
i) Click .
3) Define metadata for the method to model the series forecast option.
a) Select the Forecast Options group.
b) Click .
d) Type Method to Model the Series as the value for the Displayed text
field.
e) Type Specify the method to use to model the series and
generate the forecasts. (Default is STEPAR) as the value for the
Description field.
h) Select User selects values from a static list as the value for the Method for
populating prompt field.
s) Click .
k. Define metadata for the options in the Titles and Other Options group.
1) Define metadata for the title to be used with forecast graphic.
a) Select the Titles and Other Options group.
b) Click .
d) Type Title for Forecast Graphic as the value for the Displayed text
field.
e) Type Specify text that will be used as the title for the
FORECAST output. as the value for the Description field.
g) Verify that Text is specified as the value for the Prompt type field.
i) Click .
b) Click .
d) Type Specify SAS system options as the value for the Displayed text
field.
e) Type Specify a space separated list of global SAS system
options. as the value for the Description field.
g) Verify that Text is specified as the value for the Prompt type field.
i) Click .
b) Click .
d) Type Name of HTML file to be created as the value for the Displayed
text field.
e) Type Enter the name of the HTML file that will contain the
report generated by this transformation. Do NOT enter the
HTML file extension! as the value for the Description field.
h) Verify that Text is specified as the value for the Prompt type field.
j) Click .
The three options in the Titles and Other Options group should resemble the following:
2) Click Forecast Options in the selection pane. The three options in the Forecast Options
group are displayed.
3) Verify that all fields have default values and that four values are available for selection for the
Method to Model the Series.
4) Click Titles and Other Options in the selection pane. The three options in the Titles and
Other Options group are displayed.
m. Click .
o. Click .
p. Click .
h. Select File Save to save diagram and job metadata to this point.
i. Specify properties for the Extract transformation.
1) Right-click on the Extract transformation and select Properties.
2) Select the Where tab.
3) Type Company = "Orion Australia" as the value for the Expression Text area.
j. Select File Save to save diagram and job metadata to this point.
Chapter 11 Implementing Data Quality Techniques (Self-Study)
Demonstration: Creating Jobs for Execution on DataFlux Integration Server ................... 11-20
Objectives
Define data quality.
Discuss data quality offerings from SAS.
Cleansing data will result in accurate reports.
Diagram: The Local Process group (PROC DQSCHEME, PROC DQMATCH, and 18 functions) reads definitions from the Quality Knowledge Base, which dfPower Studio also reads. The Server Process group (PROC DQSRVADM, PROC DQSRVSVC, and 8 functions) runs Profile and Architect jobs on the DataFlux Integration Server.
The language elements in the SAS Data Quality Server software can be separated into two functional
groups. As shown in the previous diagram, one group cleanses data in SAS, and the other group runs data
cleansing jobs and services on Integration Servers from DataFlux (a SAS company).
The language elements in the Local Process group read data definitions out of the Quality Knowledge
Base to, for example, create match codes, apply schemes, or parse text. The language elements in the
Server Process group start and stop jobs and services and manage log entries on DataFlux Integration
Servers.
The DataFlux Integration Servers and the related dfPower Profile and dfPower Architect applications are
made available with the SAS Data Quality Server software in various software bundles.
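As a small sketch of the Local Process group, the following PROC DQMATCH step creates match codes for a name column. The input table, column names, and QKB setup location are illustrative, and a Quality Knowledge Base locale must be loaded first.

/* Load a locale from the Quality Knowledge Base; the setup   */
/* location is site-specific (the path shown is illustrative).*/
%dqload(dqlocale=(ENUSA), dqsetuploc='C:\qkb\dqsetup.txt')

/* Create match codes for the Contact column by using the QKB */
/* Name match definition.                                     */
proc dqmatch data=work.contacts out=work.contacts_mc;
   criteria matchdef='Name' var=Contact matchcode=Contact_MatchCd;
run;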
All DataFlux jobs and real-time services run on DataFlux Integration Servers. To execute
DataFlux jobs and services from SAS Data Integration Studio jobs, you must first install a
DataFlux Integration Server and register that server in SAS metadata.
Objectives
Discuss the DataFlux Integration Server.
Batch jobs can be run on a server-grade machine, meaning the process is more scalable to larger data sources. Server-class machines supported by DataFlux Integration Server include the following:
Windows
UNIX (AIX, HP-UX, Solaris, and Linux)
The data cleansing processes, available as real-time services via a Service Oriented Architecture (SOA), are available to any Web-based application that can consume services (Web applications, ERP systems, operational systems, SAS, and more).
The data cleansing jobs and services registered to the DataFlux Integration Server are available (via procedures and functions) from within SAS. This gives the user the full power of dfPower Architect and dfPower Profile functionality from within SAS.
Real-Time Services
In addition, existing batch jobs can be converted to real-time services that can be invoked by any application that is Web service enabled. This provides users with the ability to reuse the business logic developed when building batch jobs for data migration or loading a data warehouse, and apply it at the point of data entry to ensure consistent, accurate, and reliable data across the enterprise.
1. Select Start All Programs DataFlux Integration Server 8.1 Integration Server Manager.
To configure a remote DataFlux Integration Server, specify a valid Server name and Server
port for the remote server.
The DataFlux Integration Server Manager window now displays the machine/port specified.
This demonstration illustrates the creation of a dfPower Profile job, how to run the job, and how to review
the generated metrics.
1. From within SAS Data Integration Studio, select Tools dfPower Tool dfPower Profile (Configurator).
2. In the data sources area, click on to expand the DataFlux Sample database.
3. Click the Contacts table. The right side of the window populates with a listing of columns found in
the Contacts table.
4. Click to the left of the Contacts table. This selects all the fields of the table to be part of the
Profile job.
b. Click to the left of Select/unselect all to select all the Column profiling metrics.
c. Click to save the metric selections and close the Metrics window.
e. Click to save the metric selections and close the Metrics window.
The ADDRESS field has a under the M field to identify that this column has metric overrides
specified.
e. Click to save the metric selections and close the Metrics window.
e. Click to save the metric selections and close the Metrics window.
e. Click to save the metric selections and close the Metrics window.
e. Click to save the metric selections and close the Metrics window.
Once the desired metrics are specified, the profile job is ready to run and produce the profile
report.
c. Type Profile fields for Contacts table as the value for the Description field.
d. Click .
The name is now displayed on the title bar for dfPower Profile (Configurator).
1. From within SAS Data Integration Studio, select Tools dfPower Tool dfPower Architect.
b. Click next to Data Inputs to expand the Data Inputs category of nodes.
c. Click the Data Source node, and then select Insert Node On Page. (Alternatively, you can double-click on the node and it will be automatically appended to the job flow, or you can drag and drop the node onto the job flow and perform a manual connection.)
The node is added to the job flow and a Data Source Properties window is opened.
d. In the Data Source Properties window, click next to the Input table field. The Select
Table window opens.
e. Click to expand the DataFlux Sample database.
f. Click the Contacts table.
g. Click .
The Contacts table is now listed as the value for the Input table field. The fields found in
the Contacts table are now listed as Available.
i. Click .
3. Add a Standardization node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, click to collapse the Data Inputs category of nodes.
c. Click the Standardization node, and then select Insert Node ⇒ Auto Append. (Alternatively,
you can double-click the node and it will be automatically appended to the job flow, or you can
drag and drop the node onto the job flow and perform a manual connection.)
d. In the Standardization Properties window, move the ADDRESS and STATE fields from Available
to Selected.
1) In the Standardization fields area, click ADDRESS field in the Available list.
e. Specify the appropriate Definition and/or Scheme for the two selected columns.
1) Click in the Definition field for the ADDRESS field to allow selection of a valid
standardization definition.
3) Click in the Definition field for the STATE field to allow selection of a valid
standardization definition.
5) Verify that the default names given to the output fields are ADDRESS_Stnd and
STATE_Stnd.
1) Click .
3) Remove DATE, MATCH_CD, and DELETE_FLG from the Output fields list.
d) Click to move the selected fields from the Output fields list.
c. Verify that the ADDRESS and STATE field values are standardized by checking for the following:
The state values for the first two records originally were state names spelled out; the
STATE_Stnd field now has these values as abbreviations.
The address value for the first record has the word Street; the ADDRESS_Stnd field has St.
The address value for the second record has the word Road; the ADDRESS_Stnd field has
Rd.
Some of the original address values are all uppercase; the ADDRESS_Stnd field has these
values proper-cased.
5. Add an Identification Analysis node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, click to collapse the Data Inputs category of nodes.
d. In the Identification Analysis Properties window, move the CONTACT field from Available to
Selected.
g. Verify that the default name given to the output column is CONTACT_Identity.
h. Click .
For the first set of records in the Contacts table, the CONTACT field values are identified as
INDIVIDUAL.
7. Add a Branch node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, click to collapse the Quality category of nodes.
c. Click the Branch node, and then select Insert Node ⇒ Auto Append. The Branch Properties
window is displayed.
d. Click to accept the default settings and close the Branch Properties window.
8. Add a Data Validation node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, click to collapse the Utilities category of nodes.
c. Click the Data Validation node, and then select Insert Node ⇒ Auto Append.
9. Add a Gender Analysis node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, click to collapse the Profiling category of nodes.
1) In the Gender analysis fields area, click CONTACT field in the Available list.
g. Verify that the default name given to the output column is CONTACT_Gender.
h. Click .
a. From the Nodes tab in the Toolbox panel, click to collapse the Quality category of nodes.
d. In the Frequency Distribution Properties window, move the CONTACT_Gender field from
Available to Selected.
13. Add a second Data Validation node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, verify that the Profiling category of nodes is expanded.
b. Click the Data Validation node, and then select Insert Node ⇒ On Page.
d. Click on the new Data Validation node and drag it so that it is next to the first Data Validation
node.
e. Click on the Branch node and (without releasing the mouse button) drag the cursor to the second
Data Validation node. Release the mouse button.
15. Add a Text File Output node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, click to collapse the Profiling category of nodes.
b. Click next to Data Outputs to expand the Data Outputs category of nodes.
1) Click next to the Output file field. The Save As window opens.
b. Click next to Data Outputs to expand the Data Outputs category of nodes.
c. Click the Frequency Distribution 1 node in the job flow.
d. Click the HTML Report node, and then select Insert Node ⇒ Auto Append.
e. In the HTML Report Properties window, specify the attributes of the file that will be created.
1) Type Frequency Counts from Gender Analysis as the value for the Report
title field.
a) Type Contacts Gender Frequencies as the value for the Name field.
b) Click .
The final settings for the HTML Report Properties window should resemble the following:
c. Click .
This demonstration illustrates the creation of a dfPower Architect job using the External Data Provider.
This job will be uploaded to the DataFlux Integration Server and then processed with the DataFlux IS
Service transformation in SAS Data Integration Studio.
1. From within SAS Data Integration Studio, select Tools ⇒ dfPower Tool ⇒ dfPower Architect.
2. Add an External Data Provider node to the job flow.
a. Locate the Nodes tab in the Toolbox panel.
b. Click next to Data Inputs to expand the Data Inputs category of nodes.
c. Click the External Data Provider node, and then select Insert Node ⇒ On Page.
(Alternatively, you can double-click the node and it will be automatically added to the job flow.)
The node is added to the job flow and an External Data Provider window is opened.
e. Select the first generic field (Field) and change the value of the Field Name field to
Field_1.
f. Type 20 as the value for the Field Length field for Field_3.
g. Type 25 as the value for the Field Length field for Field_4.
h. Click .
4. Add a Basic Statistics node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, click to collapse the Data Inputs category of nodes.
c. Click the Basic Statistics node, and then select Insert Node ⇒ Auto Append.
e. Click .
c. Click .
1. Select Start ⇒ All Programs ⇒ DataFlux Integration Server 8.1 ⇒ Integration Server Manager.
8. (Optional) Select Actions ⇒ Upload. The Upload Architect Jobs window is displayed.
10. (Optional) Click to move the Contacts Table Analysis job to the Selected list.
13. Select Actions ⇒ Upload. The Upload Real-time Services window is displayed.
15. Click to move the LWDIWN EDP Basic Stats job to the Selected list.
b. Click to close the Connection Profile window and access the Log On window.
c. Type Ahmed as the value for the User ID field and Student1 as the value for the
Password field.
7. Click .
9. Click .
2) Type Default Base Path as the value for the Description field.
3) Click .
11. Click .
b. Select DefaultAuth.
d. Verify the value for the Port number field is set to 21036.
The final settings for connection properties should resemble the following:
13. Click .
15. Click .
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the Password
field.
3. Create initial job metadata that will use the DataFlux IS Job transformation.
a. Right-click the DataFlux IS Examples folder and select New ⇒ Job.
b. Type LWDIWN - Run Profile and Architect Jobs as the value for the Name field.
c. Verify that /Data Mart Development/DataFlux IS Examples is the value for the Location
field.
d. Click .
The Contact Profile.pfi job file should appear in the Job field.
k. Click .
The diagram tab of the Job Editor window now displays the following:
d. (Optional) Right-click on the second DataFlux IS Job transformation and select Properties.
e. (Optional) Type Architect Reports at the end of the default value for the Name field.
g. (Optional) Verify that Architect is the value for the Job type field.
h. (Optional) Verify that Contacts Table Analysis.dmc is the value for the Job field.
i. (Optional) Click .
The diagram tab of the Job Editor window now displays the following:
7. Run the job by clicking in the tool set of the Job Editor window.
For DataFlux IS Job transformations, "completed successfully" simply means that the process was
passed off successfully to the DataFlux Integration Server.
8. Select File ⇒ Close to close the Job Editor window.
9. Access DataFlux Integration Server Manager and verify the jobs ran successfully.
a. Select Start ⇒ All Programs ⇒ DataFlux Integration Server 8.1 ⇒
Integration Server Manager.
b. Verify that both the Profile job and the Architect job completed. The bottom portion of the
DataFlux Integration Server Manager displays this information on the Status of All Jobs tab.
g. Navigate to S:\Workshop\lwdiwn.
h. Select All Files (*.*) as the value for the Files of type field.
j. Click . The path and filename update in the Import File window.
k. Click .
The Contacts Profile report is imported to the DataFlux Default management resources location.
l. Double-click the Contacts Profile report and dfPower Profile (Viewer) is invoked with the profile
report.
c. (Optional) When done viewing, select File ⇒ Close to close the browser.
12. (Optional) View the generated text file output.
a. (Optional) Open Windows Explorer by selecting Start ⇒ All Programs ⇒ Accessories ⇒
Windows Explorer.
b. (Optional) Navigate to S:\Workshop\lwdiwn.
c. (Optional) Double-click the Unknown_Identities.txt file. The file opens in a Notepad window as
displayed.
d. (Optional) When done viewing, select File ⇒ Exit to close the Notepad window.
The job being created will use the previously registered DataFlux Integration Server service. Before
taking advantage of the DataFlux IS Service transformation, a table needs to be defined in metadata (this
table will be the source data for the DataFlux IS Service transformation).
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
d. Click .
e. The needed SAS Library does not exist in metadata. Click to invoke the New Library
wizard.
1) Type DIWN DataFlux Sample Database as the value for the Name field.
2) Click .
3) Double-click SASApp to move it from the Available servers list to the Selected
servers list.
4) Click .
6) Click .
7) The needed Database Server does not exist in metadata. Click to invoke the New
Server wizard.
a) Type DIWN DataFlux Sample Database Server as the value for the Name
field.
b) Click .
c) Select ODBC Microsoft Access as the value for the Data Source Type field.
d) Click .
e) Click Datasrc.
f) Type "DataFlux Sample" as the value for the Datasrc field.
g) Click .
i) Click .
8) Verify that the newly defined database server (DIWN DataFlux Sample Database Server)
appears in the Database Server field.
9) Click .
10) Review the final settings for the New Library Wizard.
11) Click .
f. Verify that the newly defined SAS library (DIWN DataFlux Sample Database) appears in the
SAS Library field. Also, the value specified for the Datasrc (DataFlux Sample)
information for the library server should appear in the Data Source field.
g. Click .
i. Click .
k. Click .
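For reference, a library registered this way corresponds roughly to a SAS/ACCESS Interface to ODBC
LIBNAME statement such as the following sketch; the libref dfsamp is a hypothetical name, and the
DATASRC value is the one entered in the wizard:

   /* assign a libref to the DataFlux Sample ODBC data source */
   libname dfsamp odbc datasrc="DataFlux Sample";

SAS Data Integration Studio generates an equivalent statement from the library metadata whenever a job
reads a table through this library.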
a. Right-click on the Contacts table (in the Data Mart Development ⇒ DataFlux IS Examples
folder on the Folders tab) and select Properties.
b. Type DIWN Contacts as the value for the Name field on the General tab.
c. Verify that /Data Mart Development/DataFlux IS Examples is the value for the Location
field.
d. Click .
i. On the target table side, change the name of the PHONE column to Field_2.
j. On the target table side, change the name of the OS column to Field_3.
k. On the target table side, change the name of the DATABASE column to Field_4.
l. Remove the remaining target columns from the target table side.
1) Click the ID column, hold down the SHIFT key, and click the CITY column.
2) Click Delete Target Columns from the tool set of the Mappings tab.
c. Right-click on the temporary table object associated with the DataFlux IS Service transformation
and select Open.
A Warning window opens:
d. Click .
The profiling metrics requested in the Architect job are displayed in the View Data window.
Exercises
Upload the new dfPower Architect job to the DataFlux Integration Server as a service.
Create a job in SAS Data Integration Studio.
Name the job LWDIWN Architect Service Exercise.
Place the job in the \Data Mart Development\DataFlux IS Examples folder.
Use the DIWN Contacts table as a source table.
Use the Extract transformation following the source table, renaming the target columns for STATE
and PHONE to Field_1 and Field_2. Remove the remaining target columns.
Add a DataFlux IS Service transformation to the job flow, connecting the output of the Extract
transformation to this new transformation. Be sure to specify the new service in the Properties
window.
Save and then run the job.
View the output to verify that the dfPower Architect job did produce basic pattern analysis results.
Chapter 12 Deploying Jobs
Objectives
Discuss the types of job deployment available
for SAS Data Integration Studio jobs:
for scheduling
as a stored process
as a Web service.
Deployment Techniques
You can also deploy a job in order to accomplish the
following tasks:
Divide a complex process flow into a set of smaller
flows that are joined together and can be executed
in a particular sequence.
Execute a job on a remote host.
Objectives
Provide an overview of the scheduling process.
Discuss the types of scheduling servers.
Discuss the Schedule Manager in SAS Management
Console.
Discuss batch servers.
Scheduling Requirements
The SAS scheduling tools enable you to automate the
scheduling and execution of SAS jobs across your
enterprise computing environment. Scheduling requires
four main components:
SAS Application
Schedule Manager
Scheduling Server
Batch Server
[Diagram: scheduling components. A flow (Flow_ABC) in the deployment directory contains events for
Job_A, Job_B, and Job_C. Batch server 1 supplies the command lines for Jobs A and B, and Batch
server 2 supplies the command line for Job C.]
Step 1: A SAS application (such as SAS Data Integration Studio) creates a job that needs to be
scheduled. If the job was created by SAS Data Integration Studio, the job is placed in a
deployment directory.
Step 2: A user set up to administer scheduling can use the Schedule Manager plug-in in SAS
Management Console to prepare the job for scheduling, or users can schedule jobs directly
from other SAS applications. The job is added to a flow, which can include other jobs and
events that must be met (such as the passage of a specific amount of time or the creation of a
specified file). The Schedule Manager also specifies which scheduling server should be used
to evaluate the conditions in the flow and which batch server should provide the command to
run each job. The type of events you can define depends on the type of scheduling server you
choose. When the Schedule Manager has defined all the conditions for the flow, the flow is
sent to the scheduling server, which retrieves the command that is needed to run each job from
the designated batch server.
Step 3: The scheduling server evaluates the conditions that are specified in the flow to determine
when to run a job. When the events specified in the flow for a job are met, the scheduling
server uses the command obtained from the appropriate batch server to run the job. If you
have set up a recurring scheduled flow, the flow remains on the scheduling server and the
events continue to be evaluated.
Step 4: The scheduling server uses the specified command to run the job in the batch server, and then
the results are sent back to the scheduling server.
Scheduling Servers
SAS supports scheduling through three types
of scheduling servers:
Platform Process Manager server
operating system scheduling server
in-process scheduling server
You can create a definition for a scheduling server by using the Server Manager plug-in in SAS
Management Console or an application that directly schedules jobs.
The Platform Process Manager server, which is part of Platform Suite for SAS, provides full-featured
enterprise scheduling capabilities, including features such as workload prioritization and policy-based
scheduling. The server enables you to schedule jobs using a variety of recurrence criteria and
dependencies on other jobs, time events, or file events. You can use the Flow Manager application (also
part of Platform Suite for SAS) to manage scheduled jobs, including deleting and stopping previously
scheduled jobs.
Because Platform Suite for SAS is a separate application, it requires an additional license fee. It also
requires you to perform additional tasks to install, configure, and maintain all components of the
application. However, the components included with the application also provide functions such as load
balancing and submission of jobs to a grid.
The metadata for a Process Manager Server includes the following information:
the network address or host name of a machine
the port number for the server
Operating system scheduling provides the ability to schedule jobs through the services provided by a
server's operating system. Operating system scheduling provides a basic level of scheduling at no
additional cost, because the service is provided by software that you already own. However, this type of
scheduling does not support advanced scheduling capabilities, such as the use of many types of
dependencies. The specific scheduling functions that are supported vary according to the operating system
used, which can make it more difficult to set up consistent scheduling criteria on several servers.
Managing scheduled jobs requires you to issue operating system commands rather than using a graphical
user interface (a sketch of such a command follows the list below). The metadata for an operating system
scheduling server includes the following:
the network address of a machine
the port number for the server
the directory on the server where scheduled flows should be stored (control directory)
the command to start a SAS session on the server
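As a hedged illustration of the kind of operating system command involved: on Windows, a deployed
job's .sas file might be scheduled with the schtasks utility. The task name, SAS installation path, and
schedule below are assumptions, not values taken from this environment:

   schtasks /create /tn "DIFT_Populate_Order_Fact_Table" ^
      /tr "\"C:\Program Files\SAS\SASFoundation\9.2\sas.exe\" -sysin S:\Workshop\dift\OrionStarJobs\DIFT_Populate_Order_Fact_Table.sas" ^
      /sc daily /st 02:00

On UNIX, the equivalent would be a cron entry that invokes the sas command with the same -sysin
option.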
In-process scheduling provides the ability to schedule jobs from certain Web-based SAS applications
without using a separate scheduling server. With in-process scheduling, the scheduling functions run as a
process within the application. Although in-process scheduling is supported only for certain applications
(such as SAS Web Report Studio), it offers basic scheduling capabilities without incurring any additional
cost or requiring many installation or configuration tasks. Because an in-process scheduling server runs as
part of the application, this type of scheduling also eliminates the need for the application to authenticate
scheduled jobs to a separate server. However, the application must be running at the time the scheduled
job attempts to run.
Schedule Manager
The Schedule Manager plug-in for SAS Management
Console is a user interface that enables you to create
flows, which consist of one or more SAS jobs. Each job
within a flow can be triggered to run based on criteria
such as a date and time, the state of a file on the file
system, or the status of another job within the flow.
The available scheduling criteria depend on the type
of scheduling server used.
Schedule Manager is designed as a scheduler-neutral interface. When you create a flow, you specify
which scheduling server the flow is to be associated with. Schedule Manager converts the flow
information to the appropriate format and submits it to the scheduling server (the Platform Computing
server, an operating system scheduling server, or an in-process scheduling server).
Batch Servers
Batch servers provide the command needed to run the
programs that have been submitted for scheduling.
Several batch server types are supported, each
of which provides the command to run a scheduled
SAS job from a specific application in a specific
environment.
The command is included in the metadata definition
for each server.
The batch servers' commands are independent
of the type of scheduling server used.
Batch server metadata objects are components of the
SAS Application Server (for example, SASApp), and
can be created by using the Server Manager plug-in in
SAS Management Console.
Job Metadata
Job metadata becomes available to the Schedule
Manager when you use a SAS application such as
SAS Data Integration Studio to schedule a job.
The job metadata includes the following information:
the command that is to be used to execute the job
Flow Metadata
Flow metadata is created when you use Schedule
Manager to create a flow. The flow metadata includes
the following information:
the name of the scheduling server that is to execute
the jobs in the flow
the triggers and dependencies that are associated
with the jobs in the flow
Depending on the scheduling server that the user
specifies, Schedule Manager converts the flow metadata
to the appropriate format and submits it to the scheduling
server.
The components of Platform Suite for SAS include the following:
Process Manager Server: Controls the submission of jobs to Platform Load Sharing Facility (LSF) and
manages all dependencies among jobs.
Platform Flow Manager: Provides a visual representation of flows that have been created for a Process
Manager Server. These include flows that were created and scheduled in the SAS Management Console
Schedule Manager, as well as reports that have been scheduled through SAS Web Report Studio.
Platform Flow Manager provides information about each flow's status and associated dependencies. You
can view or update the status of jobs within a flow, and you can run or rerun a single job regardless of
whether the job failed or completed successfully.
Platform Calendar Editor: A scheduling client for a Process Manager Server. This client enables you to
create new calendar entries. You can use it to create custom versions of calendars that are used to create
time dependencies for jobs that are scheduled to run on the server.
Platform Load Sharing Facility (LSF): Dispatches the jobs that are submitted to it and returns the status
of each job; LSF also manages resource requirements and performs load balancing.
Platform Grid Management Services: Manages jobs that are scheduled in a grid environment. This
software collects information on the jobs that are running on the grid and the nodes to which work has
been distributed. It makes the information available to the Grid Manager plug-in for SAS Management
Console. You can use Grid Manager to view and manage the grid workload information.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
a. Right-click DIFT Populate Order Fact Table and select Scheduling ⇒ Deploy.
b. Verify that SASApp - SAS DATA Step Batch Server is selected as the value for the
Batch Server field.
1) Type Orion Star Jobs as the value for the Name field.
7) Click .
8) Click .
d. Accept the default value for the Deployed Job Name field.
e. Verify that /Data Mart Development/Orion Jobs is the value for the Location field.
f. Click to save the information and close the Deploy a job for scheduling window.
g. Click .
The Orion Jobs folder shows that the DIFT Populate Order Fact Table job icon has
been decorated to signify scheduling.
Also, a new object appears in the same folder, DIFT_Populate_Order_Fact_Table.
l. Right-click on the job object, DIFT Populate Order Fact Table. There are more options
available:
4. Deploy DIFT Populate Old and Recent Orders Tables for scheduling.
a. Right-click DIFT Populate Old and Recent Orders Tables and select Scheduling ⇒ Deploy.
b. Verify that SASApp - SAS DATA Step Batch Server is selected as the value for the
Batch Server field.
c. Click next to the Deployment Directory field and select Orion Star Jobs.
d. Accept the default value for the Deployed Job Name field.
e. Verify /Data Mart Development/Orion Jobs is the value for the Location field.
f. Click to save the information and close the Deploy a job for scheduling window. An
information message appears.
a. Right-click DIFT Populate Customer Dimension Table and select Scheduling ⇒ Deploy.
b. Verify that SASApp - SAS DATA Step Batch Server is selected as the value for the
Batch Server field.
c. Click next to the Deployment Directory field and select Orion Star Jobs.
d. Accept the default value for the Deployed Job Name field.
e. Verify /Data Mart Development/Orion Jobs is the value for the Location field.
f. Click to save the information and close the Deploy a job for scheduling window. An
information message appears.
a. Right-click DIFT Populate Organization Dimension Table and select Scheduling ⇒ Deploy.
b. Verify that SASApp - SAS DATA Step Batch Server is selected as the value for the
Batch Server field.
c. Click next to the Deployment Directory field and select Orion Star Jobs.
d. Accept the default value for the Deployed Job Name field.
e. Verify /Data Mart Development/Orion Jobs is the value for the Location field.
f. Click to save the information and close the Deploy a job for scheduling window. An
information message appears.
a. Right-click DIFT Populate Product Dimension Table and select Scheduling ⇒ Deploy.
b. Verify that SASApp - SAS DATA Step Batch Server is selected as the value for the
Batch Server field.
c. Click next to the Deployment Directory field and select Orion Star Jobs.
d. Accept the default value for the Deployed Job Name field.
e. Verify /Data Mart Development/Orion Jobs is the value for the Location field.
f. Click to save the information and close the Deploy a job for scheduling window. An
information message appears.
a. Right-click DIFT Populate Time Dimension Table and select Scheduling ⇒ Deploy.
b. Verify that SASApp - SAS DATA Step Batch Server is selected as the value for the
Batch Server field.
c. Click next to the Deployment Directory field and select Orion Star Jobs.
d. Accept the default value for the Deployed Job Name field.
e. Verify /Data Mart Development/Orion Jobs is the value for the Location field.
f. Click to save the information and close the Deploy a job for scheduling window. An
information message appears.
9. Access Windows Explorer and verify the creation of the code files.
a. Select Start ⇒ All Programs ⇒ Windows Explorer.
b. Navigate to S:\Workshop\dift\OrionStarJobs.
c. Verify that .sas files were created for each of the deployed jobs.
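Each generated .sas file is a complete batch program, so a deployed job can be tested outside the
scheduler. The following is a hedged sketch of the kind of command line a SAS DATA Step Batch Server
stores for a job; the SAS installation path and log location are assumptions:

   "C:\Program Files\SAS\SASFoundation\9.2\sas.exe" ^
      -sysin "S:\Workshop\dift\OrionStarJobs\DIFT_Populate_Order_Fact_Table.sas" ^
      -log "S:\Workshop\dift\OrionStarJobs\DIFT_Populate_Order_Fact_Table.log" ^
      -batch -noterminal

When a flow's events are met, the scheduling server retrieves a command like this from the batch server
definition and uses it to run the job.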
c. Click to close the Connection Profile window and access the Log On window.
d. Type Ahmed as the value for the User ID field and Student1 as the value for the
Password field.
The deployed jobs are displayed and available to be part of the new flow.
e. Click next to the Scheduling Server field and select Platform Process Manager.
f. Click to move all items from the available list to the selected list.
g. Click .
b. Click to verify that all six deployed jobs are found in the visual flow editor.
d. Accept Completes successfully as the value for the Event type field.
e. Click .
g. Click .
j. Click .
k. Drag the gate node so that it is to the right of and in between the two jobs
DIFT_Populate_Organization_Dimension_Table and
DIFT_Populate_Time_Dimension_Table.
m. Accept Completes successfully as the value for the Event type field.
n. Click .
p. Accept Completes successfully as the value for the Event type field.
q. Click .
13. Verify the dependencies established using the visual flow editor by using the standard interface.
a. Locate the deployed job DIFT_Populate_Time_Dimension_Table.
b. Right-click and select Manage Dependencies.
c. Click .
f. Click .
d. Click .
e. Click next to the Trigger field and select Manually in Scheduling Server.
d. Click .
Objectives
Describe SAS Stored Processes.
List the applications that can be used to create
and execute stored processes.
Describe deployment of SAS Data Integration Studio
jobs as a SAS Stored Process.
[Diagram: a stored process consists of metadata registered in the metadata repository on the metadata
server plus SAS source code stored in a source code repository.]
[Diagram: a SAS Stored Process executes on a server and returns ODS output or a results package to
client applications such as SAS Add-In for Microsoft Office and the SAS Information Delivery Portal.]
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
c. Click .
d. Click next to the SAS server field and select SASApp - Logical Stored Process Server.
e. Click next to the Source code repository field. The Manage Source Code
Repositories window is displayed.
g. Click Package.
h. Click .
A new metadata object, a stored process, now should appear in the Extract and Summary folder.
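The deployed source code is an ordinary SAS program wrapped for execution on a stored process server.
A minimal sketch of the typical pattern follows; the PROC PRINT step is a hypothetical stand-in for the
generated job code:

   *ProcessBody;
   %stpbegin;    /* initialize the ODS destinations for stored process output */
   proc print data=work.extract_summary;   /* generated job code goes here */
   run;
   %stpend;      /* finalize and deliver the ODS output */

%STPBEGIN and %STPEND are the standard stored process macros that open and close ODS
destinations for package or streaming results.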
5. Execute the SAS Stored Process using SAS Add-In for Microsoft Office.
a. Select Start ⇒ All Programs ⇒ Microsoft Office ⇒ Microsoft Office Excel 2007.
b. Click the SAS tab.
c. Click .
g. Click .
6. When finished viewing, select the Office button and then Exit Excel (do not save any changes).
Chapter 13 Learning More
Objectives
Identify areas of support that SAS offers.
List additional resources.
Education
Comprehensive training to deliver greater value to your
organization
http://support.sas.com/training/
SAS Publishing
SAS offers a complete selection of publications to help
customers use SAS software to its fullest potential:
http://support.sas.com/publishing/
Certification
Computer-based certification exams,
typically 60-70 questions
and 2-3 hours in length
Preparation materials and
practice exams available
Worldwide directory of
SAS Certified Professionals
http://support.sas.com/certify/
Support
SAS provides a variety of self-help and assisted-help
resources.
http://support.sas.com/techsup/
User Groups
SAS supports many local, regional, international,
and special-interest SAS user groups.
http://support.sas.com/usergroups/