Informatica Basics Training
Course Objectives
Understand how to use PowerCenter 7 components for development
Be able to build basic ETL mappings
Be able to create, run, and monitor workflows
Understand available options for loading target data
Be able to troubleshoot most problems
PowerCenter 7 Architecture
[Architecture diagram: sources and targets connect to the PowerCenter Server through native drivers; the client tools (e.g., the Workflow Monitor) and the PowerCenter Server communicate with the Repository Server, which manages the Repository, over TCP/IP. Not shown: client ODBC connections for source and target metadata.]
PowerCenter 7 Architecture
You can register multiple PowerCenter Servers to a repository. The PowerCenter Server moves data from sources to targets based on workflow and mapping metadata stored in a repository.
The PowerCenter Server runs workflow tasks according to the conditional links connecting the tasks.
When you have multiple PowerCenter Servers, you can assign a server to start a workflow or a session.
PowerCenter 7 Architecture
The PowerCenter Server can combine data from different platforms and source types. For example, you can join data from a flat file and an Oracle source.
The PowerCenter Server can also load data to different platforms and target types.
For example, you can load transformed data to both a flat file target and a Microsoft SQL Server database in the same session.
Designer Overview
Chapter 2
Designer Interface
Designer Windows: Navigator, Workspace, Status bar, Output, Overview, Instance Data, Target Data
Designer Interface
Designer Tools: The Designer provides the following tools:
Source Analyzer: Use to import or create source definitions for flat file, XML, COBOL, Application, and relational sources
Warehouse Designer: Use to import or create target definitions
Transformation Developer: Use to create reusable transformations
Mapplet Designer: Use to create mapplets
Mapping Designer: Use to create mappings
Designer Interface
Status bar: Displays the status of the operation you perform.
Output: Provides details when you perform certain tasks, such as saving your work or validating a mapping. Right-click the Output window to access window options, such as printing output text, saving text to a file, and changing the font size.
Overview: An optional window to simplify viewing workbooks containing large mappings or a large number of objects. Outlines the visible area in the workspace and highlights selected objects in color. To open the Overview window, choose View-Overview Window.
Instance Data: View transformation data while you run the Debugger to debug a mapping.
Target Data: View target data while you run the Debugger to debug a mapping.
You can view a list of open windows and switch from one window to another in the Designer.
Designer Tasks
The common tasks performed in each of the Designer tools:
Add a repository
Print the workspace
Open and close a folder
Create shortcuts
Check in and out repository objects
Search for repository objects
Enter descriptions for repository objects
Copy objects
Export and import repository objects
Work with multiple objects, ports, or columns
Rename ports
Use shortcut keys
Naming Conventions
Chapter 3
Naming Conventions
Good practice to follow naming conventions; they can be project specific:
Workflow: wfl_ followed by workflow functionality
Session: s_ followed by mapping name
Mapping: m_ followed by mapping functionality
Source: table/file name
Target: table/file name
Ports: Input & Output: column names; Variable: v_ followed by functionality
Update Strategy: upd_
Aggregator: agg_
Normalizer: nrm_
6. Click OK.
7. Choose Repository-Save
XML File. You can create an XML target definition to output data to an XML file.
Database location.
Column names. Datatypes. Key constraints. Key relationships.
Metadata Extensions
Allows developers and partners to extend the metadata stored in the repository
Accommodates the following metadata types:
User-defined: PowerCenter users can define and create their own metadata
Vendor-defined: Third-party application vendor-created metadata lists; for example, applications like PowerConnect for Siebel can add information such as contacts, version, etc.
Can be reusable or non-reusable
Can promote non-reusable metadata extensions to reusable; this is not reversible
Reusable extensions are associated with all repository objects of that object type; a non-reusable extension is associated with a single repository object
Administrator or Super User privileges are required for managing reusable metadata extensions
Data Previewer
Preview data in: Relational Sources, Flat File Sources, Relational Targets, Flat File Targets
The Data Preview option is available in: Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer
Mappings Overview
Chapter 5
Overview
A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation. Mappings represent the data flow between sources and targets. When the PowerCenter Server runs a session, it uses the instructions configured in the mapping to read, transform, and write data.
Every mapping must contain the following components:
Source instance: Describes the characteristics of a source table or file.
Transformation: Modifies data before writing it to targets. Use different transformation objects to perform different functions.
Target instance: Defines the target table or file.
Links: Connect sources, targets, and transformations so the PowerCenter Server can move the data as it transforms it.
Note: A mapping can also contain one or more mapplets. A mapplet is a set of transformations that you build in the Mapplet Designer and can use in multiple mappings.
Sample Mapping
Transformation Concepts
A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. Transformations can be active or passive. Transformations can be connected to the data flow, or they can be unconnected: an unconnected transformation is not linked to other transformations in the mapping; it is called within another transformation and returns a value to that transformation. Transformations in a mapping represent the operations the PowerCenter Server performs on the data. Data passes into and out of transformations through ports that you link in a mapping or mapplet.
Transformation Concepts
Perform the following tasks to incorporate a transformation into a mapping:
1. Create the transformation.
2. Configure the transformation.
3. Link the transformation to other transformations and target definitions.
You can create transformations using the following Designer tools:
Mapping Designer: Create transformations that connect sources to targets. Transformations in a mapping cannot be used in other mappings unless you configure them to be reusable.
Transformation Developer: Create individual transformations, called reusable transformations, that you can use in multiple mappings.
Mapplet Designer: Create and configure a set of transformations, called mapplets, that you can use in multiple mappings.
Getting Help
Chapter 6
Default Query
For relational sources, the PowerCenter Server generates a query for each Source Qualifier transformation when it runs a session. The default query is a SELECT statement for each source column used in the mapping. Thus, the PowerCenter Server reads only the columns that are connected to another transformation.
Although there are many columns in the source definition, only three columns are connected to another transformation. In this case, the PowerCenter Server generates a default query that selects only those three columns:
SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.FIRST_NAME FROM CUSTOMERS
Hands on Exercises - I
Chapter 8
During execution of the mapping, select a flat file as the target instead of a relational target; the delimiter is pipe (|)
Ensure the target file name is user specific (e.g., Student01 should use file_name01)
Active Transformation:
Number of rows input may not equal number of rows output
Can operate on groups of data rows
May not be re-linked into another data stream (except into a sorted join where both flows arise from the same Source Qualifier)
e.g. Aggregator, Filter, Joiner, Rank, Normalizer, Source Qualifier, Update Strategy, Custom
Passive Transformation:
Number of rows input always equals number of rows output
Operates on one row at a time
e.g. Expression, Lookup, External Procedure, Sequence Generator, Stored Procedure
Transformation Views
A transformation has three views: Iconized, Normal, Edit
Iconized: shows the transformation in relation to the rest of the mapping
Transformation Views
Normal: shows the flow of data through the transformation
Edit: shows the transformation ports and the properties; allows editing
Port types: Input, Output, Variable
Port evaluation follows the top-down approach
An expression is a calculation or conditional statement added to a transformation. An expression can be composed of ports, functions, operators, variables, literals, return values, and constants.
Ports - Evaluation
Best practice recommends the following approach for port evaluation:
Input ports: Should be evaluated first. There is no evaluation ordering among input ports (as they do not depend on any other ports).
Variable ports: Should be evaluated after all input ports are evaluated (as variable ports can reference any input port). Variable ports can also reference other variable ports, but not any output ports. Ordering of variables is important because they can reference each other's values.
Ports - Evaluation
Output ports: Should be evaluated last. They can reference any input port or any variable port. There is no ordered evaluation of output ports (as they cannot reference each other).
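A minimal sketch of this evaluation order in an Expression transformation (port names and the 30% tax rate are illustrative assumptions, not from the deck):
  in_SALARY                                -- input port, read first
  v_TAX = in_SALARY * 0.30                 -- variable port, evaluated after all inputs
  v_NET = in_SALARY - v_TAX                -- variables evaluate top-down, so v_TAX is already set
  out_NET_SALARY = v_NET                   -- output port, evaluated last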
Expressions
Expressions can be entered at the row level (port) or at the transformation level
Expressions can be used in the following transformations: Expression, Aggregator, Rank, Filter, Router, Update Strategy, Transaction Control
Expression Transformation
Passive Transformation
Connected
Ports: Mixed; variables allowed
Create expressions in output or variable ports
Expression Transformation
Perform calculations using non-aggregate functions (row level)
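For instance, a hedged sketch of row-level, non-aggregate expressions (port names are assumptions):
  out_FULL_NAME = LTRIM(RTRIM(in_FIRST_NAME)) || ' ' || LTRIM(RTRIM(in_LAST_NAME))   -- trim and concatenate
  out_ANNUAL_SALARY = in_MONTHLY_SALARY * 12                                         -- simple row-level arithmetic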
Expression Editor
An expression formula is a calculation or conditional statement for a specific port in a transformation. It performs calculations based on ports, functions, operators, variables, constants, and return values from other transformations.
Expression Validation
The Validate or OK button in the Expression Editor will parse the current expression.
Informatica Functions
Character Functions Conversion Functions Date Functions Numerical Functions Scientific Functions Test Functions Special Functions
Informatica Functions
Character Functions
Used to manipulate character data
INITCAP returns the string value with the first letter of each word in uppercase and the remaining letters in lowercase
Conversion Functions
Used to convert datatypes
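A few hedged examples of character and conversion functions (literal values and port names are illustrative):
  INITCAP('john SMITH')      -- returns 'John Smith'
  SUBSTR(in_PHONE, 1, 3)     -- first three characters
  TO_CHAR(in_SALARY)         -- number to string
  TO_DECIMAL('1234.56')      -- string to decimal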
Informatica Functions
Date Functions
Used to round, truncate, or compare dates; extract one part of a date; or perform arithmetic on a date
To pass a string to a date function, first use TO_DATE() to convert it to a date/time datatype
Numerical Functions
Used to perform mathematical operations on numeric data
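Hedged examples of date and numerical functions (port names and formats are assumptions):
  TO_DATE('01/15/2004', 'MM/DD/YYYY')              -- string to date
  ADD_TO_DATE(in_ORDER_DATE, 'DD', 7)              -- add seven days
  DATE_DIFF(in_SHIPPED_DATE, in_ORDER_DATE, 'DD')  -- difference in days
  ROUND(in_AMOUNT, 2)                              -- round to two decimal places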
Informatica Functions
Scientific Functions
Used to calculate geometric values of numeric data
Test Functions
Used to test if a lookup result is null and to validate data: ISNULL(), IS_DATE(), IS_NUMBER(), IS_SPACES()
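A hedged sketch of test functions guarding a conversion (port names are assumptions):
  IIF(IS_DATE(in_DATE_STR, 'MM/DD/YYYY'), TO_DATE(in_DATE_STR, 'MM/DD/YYYY'), NULL)
  IIF(ISNULL(in_COMMISSION), 0, in_COMMISSION)   -- substitute 0 for NULL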
Informatica Functions
Special Functions
Used to handle specific conditions within a session, search for certain values, and test conditional statements: IIF(condition, true, false), DECODE(), ERROR(), ABORT()
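For example, a hedged DECODE/ERROR sketch (values and port names are illustrative; ERROR() skips the row and writes the message to the session log):
  DECODE(in_STATE, 'CA', 'California', 'NY', 'New York', 'Other')
  IIF(ISNULL(in_CUSTOMER_ID), ERROR('Missing CUSTOMER_ID, row skipped'), in_CUSTOMER_ID)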
Note:
a) Transformation datatypes allow mixing and matching of source and target database types
b) When connecting ports, native and transformation datatypes must be either compatible or explicitly converted
Connect Validation
Examples of invalid connections in a Mapping: Connecting ports with incompatible data types Connecting output ports to a Source Connecting a Source to anything but a Source Qualifier or Normalizer Transformation Connecting an output port to an output port or an input port to another input port
Mapping Validation
Mappings must:
Be valid for a session to run
Be end-to-end complete and contain valid expressions
Pass all data flow rules
Mappings are always validated when saved; they can also be validated without saving
The Output window will always display the reason for invalidity
Filter Transformation
Active Transformation
Connected
Ports: all Input/Output
Usage: filter rows from a mapping/mapplet pipeline
Filter Transformation
Drops rows conditionally
Use of logical operators makes the filter very effective (e.g. SALARY > 30000 AND SALARY < 100000)
Router Transformation
Rows sent to multiple filter conditions
Active Transformation
Connected
Ports: all Input/Output
Specify filter conditions for each group
Used to link source data in one pass to multiple filter conditions
Router Groups
Input group (always one)
User-defined groups: each group has one condition
All group conditions are evaluated for each row
One row can pass multiple conditions
Unlinked group outputs are ignored
Default group (always one) can capture rows that fail all group conditions
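A hedged sketch of user-defined group filter conditions (group names and the REGION port are assumptions):
  Group NORTH:   in_REGION = 'N'
  Group SOUTH:   in_REGION = 'S'
  DEFAULT group: receives rows that satisfy neither condition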
Workflows- I
Chapter 10
Workflow - Overview
A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. The PowerCenter Server runs workflow tasks according to the conditional links connecting the tasks. You can run a task by placing it in a workflow. Workflow Manager is used to develop and manage workflows. Workflow Monitor is used to monitor workflows and stop the PowerCenter Server. When a workflow starts, the PowerCenter Server retrieves mapping, workflow, and session metadata from the repository to extract data from the source, transform it, and load it into the target. It also runs the tasks in the workflow. You can run as many sessions in a workflow as you need. You can run the Session tasks sequentially or concurrently, depending on your needs.
Session Overview
A session is a set of instructions that tells the PowerCenter Server how and when to move data from sources to targets. A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation. A session is a type of task, similar to other tasks available in the Workflow Manager. In the Workflow Manager, you configure a session by creating a Session task. To run a session, you must first create a workflow to contain the Session task.
Session Task
Server instructions to run the logic of ONE specific mapping, e.g., source and target data location specifications, memory allocation, optional mapping overrides, scheduling, processing, and load instructions
Becomes a component of a Workflow or Worklet
If configured in the Task Developer, the Session Task is reusable
When a session is created, valid mappings are displayed in the dialog box
Session Task
Session Task tabs: General, Properties, Config Object, Mapping, Components, Metadata Extensions
Validating a Session
The Workflow Manager validates a Session task when you save it. You can also manually validate Session tasks and session instances. Validate reusable Session tasks in the Task Developer. Validate non-reusable sessions and reusable session instances in the Workflow Designer.
Monitor Workflows
The Workflow Monitor is the tool for monitoring Workflows and Tasks
Review details about a Workflow or Tasks in two views: the Gantt Chart view and the Task view
Monitoring Workflows
Perform operations in the Workflow Monitor:
Restart: restart a Task, Workflow, or Worklet
Stop: stop a Task, Workflow, or Worklet
Abort: abort a Task, Workflow, or Worklet
Resume: resume a suspended Workflow after a failed Task is corrected
View Session and Workflow logs
Abort has a 60-second timeout: if the Server has not completed processing and committing data during the timeout period, the threads and processes associated with the Session are killed.
The Repository Manager Truncate Log option clears the Workflow Monitor logs
Hands on Exercises - II
Chapter 12
Ensure all leading and trailing spaces are removed for character columns
Use NEXTVAL of Sequence Generator transformation to connect to Employee_wk
Debugger Features
The Debugger is a wizard-driven tool
View source/target data
View transformation data
Debugger Features
You can debug a valid mapping to gain troubleshooting information about data and error conditions. To debug a mapping, you configure and run the Debugger from within the Mapping Designer. The Debugger uses a session to run the mapping on the PowerCenter Server. When you run the Debugger, it pauses at breakpoints and allows you to view and edit transformation output data. You might want to run the Debugger in the following situations: Before you run a session After you run a session
You can choose from the following Debugger session types when you configure the Debugger:
Use an existing non-reusable session for the mapping
Use an existing reusable session for the mapping
Create a debug session instance for the mapping
Debug Process
1. Create breakpoints: You create breakpoints in a mapping where you want the PowerCenter Server to evaluate data and error conditions.
2. Configure the Debugger: Use the Debugger Wizard to configure the Debugger for the mapping. Select the session type the PowerCenter Server uses when it runs the Debugger. When you create a debug session, you configure a subset of session properties within the Debugger Wizard, such as source and target location. You can also choose to load or discard target data.
Debug Process
3. Run the Debugger: Run the Debugger from within the Mapping Designer. When you run the Debugger, the Designer connects to the PowerCenter Server. The PowerCenter Server initializes the Debugger and runs the debugging session and workflow. The PowerCenter Server reads the breakpoints and pauses the Debugger when the breakpoints evaluate to true.
4. Monitor the Debugger: While you run the Debugger, you can monitor the target data, transformation and mapplet output data, the debug log, and the session log. When you run the Debugger, the Designer displays the following windows:
Debug log: View messages from the Debugger.
Target window: View target data.
Instance window: View transformation data.
Debug Process
5. Modify data and breakpoints When the Debugger pauses, you can modify data and see the effect on transformations, Mapplets, and targets as the data moves through the pipeline. You can also modify breakpoint information. The Designer saves mapping breakpoint and Debugger information in the workspace files. You can copy breakpoint information and the Debugger configuration to another mapping. If you want to run the Debugger from another PowerCenter Client machine, you can copy the breakpoint information and the Debugger configuration to the other PowerCenter Client machine.
Debugger Interface
Creating Breakpoints
Use the Breakpoint Editor in the Mapping Designer to create breakpoint conditions in a mapping. You can create data or error breakpoints.
When you run the Debugger, the PowerCenter Server pauses the Debugger when a breakpoint evaluates to true.
A breakpoint can consist of an instance name, a breakpoint type, and a condition. When you enter breakpoints, set breakpoint parameters in the following order:
1. Select the instance name.
2. Select the breakpoint type.
3. Enter the condition.
Breakpoints Editor
Debugger Tips
The Server must be running before starting a Debug Session
When the Debugger is started, a spinning icon displays; spinning stops when the Debugger Server is ready
A flashing yellow/green arrow points to the current active Source Qualifier; a solid yellow arrow points to the current transformation instance
Next Instance: a single step at a time; one row moves from transformation to transformation
Step to Instance: examines one transformation at a time, one row after another through the same transformation
Transformations in Depth - II
Chapter 14
Target Instances
A single mapping can have more than one instance of the same target; the data is loaded into the instances in bulk mode like a pipeline
Usage of multiple instances of the same target for loading is dependent on the RDBMS in use; multiple instances may not be used if the underlying database locks the entire table while inserting records
Joiner Transformation
Active/Connected
Ports: Input, Output, Master
Join Types
Homogeneous Joins
Joins that can be performed with a SQL SELECT statement:
The Source Qualifier contains a SQL join
Tables are on the same database server (or are synonyms)
The database server does the join work
Multiple homogeneous joins can be combined
Heterogeneous Joins
Examples of joins that cannot be done with a SQL statement:
An Oracle table and a DB2 table
Two flat files
A flat file and a database table
Joiner Properties
Join Types: Normal (inner), Master Outer, Detail Outer, Full Outer
Joiner can accept sorted data (configure the join condition to use the sort origin ports)
Joiner conditions and nested joins: multiple join conditions are supported
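A hedged sketch of a Joiner condition (port names are assumptions; Joiner conditions support only the equality operator):
  EMPLOYEE_ID = EMP_ID   -- master port on the left, detail port on the right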
Mid-Mapping Join
The Joiner does not accept input in the following situations:
Both input pipelines begin with the same Source Qualifier
Both input pipelines begin with the same Normalizer
Aggregator Transformation
Active Transformation
Connected
Ports: Mixed; variables allowed; Group By allowed
Used for standard aggregations
Can also be used to get distinct records
Aggregator Transformation
Performs aggregate calculations
Aggregate Expressions
Aggregate functions are supported only in the Aggregator Transformation
Aggregator Transformation
Aggregate Functions Return summary values for non-null data in selected ports
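A hedged sketch of aggregate expressions in output ports (port names are assumptions; the optional second argument is a filter condition applied before aggregation):
  SUM(in_QUANTITY * in_UNIT_PRICE)
  SUM(in_QUANTITY * in_UNIT_PRICE, in_DISCOUNT > 0)   -- aggregate only discounted rows
  MAX(in_SALARY)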
Aggregate Properties
Sorted data can be aggregated more efficiently; the Aggregator can handle sorted or unsorted data
With sorted input, the Server caches data for each group and releases the cached data upon reaching the first record of the next group
Data must be sorted according to the order of the Aggregator's Group By ports
Lookup Transformation
Passive Transformation
Connected/Unconnected
Ports: Mixed; L indicates a Lookup port; R indicates the port used as a return value
Usage: get related values; verify if records exist or if data has changed
Multiple conditions are supported
Lookup SQL override is allowed
Lookup Transformation
Looks up values in a database table and provides data to other components in a mapping
Lookup Properties
Lookup conditions Lookup Table Name
Lookup condition
Native Database connection Object name
159
160
Lookup Caching
Caching can significantly impact performance
Cached:
Lookup table data is cached locally on the server
Mapping rows are looked up against the cache
Only one SQL SELECT is needed
The cache is indexed based on the ORDER BY clause
Uncached:
Each mapping row needs one SQL SELECT
If the data does not fit in the memory cache, the PowerCenter Server stores the overflow values in cache files. When the session completes, the PowerCenter Server releases cache memory and deletes the cache files unless you configure the Lookup transformation to use a persistent cache.
Lookup Caches
When configuring a lookup cache, you can specify any of the following options:
Persistent cache: You can save the lookup cache files and reuse them the next time the PowerCenter Server processes a Lookup transformation configured to use the cache. When the session completes, the persistent cache is stored on the server hard disk. The next time the session runs, cached data is loaded fully or partially into RAM and reused. A named persistent cache may be shared by different sessions.
Recache from source: If the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache.
Lookup Caches
Static cache: You can configure a static, or read-only, cache for any lookup source. By default, the PowerCenter Server creates a static cache. It caches the lookup file or table and looks up values in the cache for each row that comes into the transformation.
Dynamic cache: If you want to cache the target table and insert new rows or update existing rows in the cache and the target, you can create a Lookup transformation that uses a dynamic cache. The PowerCenter Server dynamically inserts or updates data in the lookup cache and passes data to the target table.
Shared cache: You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping, and a named cache between transformations in the same or different mappings.
Unconnected Lookup
Physically unconnected from other transformations; no data flow arrows lead to or from an unconnected Lookup
The lookup function can be called within any transformation that supports expressions
For example, a function in an Aggregator can call the unconnected Lookup
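A hedged sketch of calling an unconnected Lookup from an expression (the lookup name LKP_GET_RATE and the ports are assumptions; :LKP is the required prefix):
  v_RATE  = :LKP.LKP_GET_RATE(in_CURRENCY_CODE)   -- returns the lookup's return port value
  out_USD = in_AMOUNT * v_RATE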
Constant      Numeric Value
DD_INSERT     0
DD_UPDATE     1
DD_DELETE     2
DD_REJECT     3
IIF(score > 69, DD_INSERT, DD_DELETE)
The expression is evaluated for each row; rows are tagged according to the logic of the expression, and the appropriate SQL (DML) is submitted to the target database: insert, delete, or update
DD_REJECT means no SQL will be written for the row; the target will not see the row
Rejected rows may be forwarded through the mapping to a reject file
Cycle
Design tip: Set the Reset property and Increment By to 1. Use in conjunction with a Lookup: look up max(value) from the target, then add NEXTVAL to it to get the new ID.
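A minimal sketch of this tip in an Expression transformation (port names are assumptions; NEXTVAL is linked in from the Sequence Generator and in_MAX_ID holds the max key looked up from the target):
  out_NEW_ID = in_MAX_ID + NEXTVAL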
Hands on Exercises - 3
Chapter 15
Add a breakpoint for Country = 'USA' on the Source Qualifier and Router transformations
Stop at rows where the condition is satisfied to observe the data
Join to the flat file source with an inner join on EmployeeID to get RegionID, RegionDescription, TerritoryID, and TerritoryDescription from the database tables and the other details from the flat file
Exclude PhotoPath and Notes from the flat file join
Formulae:
lead_time_days = requireddate - orderdate
internal_response_time_days = shippeddate - orderdate
external_response_time_days = requireddate - shippeddate
total_order_item_count = SUM(Quantity)
total_order_discount_dollars = SUM((Quantity * UnitPrice) * Discount)
total_order_dollars = SUM((Quantity * UnitPrice) - ((Quantity * UnitPrice) * Discount))
Default to -1 for customer_wk, employee_wk, order_date_wk, required_date_wk, shipped_date_wk, ship_to_geography_wk, shipper_wk
Add Expression Transformation for trimming string columns and getting values from LKP_Target
Add Router Transformation to separate the data flow for New and Existing Records
Designer Features
Chapter 16
Arranging Workspace
Link Paths
Comparing Objects
Documentation
Informatica also provides a very descriptive collection of documentation and guides. The complete set of documentation for PowerCenter includes:
Data Profiling Guide
Designer Guide
Getting Started
Installation and Configuration Guide
Troubleshooting Guide
Web Services Provider Guide
Workflow Administration Guide
XML User Guide
Versioning
If you have the team-based development license, you can configure the repository to store multiple versions of objects. During development, you can use the following change management features to create and manage multiple versions of objects in the repository:
Check out and check in versioned objects
Compare objects
Versioning
A repository enabled for versioning can store multiple versions of the following objects: Sources, Targets, Transformations, Mappings & Mapplets, Sessions & Tasks, Workflows & Worklets, Session configurations, Schedulers, Cubes, Dimensions
Normalizer Transformation
Normalization is the process of organizing data
Normalizes records from relational or VSAM sources
Active Transformation
Connected
Ports: Input/Output or Output
Usage:
Required for VSAM source definitions
Normalize flat file or relational source definitions
Generate multiple records from one record
Overview
You primarily use the Normalizer transformation with COBOL sources, which are often stored in a de-normalized format. The Normalizer transformation normalizes records from COBOL and relational sources, allowing you to organize the data according to your own needs. A Normalizer transformation can appear anywhere in a pipeline when you normalize a relational source.
You break out repeated data within a record into separate records.
You can also use the Normalizer transformation with relational sources to create multiple rows from a single row of data.
Overview
Use a Normalizer transformation instead of the Source Qualifier transformation when you normalize a COBOL source.
VSAM Normalizer vs. relational (pipeline) Normalizer:
Transformations allowed before the Normalizer: No (VSAM) / Yes (relational)
Reusable: Yes (both)
Ports: Input/Output (both)
Sorter Transformation
Active Transformation
Connected
Ports: Input/Output
Define one or more sort keys; define the sort order for each key
Usage
Sorter Transformation
Can sort data from relational tables or flat files Sort takes place on the Informatica Server machine Multiple sort keys are supported
Sorter Transformation
Sorter properties:
Cache size can be adjusted (default is 8 MB)
The Server uses twice the cache size listed
If the cache size is unavailable, the Session Task will fail
Rank Transformation
Filters the top or bottom range of records for selection
Active Transformation
Connected
Ports: Mixed; one pre-defined output port RANKINDEX; variables allowed; Group By allowed
Usage: select a top/bottom number of records
Overview
You can use a Rank transformation to: Return the largest/smallest numeric value in a port or group. Return the strings at the top/bottom of a session sort order.
Overview
The Rank transformation allows you to group information (like the Aggregator), create local variables, and write non-aggregate expressions. It differs from the MAX and MIN functions in that it allows you to select a group of top or bottom values, not just one value. You can connect ports from only one transformation to the Rank transformation.
The Rank transformation includes input or input/output ports connected to another transformation in the mapping. It also includes variable ports and one rank port. Use the rank port to specify the column you want to rank.
Rank Index
The Designer automatically creates a RANKINDEX port for each Rank transformation. The PowerCenter Server uses the Rank Index port to store the ranking position for each row in a group. For example, if you create a Rank transformation that ranks the top three salespersons for each quarter, the rank index numbers the salespeople from 1 to 3:
RANKINDEX   SALES_PERSON   SALES
1           Sam            10,000
2           Mary           9,000
3           Alice          8,000
The RANKINDEX is an output port only. You can pass the rank index to another transformation in the mapping or directly to a target.
Rank Index
If two rank values match, they receive the same value in the rank index and the transformation skips the next value. For example, if you want to see the top five retail stores in the country and two stores have the same sales, the return data might look similar to the following:
RANKINDEX   STORE         SALES
1           Orange        100000
1           Brea          100000
3           Los Angeles   90000
4           Ventura       80000
Overview
PowerCenter allows you to control commit and rollback transactions based on a set of rows that pass through a Transaction Control transformation. A transaction is the set of rows bound by commit or rollback rows. You can define a transaction based on a varying number of input rows. You might want to define transactions based on a group of rows ordered on a common key, such as employee ID or order entry date.
Overview
In PowerCenter, you define transaction control at two levels:
Within a mapping: you use the Transaction Control transformation to define a transaction (using an expression). Based on the return value of the expression, you can choose to commit, roll back, or continue without any transaction changes.
Within a session: when you configure a session, you configure it for user-defined commit. You can choose to commit or roll back a transaction if the PowerCenter Server fails to transform or write any row to the target.
When you run the session, the PowerCenter Server evaluates the expression for each row that enters the transformation. When it evaluates a commit row, it commits all rows in the transaction to the target or targets. When it evaluates a rollback row, it rolls back all rows in the transaction from the target or targets.
Transaction Control expression constants: TC_COMMIT_BEFORE, TC_COMMIT_AFTER, TC_ROLLBACK_BEFORE, TC_ROLLBACK_AFTER
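A hedged sketch of a Transaction Control condition that commits whenever the order ID changes (port and variable names are assumptions; TC_CONTINUE_TRANSACTION is the no-change value):
  IIF(in_ORDER_ID <> v_PREV_ORDER_ID, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)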
Overview
A Stored Procedure transformation is an important tool for populating and maintaining databases. Database administrators create stored procedures to automate tasks that are too complicated for standard SQL statements. A stored procedure is a precompiled collection of Transact-SQL, PL/SQL, or other database procedural statements and optional flow control statements, similar to an executable script. Stored procedures are stored and run within the database. You can run a stored procedure with the EXECUTE SQL statement in a database client tool. Not all databases support stored procedures, and stored procedure syntax varies depending on the database.
You might use stored procedures to do the following tasks:
Check the status of a target database before loading data into it
Determine if enough space exists in a database
Perform a specialized calculation
Drop and recreate indexes
Overview
Stored procedures also provide error handling and logging necessary for critical tasks. The stored procedure must exist in the database before creating a Stored Procedure transformation, and the stored procedure can exist in a source, target, or any database with a valid connection to the PowerCenter Server. You might use a stored procedure to perform a query or calculation that you would otherwise make part of a mapping. If you already have a well-tested stored procedure for calculating sales tax, you can perform that calculation through the stored procedure instead of recreating the same calculation in an Expression transformation.
The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully.
You cannot see this value. The PowerCenter Server uses it to determine whether to continue running the session or stop. You configure options in the Workflow Manager to continue or stop the session in the event of a stored procedure error.
The mode you use depends on what the stored procedure does and how you plan to use it in your session.
Connected: All data entering the transformation through the input ports affects the stored procedure. Use a connected Stored Procedure transformation when you need data from an input port sent as an input parameter to the stored procedure, or the results of a stored procedure sent as an output parameter to another transformation.
Unconnected: The unconnected Stored Procedure transformation is not connected directly to the flow of the mapping. It either runs before or after the session, or is called by an expression in another transformation in the mapping.
Comparison
If you want to ... / Use this mode:
Run a stored procedure once during your mapping, such as pre- or post-session: Unconnected
Run a stored procedure every time a row passes through the Stored Procedure transformation: Connected or Unconnected
Run a stored procedure based on data that passes through the mapping, such as when a specific port does not contain a null value: Unconnected
Pass parameters to the stored procedure and receive a single output parameter: Connected or Unconnected
Pass parameters to the stored procedure and receive multiple output parameters: Connected
Run nested stored procedures: Unconnected
Call multiple times within a mapping: Unconnected
Running the stored procedure for each row of data that passes through the mapping is useful, for example, for running a calculation against an input port.
Connected stored procedures run only in normal mode.
Pre-load of the Source: Before the session retrieves data from the source, the stored procedure runs. This is useful for verifying the existence of tables or performing joins of data in a temporary table.
Pre-load of the Target: Before the session sends data to the target, the stored procedure runs. This is useful for verifying target tables or disk space on the target system.
Post-load of the Target: After the session sends data to the target, the stored procedure runs. This is useful for re-creating indexes on the database.
[Diagram: connected Stored Procedure transformation]
[Diagram: unconnected Stored Procedure transformation]
Workflows - II
Chapter 18
Reusable Tasks
Three types of reusable tasks:
Session: a set of instructions to execute a specific mapping
Command: specific shell commands to run during any workflow
Email: sends email during the workflow
Use the Task Developer to create reusable tasks; these tasks then appear in the Navigator and can be dragged and dropped into any workflow
Command Task
Specify one or more UNIX shell or DOS (NT, Win2000) commands to run at a specific point in the Workflow
Runs in the Informatica Server (UNIX or Windows) environment
Shell command status (successful completion or failure) is held in the pre-defined variable $command_task_name.STATUS
Each Command Task shell command can execute before the Session begins or after the Informatica Server executes a Session
Becomes a component of a Workflow (or Worklet)
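A hedged sketch of using the status variable in a workflow link condition (the task name cmd_purge_files is an assumption):
  $cmd_purge_files.STATUS = SUCCEEDED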
Command Task
If configured in the Task Developer, the Command Task is reusable (optional) You can use a Command task in the following ways:
Email Task
Configure to have the Informatica Server send email at any point in the Workflow
Becomes a component in a Workflow (or Worklet)
Non-reusable Tasks
Six additional tasks are available in the Workflow Designer: Decision, Assignment, Timer, Control, Event Wait, Event Raise
Decision Task
Specifies a condition to be evaluated in the Workflow Use the Decision Task in branches of a Workflow Provides additional functionality over a Link
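A hedged sketch of a Decision Task condition and its use in a subsequent link (task names are assumptions):
  Decision condition:  $s_load_dim.Status = SUCCEEDED AND $s_load_fact.Status = SUCCEEDED
  Link condition:      $dec_check_loads.Condition = TRUE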
Decision Task
Example Workflow without a Decision Task
Assignment Task
Assigns a value to a Workflow variable Variables are defined in the Workflow object
Timer Task
Waits for a specified period of time to execute the next task:
Absolute time
Datetime variable
Relative time
Control Task
Used to stop, abort, or fail the top-level workflow or the parent workflow based on an input link condition. A parent workflow or worklet is the workflow or worklet that contains the Control task.
Hands-On - III
Chapter 19
Conclusion
Thank You!
Kanbay
WORLDWIDE HEADQUARTERS: 6400 SHAFER COURT I ROSEMONT, ILLINOIS USA 60018 TEL. 847.384.6100 I FAX 847.384.0500 I WWW.KANBAY.COM