Lookup Transformation

The document provides an overview of lookup transformations in data integration. It discusses how lookups work by retrieving related values from a lookup table based on a value in the source. It also describes lookup properties, conditions, caching considerations, and techniques like connected vs unconnected lookups and conditional lookups.

Lookup Transformation

By the end of this sub-section you will be familiar with:


Lookup Basics
How does a Lookup work
Lookup Properties
Lookup Conditions
Lookup Cache Overview
Lookup Cache considerations
Lookup Cache Types
Lookup Techniques

idwbitraining@gmail.com 1
Lookup Basics
Purpose of the Lookup Transformation:
Getting a related value: Retrieve a value from the lookup table
based on a value in the source. The returned value can
also be used in a calculation, like any other port.
Updating slowly changing dimension tables: Determine whether
rows exist in a target, then create a new
record or update the existing one accordingly.
A Lookup can be Connected or Unconnected, and it is
termed Passive or Active depending on the type of output
we want it to deliver.
The lookup can be performed on flat files, relational tables, views,
or synonyms.
How a Lookup Transformation Works
For each Mapping row, one or more port values are looked up
in a database table
If a match is found, one or more table values are returned to
the Mapping. If no match is found, NULL is returned
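
The match-or-NULL behavior described above can be sketched in a few lines of Python. This is only an illustration, not Informatica code: the ORDERS table, its values, and the lookup_order function are hypothetical, and Python's None stands in for NULL.

```python
from typing import Optional

# Hypothetical lookup table keyed by ORDER_ID (illustrative data only).
ORDERS = {
    101: {"CUSTOMER_ID": 7, "STORE_ID": 3},
    102: {"CUSTOMER_ID": 9, "STORE_ID": 1},
}

def lookup_order(order_id: int) -> Optional[dict]:
    """Return the matching lookup row, or None (standing in for NULL)."""
    return ORDERS.get(order_id)

# Each mapping row supplies the port value that is looked up.
mapping_rows = [{"ORDER_ID": 101}, {"ORDER_ID": 999}]
results = [lookup_order(row["ORDER_ID"]) for row in mapping_rows]
```

The first row finds a match and gets the table values back; the second finds none and gets None.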
[Figure: Lookup Transformation in a mapping. SQ_TARGET_ITEMS_OR... (Source Qualifier) passes look-up values to LKP_OrderID (Lookup Procedure), which returns values to the pipeline that loads TARGET_ORDERS_COS... (Target Definition).]

Lookup Transformation
Looks up values in a database table or flat file and provides data to
downstream transformations in a Mapping

Passive Transformation
Connected / Unconnected
Ports
Mixed
L denotes a Lookup port
R denotes a port used as a return value (unconnected Lookup only)
Specify the Lookup Condition
Usage
Get related values
Verify whether records exist or whether data has changed
Lookup Properties
Override Lookup SQL option
Toggle caching
Native Database Connection Object name

Additional Lookup Properties
Set cache directory
Make cache persistent
Set Lookup cache sizes

Lookup Conditions
Multiple conditions are supported

Connected Lookup
Part of the data flow pipeline
[Figure: SQ_TARGET_ITEMS_OR... (Source Qualifier) linked to LKP_OrderID (Lookup Procedure), linked to TARGET_ORDERS_COS... (Target Definition).]

Unconnected Lookup
Physically unconnected from other transformations
There can be NO data flow arrows leading to or from an unconnected Lookup
The Lookup function can be set within any transformation that supports expressions
Lookup data is called from the point in the Mapping that needs it
[Figure: a function in the Aggregator calls the unconnected Lookup]

Unconnected Lookup - Return Port
The port designated as R is the return port for the unconnected lookup
There can be only one return port
A Lookup (L) or Output (O) port can be assigned as the Return (R) port
The Unconnected Lookup can be called from the expression editor of any other
transformation using the expression
:LKP.Lookup_Transformation(argument1, argument2, ...)
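
As a rough analogy, an unconnected lookup behaves like a function that is called from an expression and hands back exactly one value, the return port. The Python sketch below assumes a hypothetical ITEMS table and a hypothetical make_unconnected_lookup helper; neither is part of Informatica.

```python
# Hypothetical lookup source: SKU number -> item row.
ITEMS = {"SKU-1": {"ITEM_ID": 10, "ITEM_NAME": "widget"}}

def make_unconnected_lookup(table, return_port):
    """Build a callable that returns only the designated return (R) port."""
    def lkp(key):
        row = table.get(key)
        # No match: None stands in for NULL.
        return None if row is None else row[return_port]
    return lkp

# Only ITEM_ID is returned, even though the row has other columns.
lkp_item_id = make_unconnected_lookup(ITEMS, return_port="ITEM_ID")
```

Calling lkp_item_id("SKU-1") plays the role of :LKP.name(argument) inside another transformation's expression.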

Connected vs. Unconnected Lookups

CONNECTED LOOKUP | UNCONNECTED LOOKUP
Part of the mapping data flow | Separate from the mapping data flow
Returns multiple values (by linking output ports to another transformation) | Returns one value (by checking the Return (R) port option for the output port that provides the return value)
Executed for every record passing through the transformation | Only executed when the lookup function is called
More visible, shows where the lookup values are used | Less visible, as the lookup is called from an expression within another transformation
Default values are used | Default values are ignored

Conditional Lookup Technique
Two requirements:
Must be an Unconnected (or function mode) Lookup
Lookup function is used within a conditional statement

IIF(ISNULL(customer_id), :lkp.MYLOOKUP(order_no), customer_id)

Condition: ISNULL(customer_id)
Lookup function: :lkp.MYLOOKUP
Row keys (passed to Lookup): order_no

Conditional statement is evaluated for each row
Lookup function is called only under the pre-defined condition
Conditional Lookup Advantage
Data lookup is performed only for those rows that require it.
Substantial performance gains are possible.

EXAMPLE: A Mapping will process 500,000 rows. For two percent of those rows
(10,000) the item_id value is NULL. Item_ID can be derived from the
SKU_NUMB.

IIF(ISNULL(item_id), :lkp.MYLOOKUP(sku_numb), item_id)

Condition: ISNULL(item_id) (true for 2 percent of all rows)
Lookup: :lkp.MYLOOKUP(sku_numb) (called only when the condition is true)

Net savings = 490,000 lookups
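
The arithmetic in this example can be checked with a small simulation. The sketch below is an assumption-laden stand-in for the mapping: the row layout, the run_mapping helper, and the lambda that derives an item id from a SKU string are all hypothetical.

```python
def run_mapping(rows, lookup):
    """Call the lookup only for rows whose item_id is missing; count the calls."""
    calls = 0
    out = []
    for row in rows:
        if row["item_id"] is None:       # the condition
            calls += 1
            out.append(lookup(row["sku_numb"]))
        else:                            # no lookup needed
            out.append(row["item_id"])
    return out, calls

TOTAL_ROWS = 500_000
# Every 50th row has no item_id, which is 2 percent of the rows.
rows = [
    {"item_id": None if i % 50 == 0 else i, "sku_numb": f"SKU-{i}"}
    for i in range(TOTAL_ROWS)
]
# Hypothetical lookup: derive the item id from the SKU number.
item_ids, lookup_calls = run_mapping(rows, lambda sku: int(sku.split("-")[1]))
savings = TOTAL_ROWS - lookup_calls
```

Here lookup_calls comes to 10,000 and savings to 490,000, matching the figures on the slide.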

To Cache or not to Cache?
Caching can significantly impact performance
Cached
Lookup table data is cached locally on the machine
Mapping rows are looked up against the cache
Only one SQL SELECT is needed

Uncached
Each Mapping row needs one SQL SELECT
Rule of thumb: Cache if the number (and size) of records in
the Lookup table is small relative to the number of mapping
rows requiring a lookup, or if a large amount of cache memory is available to the
Integration Service
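
The difference in SELECT counts can be illustrated with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the items table, its rows, and the two helper functions are hypothetical and only mimic the cached and uncached strategies.

```python
import sqlite3

# Hypothetical lookup table held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (sku TEXT PRIMARY KEY, item_id INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [("A", 1), ("B", 2), ("C", 3)])

def run_uncached(skus):
    """Issue one SELECT per mapping row; return (results, number of SELECTs)."""
    selects, results = 0, []
    for sku in skus:
        selects += 1
        hit = conn.execute(
            "SELECT item_id FROM items WHERE sku = ?", (sku,)
        ).fetchone()
        results.append(hit[0] if hit else None)
    return results, selects

def run_cached(skus):
    """Load the table with a single SELECT, then match rows in memory."""
    cache = dict(conn.execute("SELECT sku, item_id FROM items").fetchall())
    return [cache.get(sku) for sku in skus], 1

mapping_rows = ["A", "B", "A", "C", "Z", "A"]
uncached_results, uncached_selects = run_uncached(mapping_rows)
cached_results, cached_selects = run_cached(mapping_rows)
```

Both strategies return the same values; only the number of round trips to the database differs.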
Lookup cache - overview

Lookup transformations can be configured to use cache.

The Integration Service builds the cache in memory when the first row is
processed. If the memory is inadequate, the data is paged into a cache file.

If you use a flat file lookup, the Integration Service always caches the lookup rows.

By default, the cache files are created under $PMCacheDir.

Cache if the number (and size) of records in the Lookup table is small relative to
the number of mapping rows requiring the lookup.

Lookup cache - Types
There are two types of lookup caches: Static and Dynamic

Un-cached | Static cache | Dynamic cache
The lookup table is queried each time | Cannot insert/update the cache once created | Can insert/update rows in the cache for each row from the source (the previous transformation)
Cannot use a flat file as lookup source | Can use relational and flat file lookups | Can use relational and flat file lookups
When the condition matches, lookup returns a row | When the condition matches, lookup returns a row | When the condition matches, rows are updated in the cache or left unchanged depending on the row type
If the condition is false, the default value is returned for connected and NULL is returned for unconnected lookups | If the condition is false, the default value is returned for connected and NULL is returned for unconnected lookups | When the condition is false, rows are inserted into the cache or left unchanged depending on the row type

Lookup cache for connected lookups
The Integration Service can build the cache for connected lookups in two ways:
Sequential cache: The Integration Service builds the cache in memory when it processes the
first row of data in a cached Lookup transformation. It waits for upstream transformations
to complete before building the cache.
Concurrent cache: The Integration Service does not wait for upstream active transformations
to complete. It starts building the cache as soon as the session starts. This may improve
performance if you are sure that the cache is needed each time the mapping is run.
For example, if the transformation logic in a mapping routes data to different
pipelines, the downstream lookup might not be hit on every run. In this case, it is advisable to
use a sequential cache.
Unconnected lookup caches cannot be built concurrently.
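
The trade-off can be sketched with a toy cache that is built either at session start or on the first lookup row. This models only the timing of the build, not real parallelism, and the LookupCache class, its build_at_start flag, and the routing rule are all hypothetical.

```python
class LookupCache:
    """Toy cache that records whether it was ever built."""

    def __init__(self, source, build_at_start):
        self.source = source
        self.data = None
        if build_at_start:
            # Models the concurrent option: built as soon as the session starts.
            self._build()

    def _build(self):
        self.data = dict(self.source)

    def get(self, key):
        if self.data is None:
            # Models the sequential option: built when the first row arrives.
            self._build()
        return self.data.get(key)

    @property
    def built(self):
        return self.data is not None

def run_session(rows, cache):
    """Route rows: only rows flagged needs_lookup reach the lookup."""
    return [cache.get(r["key"]) if r["needs_lookup"] else None for r in rows]

SOURCE = [("a", 1), ("b", 2)]
rows_bypassing_lookup = [{"key": "a", "needs_lookup": False}]

sequential = LookupCache(SOURCE, build_at_start=False)
concurrent = LookupCache(SOURCE, build_at_start=True)
run_session(rows_bypassing_lookup, sequential)
run_session(rows_bypassing_lookup, concurrent)
```

When no row reaches the lookup, the sequential variant never pays the build cost, while the concurrent variant has already built a cache it did not need.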

Lookup cache: Static

This is the default type of cache.

Cache is built when the first lookup row is processed.

For each row that passes the transformation, the cache is queried for specified
condition.

If a match is available, the proper value is returned.

If no match is available, either the default value (connected lookups only) or
NULL is returned.

If multiple matches are found, rows are returned according to the option selected for
Lookup policy on multiple match in the lookup properties.
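
A static cache can be pictured as a read-only index that is built once and then queried per row. In the Python sketch below, the StaticLookupCache class and the policy names "first", "last", and "error" are illustrative assumptions, not the product's option labels.

```python
from collections import defaultdict

class StaticLookupCache:
    """Read-only cache built once from the lookup rows."""

    def __init__(self, rows, key, policy="first", default=None):
        self.policy = policy
        self.default = default      # used when no match is found
        self._index = defaultdict(list)
        for row in rows:
            self._index[row[key]].append(row)

    def lookup(self, value):
        matches = self._index.get(value)
        if not matches:
            return self.default
        if len(matches) > 1 and self.policy == "error":
            raise LookupError(f"multiple matches for {value!r}")
        # "last" returns the final match; anything else returns the first.
        return matches[-1] if self.policy == "last" else matches[0]

rows = [
    {"id": 1, "name": "old"},
    {"id": 1, "name": "new"},
    {"id": 2, "name": "only"},
]
use_first = StaticLookupCache(rows, "id", policy="first")
use_last = StaticLookupCache(rows, "id", policy="last")
with_default = StaticLookupCache(rows, "id", default="N/A")
```

Because nothing is written back after construction, every row in the session sees the same cache contents.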

Lookup cache: Dynamic

The cache is updated continuously through the following actions:

Insert - Inserts the row into the cache if it is not present and you specified to insert
rows. You can configure the lookup to insert rows into the cache based on input ports or
generated sequence IDs.

Update - Updates the row in the cache if the row is already present and an update is
specified in the properties.

No change:
Row exists in the cache, but you have specified to insert new rows only

Row does not exist in the cache, but you have specified to update existing rows only

Row exists in the cache, but based on the lookup condition nothing changes
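
The three outcomes can be sketched as a small cache that changes while rows pass through it. The DynamicLookupCache class and its insert_new and update_existing flags are hypothetical simplifications; the numeric codes follow the NewLookupRow values listed later in the deck (0, 1, 2).

```python
NO_CHANGE, INSERT, UPDATE = 0, 1, 2

class DynamicLookupCache:
    """Cache that is modified as source rows are processed."""

    def __init__(self, insert_new=True, update_existing=True):
        self.rows = {}
        self.insert_new = insert_new
        self.update_existing = update_existing

    def process(self, key, data):
        """Apply one source row to the cache and report what happened."""
        if key not in self.rows:
            if self.insert_new:
                self.rows[key] = data
                return INSERT
            return NO_CHANGE
        if self.update_existing and self.rows[key] != data:
            self.rows[key] = data
            return UPDATE
        return NO_CHANGE

cache = DynamicLookupCache()
flags = [
    cache.process(key, data)
    for key, data in [(1, "Ann"), (2, "Bob"), (1, "Ann"), (2, "Robert")]
]
```

The repeated key 1 with identical data yields no change, while key 2 with new data yields an update, so duplicates within one source run are handled against the cache rather than the target.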

Lookup cache dynamic when to use

Some situations where dynamic lookups can be used:

Updating a master customer table with new and updated customer information.
Use a Lookup transformation to perform a lookup on the customer table to determine whether
a customer exists in the target. Use a dynamic lookup cache that inserts and updates
rows in the cache as it passes rows to the target.

Loading data into a slowly changing dimension table and a fact table.
Create two pipelines and configure a Lookup transformation that performs a lookup on the
dimension table. Use a dynamic lookup cache to load data to the dimension table. Use a static lookup
cache to load data to the fact table, and specify the name of the dynamic cache from the
first pipeline.

Lookup cache dynamic properties
A dynamic lookup cache has the following properties:

Property | Description
NewLookupRow | This port is added when the lookup is configured as dynamic. 0 = no change, 1 = insert, 2 = update
Associated port | The data in the associated port is used to determine whether to insert or update rows in the cache. A sequence ID can also be used as the associated port, in which case Informatica generates and uses a primary key
Ignore Null Inputs for Updates | Select this property when you do not want to update the data in the cache when this column is NULL
Ignore in Comparison | The Integration Service compares the values in all lookup ports with the values in their associated input ports by default. Select this property if you want the Integration Service to ignore the port when it compares values before updating a row
Insert else Update | This affects only rows that enter the Lookup transformation flagged as insert. Inserts a row into the cache if it is new. If the row exists in the index cache but the data cache is different, it updates the cache. If this option is not selected, Informatica inserts all new rows and ignores update rows
Update else Insert | This affects only rows that enter the Lookup transformation flagged as update. If the row exists in the cache, Informatica updates the data cache. If the row does not exist in the cache, it inserts a new row. If this option is not selected, Informatica updates rows in the cache and ignores new rows

Lookup cache dynamic - behavior
Dynamic lookup cache behavior for insert row type

Insert else update option | Row found in cache | Data cache is different | Lookup cache result | NewLookupRow value
Not selected | Yes | n/a | No change | 0
Not selected | No | n/a | Insert | 1
Selected | Yes | Yes | Update | 2 (0)
Selected | Yes | No | No change | 0
Selected | No | n/a | Insert | 1

Dynamic lookup cache behavior for update row type

Update else insert option | Row found in cache | Data cache is different | Lookup cache result | NewLookupRow value
Not selected | Yes | Yes | Update | 2 (0)
Not selected | Yes | No | No change | 0
Not selected | No | n/a | No change | 0
Selected | Yes | Yes | Update | 2 (0)
Selected | Yes | No | No change | 0
Selected | No | n/a | Insert | 1
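
The two behavior tables reduce to a short decision function. The Python sketch below is one possible encoding, with a hypothetical function name and argument names; it returns 2 for updates and does not model the parenthesised 0 that the tables show next to that value.

```python
def new_lookup_row(row_type, option_selected, found, data_differs=False):
    """Return (cache result, NewLookupRow value) for one incoming row.

    row_type is "insert" or "update". option_selected is the state of
    Insert Else Update (for insert rows) or Update Else Insert (for update rows).
    """
    if row_type not in ("insert", "update"):
        raise ValueError(f"unknown row type: {row_type!r}")
    if not found:
        # A missing row is inserted, except for update rows without Update Else Insert.
        if row_type == "insert" or option_selected:
            return ("insert", 1)
        return ("no change", 0)
    # The row is in the cache: update rows may always update it,
    # insert rows only when Insert Else Update is selected.
    can_update = row_type == "update" or option_selected
    if can_update and data_differs:
        return ("update", 2)
    return ("no change", 0)
```

For instance, new_lookup_row("update", False, found=False) gives ("no change", 0), which is the row where an update arrives for a key that is not cached and Update Else Insert is off.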

Lookup cache dynamic - guidelines
The Lookup transformation must be a connected transformation.
You can only create an equality lookup condition. You cannot look up a range of data in
dynamic cache.
Associate each lookup port that is not in the lookup condition with an input port or a
sequence ID.
When you use a lookup SQL override, make sure you map the correct columns to the
appropriate targets for lookup.
When you add a WHERE clause to the lookup SQL override, use a Filter transformation before
the Lookup transformation.
Use Update Strategy transformations after the Lookup transformation to flag the rows for
insert or update for the target.
Use an Update Strategy transformation before the Lookup transformation to define some or
all rows as update if you want to use the Update Else Insert property in the Lookup
transformation.
Set the row type to Data Driven in the session properties.
Select Insert and Update as Update for the target table options in the session properties.

Lookup cache sharing unnamed cache

When two Lookup transformations share an unnamed cache, the Integration
Service saves the cache for a Lookup transformation and uses it for subsequent
Lookup transformations that have the same lookup cache structure.

For example, if you have two instances of the same reusable Lookup
transformation in one mapping and you use the same output ports for both
instances, the Lookup transformations share the lookup cache by default.

Shared transformations must use the same ports in the lookup condition. The
conditions can use different operators, but the ports must be the same.
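
One way to picture unnamed sharing is a registry that reuses a cache whenever the structure matches. The sketch below is a guess at the idea, not the Integration Service's algorithm: the CacheRegistry class and the choice of structure key (table name, condition ports, output ports) are assumptions.

```python
class CacheRegistry:
    """Reuses a cache when two lookups have the same cache structure."""

    def __init__(self):
        self._caches = {}
        self.builds = 0     # how many times a cache was actually built

    def get_cache(self, table_name, rows, condition_ports, output_ports):
        cond = tuple(sorted(condition_ports))
        # Assumed structure key: table, condition ports, output ports.
        key = (table_name, cond, tuple(sorted(output_ports)))
        if key not in self._caches:
            self.builds += 1
            self._caches[key] = {
                tuple(row[p] for p in cond): {p: row[p] for p in output_ports}
                for row in rows
            }
        return self._caches[key]

customers = [{"id": 1, "name": "Ann", "city": "Oslo"}]
registry = CacheRegistry()
first = registry.get_cache("CUSTOMERS", customers, ["id"], ["name"])
second = registry.get_cache("CUSTOMERS", customers, ["id"], ["name"])
different = registry.get_cache("CUSTOMERS", customers, ["id"], ["name", "city"])
```

The first two requests share one cache object; the third uses different output ports, so a second cache is built.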

Lookup cache sharing named cache

You can also share the cache between multiple Lookup transformations by using a
persistent lookup cache and naming the cache files.

When the Integration Service processes the first Lookup transformation, it
searches the cache directory for cache files with the same file name prefix.

If the Integration Service finds the cache files and you do not specify to recache
from source, the Integration Service uses the saved cache files.

If the Integration Service does not find the cache files or if you specify to recache
from source, the Integration Service builds the lookup cache from the lookup source.

The Integration Service saves the cache files to disk after it processes each target
load order group.
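
The find-or-rebuild decision can be sketched with a file on disk. Everything named here is hypothetical: the load_lookup_cache function, the JSON file format, and the prefix-plus-extension naming are stand-ins, and the sketch writes the file immediately rather than after a target load order group.

```python
import json
import os
import tempfile

def load_lookup_cache(cache_dir, prefix, read_source, recache=False):
    """Return (cache, built); built is True when the source had to be read."""
    path = os.path.join(cache_dir, prefix + ".json")  # file naming is an assumption
    if os.path.exists(path) and not recache:
        # Cache file found and no recache requested: reuse the saved file.
        with open(path) as fh:
            return json.load(fh), False
    # No file, or recache from source requested: rebuild and save.
    cache = read_source()
    with open(path, "w") as fh:
        json.dump(cache, fh)
    return cache, True

source_reads = []

def read_source():
    """Hypothetical lookup source; records each time it is read."""
    source_reads.append(1)
    return {"A": 1, "B": 2}

with tempfile.TemporaryDirectory() as cache_dir:
    _, built_first = load_lookup_cache(cache_dir, "lkp_items", read_source)
    cache, built_second = load_lookup_cache(cache_dir, "lkp_items", read_source)
    _, built_recache = load_lookup_cache(
        cache_dir, "lkp_items", read_source, recache=True
    )
```

The second call reuses the saved file without touching the source; the third call rebuilds because recache was requested.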

Lookup cache sharing named cache (continued)

The Integration Service fails the session if you configure subsequent Lookup transformations
to recache from source, but not the first one in the same target load order group.

If the cache structures do not match, the Integration Service fails the session.

The Integration Service processes multiple sessions simultaneously when the Lookup
transformations only need to read the cache files.

The Integration Service fails the session if one session updates a cache file while another
session attempts to read or update the cache file.
For example, Lookup transformations update the cache file if they are configured to use a dynamic
cache or recache from source.

Lookup cache - Tips
Cache small lookup tables.
Improve session performance by caching small lookup tables. The result of the
lookup query and processing is the same, whether or not you cache the lookup
table.
Use a persistent lookup cache for static lookup tables.
If the lookup table does not change between sessions, configure the Lookup
transformation to use a persistent lookup cache.
The Integration Service then saves and reuses cache files from session to session,
eliminating the time required to read the lookup table.
Take care that the data in a persistent cache does not become stale.
For example, in a daily load, always rebuild a persistent lookup cache first (using the recache from
source option) before it is used in other mappings. Rebuilding the
persistent cache picks up any changes made to the lookup table.

Lookup cache
Enable caching

Cache directory

Using persistent cache

Data cache size

Index cache size

Dynamic lookup

Naming a persistent cache

Recache for persistent cache

Dynamic lookup options
