Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Oci Tuning Ds

Download as doc, pdf, or txt
Download as doc, pdf, or txt
You are on page 1of 37

Skip to main content

Sign in (or register)

English

IBM

Technical topics

Evaluation software

Community

Events

Search developerWorks

Search dev

developerWorks
Technical topics

Information Management

Technical library

Tuning the Oracle Connector performance in


IBM InfoSphere DataStage, Part 2:
Optimization of bulk load operation and
considerations for reject links and data types
Predrag Maksimovic (predragm@us.ibm.com), Senior Software Engineer, IBM
Summary: The Oracle Connector is a connectivity component in IBM InfoSphere
Information Server. It is utilized by IBM InfoSphere DataStage and other products in the
Information Server suite to perform extract, lookup, load, and metadata import operations on
Oracle databases. This article is Part 2 of a series of two articles that provide a set of guidelines
for tuning the Oracle Connector stages in DataStage parallel jobs with the goal of maximizing
their performance.
View more content in this series

Date: 04 Apr 2013


Level: Intermediate
PDF: A4 and Letter (803 KB | 34 pages)Get Adobe Reader
Activity: 6259 views
Comments: (View | Add comment - Sign in)
Average rating (6 votes)
Rate this article
Introduction
In the Part 1 article in this series we provided a brief overview of the IBM InfoSphere DataStage
product in the IBM Information Server suite of products and explained the role of the Oracle
Connector stage in DataStage jobs. We explained the difference between conductor and player
processes for the stage. We introduced the concepts of parallelism and partitioning and explained
how their meaning differs between the DataStage environment and the Oracle database
environment. We covered in detail the performance tuning and troubleshooting of the Oracle
Connector stage configured to perform SQL statements on the database, including query
statements used to fetch and lookup rows in the database as well as DML and PL/SQL statements
used to insert, update and delete rows in the database.
In the Part 2 article of the series we will continue our coverage of the Oracle Connector
performance tuning. First we will focus on performance aspects of the bulk load mode in the
connector. In this mode the connector is used to load data to the database by utilizing the Oracle
direct path interface. We will cover the operation of loading the records to the database as well as
the operations that often need to be performed on the table indexes and constraints before and
after loading the records. We will then present a few guidelines regarding the use of reject links
with the connector. We will show how reject links can affect the connector performance in some
cases and present some alternative approaches for your consideration. We will end the article by
presenting you with a number of performance tuning strategies oriented towards the handling of
different Oracle data types in the connector.
As was the case with the Part 1 article, the Part 2 article assumes the use of DataStage parallel
jobs and Information Server Version 9.1, although some of the presented concepts would apply
also to DataStage server jobs as well as the earlier Information Server versions.
Back to top
Bulk load
When the Oracle Connector stage is configured to run in bulk load mode it utilizes the Oracle
direct path interface to write data to the target database table. It receives records from the input
link and passes them to Oracle database which formats them into blocks and appends the blocks
to the target table as opposed to storing them in the available free space in the existing blocks. To
configure the stage to run in bulk load mode the Write mode property of the stage needs to be set

to value Bulk load. In this section, we will look at the connector properties and environment
variables that play important role for tuning bulk load operation performance in the stage, as
shown in Figure 1.
Figure 1. Connector properties that affect bulk load performance

Running the stage in Bulk load write mode typically results in better performance than when
running it in Insert write mode. However, Bulk load write mode imposes several restrictions that
are not present when the Insert write mode is used, especially when running the stage in parallel
execution mode with Allow concurrent load sessions connector property set to value Yes. These
restrictions are primarily related to the handling of triggers, indexes and constraints defined on
the target table.
The information in the remainder of this section is intended to help you answer the following
questions:

Should I configure the stage to run in Bulk load or Insert write mode?
If I choose to run the stage in Bulk load write mode, how should I configure the
remaining settings in the stage in order to achieve optimal performance?

Data volume considerations


When deciding to configure the stage in Bulk load or Insert write mode, the critical aspect to
consider is the data volume, in other words how many records on average is the stage writing to
the target table in a single job invocation.
If the data volume is relatively small, the preferred option would be to use Insert write mode due
to more flexibility that this write mode provides in respect to maintaining indexes, constraints
and triggers of the target table. With the Insert write mode all the table indexes, constraints and
triggers are enforced during the job execution and remain valid after the job completes. For
example, if the stage that is running in Insert write mode tries to insert a record to the target table
that violates the primary key constraint in the table, the stage can be configured to handle this

scenario and to continue processing the remaining records in a number of ways that are not
readily available when the stage is running in Bulk load write mode.

It can have a reject link defined and configured to process records that have violated
constraints. Each input record that violates table constraints will be routed to the reject
link, with optionally included error code and message text explaining why the record was
rejected. The existing row in the table for which the primary key was violated will remain
in the table.
It can be configured to run in Insert then update write mode in which case the existing
row in the target table will be updated with the values from the record that violated the
constraint.

It can be configured to run in Insert new rows only write mode, in which case the record
that violated the constraint will be skipped (ignored).

It can be configured with the INSERT statement that contains LOG ERRORS INTO clause
which will result in Oracle redirecting records that violated the constraint along with the
error information to the specified error logging table.

If the data volume is relatively large and the primary goal is to append the data to the target table
as quickly as possible then the Bulk load mode should be considered.
The question is what should be considered a small data volume as opposed to large data volume.
This decision will depend upon other things like how much performance you are willing to
sacrifice when you choose the Insert write mode, which in turn depends on the size of the batch
window that is available to push the data to the database and the available system resources. As a
rule of thumb, thousands of records would likely be considered a small data volume, while
millions of records would likely be considered a large data volume.
If the table does not have any indexes, constraints and triggers defined then Bulk load write
mode would make a good candidate irrespective of the data volume involved because many of
the restrictions imposed by this write mode would not apply any more. This includes the scenario
where the table does in fact have indexes, constraints and triggers defined but you have
procedures in place to disable or drop them prior to loading the data and then enable or rebuild
them after the load is complete.
A simple test like the following can help you decide which write mode to choose:
1. Design a test job with Oracle Connector stage that has an input link. In the test job
provide records similar in volume and structure to the records that will be processed by
the actual job in production.
2. Ensure that the target database table used in the test job has the same definition as the
table that will be used in production. If the production job performs incremental data
loads to an existing table in the database, ensure that the table used for the test job is
initially populated with a similar number of rows as the table that will be used in
production.

3. Run the job a few times in insert mode and capture the average time it took the job to
complete.
4. Manually disable any indexes, constraints and triggers on the target table and repeat the
job a few times with the connector stage configured in bulk load write mode and capture
the average time it took the job to complete.
5. Compare the results from the previous two sets of job runs. If the numbers are similar
that would be an indication that the Insert write mode is sufficient. If bulk load completes
much faster, determine which additional operations need to be performed on the table to
restore the constraints and indexes. Issue these operations from a tool like SQL*Plus to
see how much time they take to complete. If that time combined with the time to bulk
load the data is still considerably shorter than the time that it took to insert the data, then
the bulk load mode is likely to be a better choice.
Handling triggers
If the target table has triggers that need to be fired for every row written to the table then the bulk
load mode should not be used because database triggers are not supported in this mode. To bulk
load data to a table that has triggers, the triggers need to be disabled prior to the load. The
connector provides an option to disable the triggers automatically before to load and enable them
after the load. Once the triggers are enabled they will fire only for the rows inserted to the table
after that point. They will not fire for the rows that were loaded while the triggers were disabled.
Handling indexes
If the table has indexes, you can configure the stage to disable all the indexes when performing
the load. To do this set the Index maintenance connector property to value Skip all. This is
typically the best option to choose as it allows you to run the stage without restrictions in parallel
execution mode and with any Partition type value selected on the Partitioning page for the input
link. However, all indexes are marked UNUSABLE after the load completes and should be rebuilt.
This is where the data volume and the number of indexes play a critical role. If you are
incrementally loading a relatively small number of records compared to the number of records
already present in the table, it may be significantly faster to run the stage in Insert write mode
and have the database maintain indexes throughout the insert operation instead of having to
rebuild all the indexes after bulk load. On the other hand, if the target table is empty and you
need to load many records to it then you will likely be benefited in using the Bulk load write
mode because the indexes will need to be built on all the rows, and in that case, it may be faster
to first load all the records without maintaining the indexes and then build the indexes from
scratch.
Instead of rebuilding the indexes in an external tool after the load, you can configure the stage to
request from Oracle database to maintain the indexes during the load by setting the Index
maintenance option property to Do not skip unusable or Skip unusable. The use of these two
options is highly restrictive though and in most cases will not be the most optimal choice. Here
are the things that you need to keep in mind if you consider using one of these two options:

You will need to set Allow concurrent load sessions connector property to No, otherwise
you will receive Oracle error "ORA-26002: Table string has index defined upon it" before
any data is loaded. This requirement further implies that you will not be able to run the
stage in parallel execution mode, except for one special case when loading to a
partitioned table which is covered later in the article in the section that talks about
loading records to a partitioned table.
If the table has a UNIQUE index, the index will remain VALID after the job completes but
only if none of the loaded rows have violated the uniqueness imposed by the index.
Otherwise the index will automatically be marked UNUSABLE even though the job
completes successfully. Subsequent attempts to run the job will result in failure and the
reported Oracle error depend on the Index maintenance option property value. For Do not
skip unusable value the error will be "ORA-26028: index string.string initially in
unusable state" and for Skip unusable value the error will be "ORA-26026: unique index
string.string initially in unusable state".
If the table has a NON-UNIQUE index, the index will remain VALID after the load. If the
index was marked UNUSABLE before the load, the behavior will depend on the Index
maintenance option property value. For Do not skip unusable value it will result in
Oracle error "ORA-26002: Table string has index defined upon it" and for Skip unusable
value the job will complete and the index will remain in UNUSABLE state.

You can configure the stage to automatically rebuild indexes after the load, and you can control
the LOGGING and PARALLEL clauses of the index rebuild statements to optimize the index rebuild
process. You can configure the stage to fail the job if any index rebuild statement fails or to only
issue warning messages. To improve the performance the connector stage will rebuild the
indexes only when necessary. It will inspect the state of each index before rebuilding it. If the
index is marked VALID the connector will skip rebuilding it. This applies also to locally and
globally partitioned indexes. In that case, the connector will inspect the index partitions and
subpartitions (if the index is locally partitioned and the table is composite-partitioned) and
rebuild only those index partitions and subpartitions that are marked UNUSABLE.
Handling constraints
The presence of integrity and referential constraints on the table has implications on the bulk
load operation similar to those imposed by the indexes.
and CHECK constraints do not need to be disabled prior to the load. If any input records
violate the constraint the job will fail and no records will be loaded to the table. The exception to
this is the case when the stage is configured to run in parallel execution mode and some player
processes finish loading their record partitions before the player process that encountered the
record in error. In that case, the records loaded by the player processes that have already finished
will be committed to the table. From the performance point of view, a good approach for dealing
with these two types of constraints is to leave them enabled on the table during the load but
ensure that no records violate them by performing the necessary checking and error handling
upstream of the connector stage. We provide some concrete suggestions later in the section that
addresses reject links.
NOT NULL

FOREIGN KEY, UNIQUE

and PRIMARY KEY constraints also do not need to be disabled prior to the
load but are effectively ignored by the load operation. Any records that violate these constraints
will still be loaded and the constraints will remain VALIDATED and ENABLED. To ensure that these
constraints are valid for the rows loaded by the connector stage you will need to disable them
prior to the load and enable them after the load. Keep in mind that these types of constraints are
usually accompanied by indexes. The database enforces UNIQUE and PRIMARY KEY constraints
through unique indexes and FOREIGN KEY constraints often have user-defined indexes created for
them. In the earlier section about handling indexes we explained the effects that indexes have on
the load operation. The same effects apply in this case and need to be taken into consideration.
The connector provides an option to disable constraints before the load. You can specify this by
setting the Disable constraints property to Yes. When this is done, the load operation typically
completes faster than the insert operation. However, you have to enable and validate the
constraints following the load and deal with the loaded records that have violated the constraints.
That part can have a major impact on the overall performance. If you have a mechanism in place
outside of the DataStage job to enable and validate the constraints after the load, then having the
connector disable them before the load will provide for the best performance of the job itself
since there will be no restrictions during the load due to constraints and no time will be spent at
the end of the job to enable the constraints.
You can configure the stage to enable and validate the constraints after the load by setting
Enable constraints property to Yes. But, if any constraint fails the validation at that time, the job
will fail and the constraint will remain in DISABLED and NOT VALIDATED state. You can specify
the exceptions table in the Exceptions table name property to which the ROWIDs of the rows
that violated the constraints will be stored, along with the name of the constraint that they have
violated. Doing this will still result in the job failure if any of the constraints cannot be enabled,
but you will be able to determine which rows violated which constraint. You can further
configure the stage to delete the rows from the table that violated the constraints and to try to
enable the constraints again but that will discard all the rows in the table found to be in violation
of the constraints. To store them somewhere instead of discarding them you can define a reject
link for the stage and enable SQL error constraint violation condition on it. This will have a
major impact on the performance because it enforces sequential execution of the stage in a single
player process. Rather than accepting this restriction you should consider running the stage in
parallel execution mode and specifying Insert value for the Write mode as that will potentially
result in better overall performance. In general, it is best to avoid the use of reject links if
possible. The section about reject links provides some concrete ideas in this regard.
If you configure the stage to disable table constraints prior to the load and enable them after the
load and reject the records that have violated constraints, all of the records in violation of UNIQUE
and PRIMARY KEY constraints defined on the table will be rejected. The connector doesn't make
distinction between the rows that existed in the table prior to load and those that were loaded by
the connector. For example, if a row loaded by the connector violated the primary key constraint
because another row with the same primary key value existed in the table prior to the load, both
rows will be deleted from the table and sent to the reject link. To have the stage delete and reject
only those rows in violation of constraint that were not in the table prior to the load, you will
need to implement a custom solution such as the one shown in the next section.

Custom handling of indexes and constraints


When the connector stage is configured to handle triggers, indexes or constraints automatically
before and after the load, the connector will accomplish this by issuing various Data Definition
Language (DDL) statements on the database. For enabling and disabling triggers, it will use
ALTER TRIGGER statements, for rebuilding indexes it will use ALTER INDEX statements and for
enabling and disabling constraints it will use ALTER TABLE statements. You can locate these
statements in the job log when you run the job with CC_MSG_LEVEL environment variable set to
value 2. The connector provides properties that you can use to include or remove particular
clauses in these statements. For example, it provides properties that allow you to control the
presence of PARALLEL and LOGGING clauses in the ALTER INDEX REBUILD statement.
In some cases, you may wish to issue your own statements for handling triggers, indexes and
constraints instead of having the connector automatically generate and run them. For example,
you may wish to reference table triggers, indexes and constraints directly by their names instead
of having the connector query the database dictionary views to dynamically discover those
objects. You may wish to manually include certain clauses in these statements that you cannot
configure through the connector properties. You can achieve this outside of the stage by running
external tools and scripts before and after running the job. You can do it through the stage by
utilizing the Before SQL statement and After SQL statement connector properties. In each of
these two properties you can specify either a list of semicolon-separated SQL statements or an
anonymous PL/SQL block to be executed. The connector stage executes these statements in the
conductor process. It will run the Before SQL statement statements before any player process has
been created for the stage to process the records. It will run the After SQL statement statements
after all the player processes have finished loading the records. The following example illustrates
how this can be accomplished.
Suppose that you want to load records to the table TABLE_ITEMS which stores information about
certain type of items. The table has the ID column that represents item identifiers and serves as
the primary key for the table, which further means that the table has a unique index defined on
this same column. The table also has the NAME column that stores item names and for which a
non-unique index is defined on the table. Finally, the table has the LOAD_DATE column which
represents the date and time when the item row was loaded. The statement shown in Listing 1
can be used to create this table.
Listing 1. Create table statement example
CREATE TABLE TABLE_ITEMS(ID NUMBER(10), NAME VARCHAR2(20), LOAD_DATE DATE);
ALTER TABLE TABLE_ITEMS ADD CONSTRAINT PK_TABLE_ITEMS_ID PRIMARY KEY (ID);
CREATE INDEX IDX_TABLE_ITEMS_NAME ON TABLE_ITEMS(NAME);

The load date is set when the job runs and is passed to the job as a job parameter in format YYYYMM-SS HH24:MI:SS. Various mechanisms can be used to set and pass this value to the job. For
example, if you are invoking the job with dsjob command from command line, the system date

command can be used to produce this value. For example, on Linux you can use the command
shown in Listing 2 to run the job and pass the LOAD_DATE job parameter value based on the
current system date and time.
Listing 2. dsjob command invocation
dsjob -run -jobstatus -userstatus -param LOAD_DATE="`date +'%F %T'`"
project_name job_name

Another option for passing the current date and time would be to create a sequence job that
contains a single activity stage which in turn invokes the load job and initializes the LOAD_DATE
job parameter to value DSJobStartTimestamp, which is a built-in DataStage macro that returns
the job start timestamp.
Suppose the table contains many rows already and that you are running your job to load
additional rows. Further, most of the records that you are loading are for new items, but some
records represent existing items, meaning they have the ID field value for which there is already
a row in the table with that same ID value. You want to load the data as fast as possible, and you
also want the constraints and indexes to be enabled and valid for all the table rows after the load.
If any loaded rows violate the primary key constraint, you want to store them to an alternate table
TABLE_ITEMS_DUPLICATES which has the same columns as the TABLE_ITEMS table but does not
have any constraints and indexes defined on it, as shown in Listing 3.
Listing 3. Create table statement for duplicates
CREATE TABLE TABLE_ITEMS_DUPLICATES(ID NUMBER(10),
NAME VARCHAR2(20),
LOAD_DATE DATE);

You could achieve this task with the stage configured to disable constraints before the load, skip
all indexes during the load and rebuild indexes and enable constraints after the load. You could
define reject link on the stage, select the SQL error constraint violations condition on the reject
link and direct this link to another Oracle Connector stage configured to insert rows to the
TABLE_ITEMS_DUPLICATES table. But this approach would have the following limitations:

Because you configured the stage to disable the constraints and reject rows that violated
constraints after enabling them, the connector enforces sequential execution in a single
player process. The rejecting of records on the reject link can only take place in a player
process and in this case the rejecting is done when the connector enables the constraints
after the load. For the player process to enable the constraints after it has finished loading
the records, it needs to be certain that it is the only player process for the stage. But in
practice, you may wish to run the stage in parallel execution mode so that you can utilize
the available system resources on the engine tier.

The connector rejects all the rows that violated the primary key constraint including the
rows that existed in the table prior to the load but that happen to have the same ID value
as some of the rows that were loaded. This is because at the time the constraints are
enabled all the rows are already in the table storage and no distinction is made at that
time between the rows that were in the table prior to the load and those that were just
loaded. In practice, you may wish to reject only the rows that were just loaded by the job
and that violated the constraint but not any of the rows that existed in the table prior to
running the job.

The following is one possible approach to avoid these limitations. Specify the statement shown
in Listing 4 in the Before SQL statement property.
Listing 4. Statement to disable PK constraint
ALTER TABLE TABLE_ITEMS DISABLE CONSTRAINT PK_TABLE_ITEMS_ID;

This statement will explicitly disable the constraint PK_TABLE_ITEMS on the table prior to
loading records to the table.
Set the Index maintenance option connector property to Skip all. Set the Allow concurrent load
sessions connector property to value Yes. Do not define reject link for the stage. Configure the
stage to run in parallel execution mode.
Define the exceptions table TABLE_ITEMS_EXCEPTION. This table will be needed for the
EXCEPTIONS INTO option of the ENABLE CONSTRAINT clause in the ALTER TABLE statement that
we will want to invoke (explained further down). The format that the exceptions table needs to
follow can be found in the SQL script UTLEXCPT.SQL included with the Oracle database product
installation. Based on the information in this script, to create the TABLE_ITEMS_EXCEPTION
exceptions table for this example execute the statement shown in Listing 5.
Listing 5. Statement to create the exceptions table
CREATE TABLE TABLE_ITEMS_EXCEPTIONS(ROW_ID ROWID,
OWNER VARCHAR2(30),
TABLE_NAME VARCHAR2(30),
CONSTRAINT VARCHAR2(30));

Specify the PL/SQL anonymous block shown in Listing 6 in the After SQL statement property.
The connector will submit it to the database once all the records have been loaded. See the
comments embedded in the PL/SQL code for details about individual operations performed by
this PL/SQL block.

Listing 6. Custom PL/SQL code for handling indexes and constraints


DECLARE
-- Define the exception for handling constraint validation error ORA02437.
cannot_validate_constraint EXCEPTION;
PRAGMA EXCEPTION_INIT(cannot_validate_constraint, -2437);
BEGIN
-- Truncate the tables TABLE_ITEMS_DUPLICATES and TABLE_ITEMS_EXCEPTIONS
in case
-- they contain any rows from the previous job runs.
EXECUTE IMMEDIATE 'TRUNCATE TABLE TABLE_ITEMS_DUPLICATES';
EXECUTE IMMEDIATE 'TRUNCATE TABLE TABLE_ITEMS_EXCEPTIONS';
-- Try to enable the PK_TABLE_ITEMS_ID constraint and to build the
underlying
-- unique index. If any rows are in violation of the constraint then store
-- their ROWIDs to the TABLE_ITEMS_EXCEPTIONS table.
EXECUTE IMMEDIATE 'ALTER TABLE TABLE_ITEMS
ENABLE CONSTRAINT PK_TABLE_ITEMS_ID
EXCEPTIONS INTO TABLE_ITEMS_EXCEPTIONS';

be

EXCEPTION WHEN cannot_validate_constraint THEN


-- The constraint could not be enabled. The constraint violations need to

-- handled before trying again to enable the constraint.


BEGIN
-- Set the default date format for the session to match the format of
-- LOAD_DATE job parameter value.
EXECUTE IMMEDIATE 'ALTER SESSION SET NLS_DATE_FORMAT=''YYYY-MM-DD
HH24:MI:SS''';

were
the

-- Copy rows from TABLE_ITEMS table to TABLE_ITEMS_DUPLICATES table that


-- violated the constraint and were loaded by this job. The rows that
-- loaded by this job will have the LOAD_DATE column value that matches
-- LOAD_DATE job parameter value.
INSERT INTO TABLE_ITEMS_DUPLICATES
SELECT ID, NAME, LOAD_DATE
FROM TABLE_ITEMS
WHERE ROWID IN (SELECT ROW_ID FROM TABLE_ITEMS_EXCEPTIONS) AND
LOAD_DATE='#LOAD_DATE#';

-- Delete the rows from TABLE_ITEMS that were copied to


TABLE_ITEMS_DUPLICATES.
DELETE FROM TABLE_ITEMS
WHERE ROWID IN (SELECT ROW_ID FROM TABLE_ITEMS_EXCEPTIONS) AND
LOAD_DATE='#LOAD_DATE#';
-- Try to enable the constraint again. This time the operation should be
-- successful.
EXECUTE IMMEDIATE 'ALTER TABLE TABLE_ITEMS ENABLE CONSTRAINT
PK_TABLE_ITEMS_ID';

-- Rebuild the non-unique index IDX_TBL_ITEMS_COL_NAME that was disabled


-- during the load.
EXECUTE IMMEDIATE 'ALTER INDEX IDX_TABLE_ITEMS_NAME REBUILD PARALLEL
NOLOGGING';
END;
END;

Utilizing Oracle date cache


You can configure the stage to take advantage of the Oracle date cache feature by setting the Use
Oracle date cache property to value Yes. This feature is only available in bulk load mode and
can significantly improve the load performance when the input link has one or more Date or
Timestamp columns, those columns are used to load values to TIMESTAMP table columns and
there are many repeating values for those columns.
The connector prepares values for TIMESTAMP table columns in a format that requires them to be
converted prior to being stored in the table. When the date cache is utilized the result of the
conversion is cached so when the same value appears again on the input the cached result is used
instead of performing the conversion again.
You can specify the number of entries in the date cache through the Cache size property. To
avoid cache misses when looking up the values in the cache, set this property a value no smaller
than the total number of distinct values expected to be encountered on the input per each Date
and Timestamp link column.
You can also choose to disable the date cache if it becomes full during load by setting Disable
cache when full property to Yes. One scenario where this can be useful is when you run the job
repeatedly to load records to the target table. Most of the times the total number of distinct input
values is smaller than the selected cache size but occasionally most of the input values are
unique. In that case, disabling the cache when it becomes full will avoid performing lookups on
the full date cache, which will improve the performance because each of the those lookups would
result in a cache miss.
Note that each player process for the stage will use its own date cache when the stage is
configured to run in parallel execution mode.
In Information Server Version 9.1 when Date and Timestamp input link columns are used to load
values to DATE table columns instead of TIMESTAMP table columns, the values provided by the
connector are in the format that does not require conversion prior to storing them in the target
table. Therefore, the use of date cache feature will not provide much benefit in this case. In
Information Server releases prior to Version 9.1, the conversion will be needed in this case so the
use of date cache will provide benefits similar to when loading values to TIMESTAMP target table
columns.
Disabling redo logging

Database redo log tracks changes made to the database and is utilized in the database recovery
process. When the connector is loading data to the target table, the redo log is updated to reflect
the rows loaded to the table. In case of the database storage media failure, after the database is
restored from the most recent backup, the rows loaded by the connector since the backup was
taken are restored from the redo log. The ability to restore loaded rows from the redo log leads to
potential decrease in performance because the load operation takes longer to complete when the
redo log needs to be maintained and capture the data loaded to the table.
To speed up the load operation performed by the connector you can disable the redo logging by
setting the NOLOGGING attribute on the target table. You can set this flag when you create the
table, or if you wish to set it only during the load you can do that by using the Before SQL
statement and After SQL statement connector properties. In the Before SQL statement property
specify the statement ALTER TABLE table_name NOLOGGING and in the After SQL statement>
property specify the statement ALTER TABLE table_name LOGGING, where table_name is the
name of the table to which the stage is loading data. Another option is to use the Disable logging
connector property when this property is set to No then the logging flag (LOGGING or
NOLOGGING) is used, and when it is set to Yes, the redo logging is disabled.
Disabling redo logging for the bulk load can significantly improve the load performance. In case
of the media failure the data that was loaded while redo logging was disabled can be restored
from the database backup if the backup was taken after that load but it cannot be restored from
the redo log. If restoring loaded data from the redo log in case of a media failure is not required,
or if the data can be easily re-loaded in that case, then disabling redo logging should be
considered for speeding up the load operation. Refer to the Resources section for the link to
Oracle Database online documentation where you can find more information about redo log and
considerations related to disabling the redo logging.
Loading records to a staging table
If you wish to use the connector in bulk load mode to quickly transfer the data to the target
database but at the same time you need the target table triggers, indexes and constraints to be
maintained as if a conventional insert statement was used, one option for you to consider is to
initially load the records to an intermediate staging table that has the same definition as the target
table, but does not contain any triggers, indexes and constraints. After the load to the staging
table completes, issue a follow-up MERGE or INSERT statement that reads rows from the staging
table and writes them to the target table. This statement can be specified in the After SQL
statement connector property so that it is automatically invoked at the end of the job. The use of
the staging table results will cause data to be written twice (once when loaded by the connector
to the staging table and once when copied to the target table by the follow-up statement) but will,
at the same time, have the following beneficial characteristics:

The records will be processed only once by the job. This is important when the
processing of records in the job is resource intensive.
The records will be moved from the DataStage engine hosts to the database host only
once.

The INSERT statement will be executing on the database side so during that time the
DataStage engine hosts will be available to process other workloads.

You can use database SQL functions in the INSERT statement to perform additional
transformation on the values that were loaded to the staging table prior to storing them to
the target table.

If the INSERT statement that copies the values fails for some reason, the loaded records
will remain available in the staging table and you can issue another INSERT statement
after handling the rows in the staging table that have caused the error. You can utilize the
LOG ERRORS INTO clause in the INSERT statement where the rows that cause the INSERT
statement to fail are directed to the error logging table along with the details indicating
the reason for their error.

The indexes, constraints and triggers on the target table will remain enabled, valid and
enforced at all times.

Manual load
You can configure Oracle Connector stage to run in manual load mode in which case the stage
will write records to a file on the engine tier host that can then be passed to the Oracle
SQL*Loader utility to load the records to the target table. To configure the stage in this mode set
the Manual load property to Yes. Loading records in manual load mode is generally slower than
loading them directly to the target table but in some cases can be very useful, as shown next.
One example would be if you want to run the job to extract and process the data at a time when
the target database is not available for load. You can configure the stage to store the data to a file
and then later, when the database is available, you can load the data to it using the SQL*Loader
utility. The data will be already processed at that time and ready to be stored to the target table.
In cases, when the processing phase of the job takes a long time to complete, such as for example
when pulling data from an external source that is slow or when complex and CPU intensive
transformations are involved in the job, the manual load approach can result in significant time
savings.
Note that the connector actually generates two files in this mode the data file with the records
to be loaded and the control file used to pass options to the SQL*Loader and inform it about the
location and format of the data file. You can also transfer the generated files to a host that doesn't
have DataStage installed and load the records from there, presuming that you have SQL*Loader
tool available on that host. If that host is the Oracle server host the load operation will be local to
the database.
Restarting the load and backing up the data that needs to be loaded is also convenient with this
approach because the generated data file is already in the format supported by the SQL*Loader
tool.
Note that when the stage is configured to run in manual load mode it will try to connect to the
database even if it will not be loading records to the database on that connection. If the target

database is not available to accept connections at the time when you run the job, you can work
around the problem by pointing the stage to another database that is available at that time, so that
the stage can establish a connection and complete the manual load operation.
If it is critical to ensure the stage does not connect to any database in manual load mode, you can
accomplish that by ensuring that the following conditions are met:

The connector is not collecting operational metadata. To ensure this define the
DS_NO_PROCESS_METADATA environment variable for the job and set it to value FALSE.
The connector does not issue statements on the database to be interpreted as events by the
IBM InfoSphere Guardium Activity Monitor. To ensure this, ensure that you don't have
CC_GUARDIUM_EVENTS environment variable set for the job.

The Run before and after SQL statements property must be set to No. This ensures that
the connector is not trying to perform any SQL statements on the database prior to and
after loading the records to a file.

The Table action property must be set to Append. This ensures the connector is not
trying to create, drop or truncate a database table prior to performing the load operation.

The Perform operations before bulk load and Perform operations after bulk load
properties must not specify any index and constraint maintenance operations.

When all of these conditions are met then the connector will load records to the data file without
connecting to the database. You will still need to provide Username and Password property
values though in order to be able to compile the job. You can set them to some dummy values or
you can set the Use external authentication property to value Yes which will disable the
Username and Password properties so you will not need to provide values for them.
Note that even if you configure the stage not to connect to the database, you will still need to
have Oracle Client product installed on the DataStage engine host on which the connector runs,
otherwise the connector will fail the initialization when the job starts and the job will fail.
Array size and Buffer size
When the Oracle Connector stage is configured to run in bulk load mode, it reads records from
the input link and stores them in the Oracle specific direct path column array. The connector does
not create this array in memory like the array that it uses for fetch and DML operations. Instead,
Oracle client creates this array on behalf of the connector. The size of this array in number of
records is specified through the Array size property. The default value is 2000.
When the column array becomes full the connector requests from Oracle client to convert the
array to the internal stream buffer format and to send this buffer to the Oracle server. The size of
this stream buffer (in kilobytes) is specified through the Buffer size connector property. The
default value for this property is 1024 (1 megabyte).

Depending on the length of the records, it is possible that the specified Buffer size value will not
be sufficient to store the Array size number of records in stream format. In that case, the Array
size will be automatically reduced to a smaller value and Oracle Connector will log a message to
indicate the newly enforced number of records that it will load at a time. Note that regardless of
the specified Buffer size value the maximum Array size value will be 4096.
When configuring the Array size and Buffer size values in order to achieve optimal performance,
you can apply the following approach:
1. Set the Array size to 4096 (maximum supported) and Buffer size to 1024.
2. Run the job and check if the connector reported a message in the job log that it will load
less than 4096 records at a time. If it did then go to step 3. If it did not then go to step 4.
3. Continue trying with increasingly larger Buffer size values until the performance worsens
or the connector stops reporting the message, whichever takes place first.
4. Continue trying with decreasingly smaller Buffer size values until the performance
worsens.
Loading records with the stage in parallel execution mode
The connector can be configured to run in parallel execution mode and load data from multiple
player processes with each player process loading a subset (partition) of input link records to the
target table. In this case, parallel load must be enabled for the target table segment. To do this set
the Allow concurrent load sessions connector property to value Yes. If this property is set to No
and you attempt to load records from multiple player processes, you will receive Oracle error
"ORA-00604: error occurred at recursive SQL level 1" possibly accompanied by the Oracle error
"ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired".
The exception to the above rule is when the stage is loading rows to a Range, List or Interval
partitioned table (which can optionally be subpartitioned) and the Partition type setting for the
link is set to Oracle Connector. This exception scenario is covered in the next section which
discusses loading records to a partitioned table.
Setting Allow concurrent load sessions property to Yes has implications on how indexes are
handled during load. For more information refer to the earlier section which discussed the
handling of indexes.
Loading records to a partitioned table
When you wish to load records to a single partition or subpartition of the target table, set the
Table scope connector property to Single partition or Single subpartition value, and specify the
name of the partition or subpartition in the Partition name or Subpartition name property. Doing
this will result in locking only the segment of the specified partition or subpartition and will also
return error if any input record needs to be stored to another partition, that way providing
verification mechanism that the records belong to the partition or subpartition specified in the
stage.

When you wish to load records to multiple partitions of a table, set the Table scope connector
property to value Entire table. In respect to triggers, indexes and constraints defined on the
table, the concepts that apply to non-partitioned tables apply in this case as well. When
rebuilding indexes on partitioned tables, the connector will automatically rebuild only those
index partitions or subpartitions that have been marked unusable during the load process. This
applies to both locally and globally partitioned indexes.
When you wish to load records in parallel and make sure that each player process is loading
records to a dedicated table partition, you can do that for a table partitioned based on Range, List
or Interval partitioning strategy (and is optionally subpartitioned) by specifying the Oracle
connector value for the Partition type setting for the input link. In this case, each player process
of the stage will be associated with a single dedicated table partition and will load data to that
partition's segment (or its set of subpartition segments if the table is composite-partitioned) and
because no two player processes will be accessing the same segment, the Allow concurrent load
sessions connector property can be set to No. The exception is when the table has a globally
partitioned index. In that case, the job will fail with the Oracle error "ORA-26017: global
indexes not allowed for direct path load of table partition string".
When considering using Oracle connector value for Partition type when loading records to a
partitioned table, consult the list provided in Part 1 article in the section that covered DML
operations on a partitioned table.
Back to top
Reject link considerations
Reject links are a convenient mechanism for handling input records for which the database
reported an error when it tried to perform the operation specified in the stage. You can enable
reject conditions on the reject link under which the records in error should be rejected, you can
choose to include error code and error text information with the rejected records and you can
point the reject link to another stage to store the rejected records to a location of your choice,
such as a file or another database table.
However, reject links can have negative effect on the performance. When the stage has a reject
link defined, it needs to inspect the error code for each record reported by the database to be in
error, compare it to the conditions defined on the reject link and submit the record to the reject
link, potentially accompanied by the error code and error text information. These operations
consume system resources and affect the job performance as a whole.
You saw earlier that in order to use reject links to handle records with constraint violations in
bulk load write mode, you must run the stage in sequential execution mode. Doing this will result
in rejecting all the rows from the table that violated the constraints, including those rows that
were present in the table before the load. Also, when you run INSERT statements with the
APPEND_VALUES optimization hint you cannot rely on the reject link functionality.

To avoid these limitations associated with the use of reject links, avoid using the reject links
altogether and try to eliminate the records in error BEFORE they reach the Oracle Connector
stage or to eliminate them AFTER the stage has completed the operation for which it was
configured.
As an example of handling records in error BEFORE they reach the Oracle Connector stage, let
us consider the NOT NULL and CHECK constraints in a table created using the statement shown in
Listing 7.
Listing 7. Statement to create test table
CREATE TABLE TABLE_TEST(C1 NUMBER(10) CONSTRAINT CONSTR_NOT_NULL_C1 NOT NULL,
C2 VARCHAR2(20),
C3 DATE,
CONSTRAINT CONSTR_CHK_C1 CHECK (C1 > 20000 AND C1 <
70000));

The constraints on this table require that the values for field C1 are not NULL and that they are
between 20000 and 70000 (exclusive). Instead of configuring the Oracle Connector to use a
reject link for handling these constraints, the constraint checking can be done in the stages
upstream of the Oracle Connector stage. For example, you can use the Filter stage with one
output link leading to the Oracle Connector stage and another reject link leading to a Sequential
File stage (or any other stage to which you wish to direct the bad records). In the Where Clause
property of the Filter stage you would specify a condition as the one shown in Listing 8.
Listing 8. Filter stage condition
C1 IS NOT NULL AND C1 > 20000 AND C1 < 70000

You would also set the Output Rejects property of the Filter stage to value True. The Filter stage
would pass to its output link the records that meet this constraint, and would send the remaining
records to the reject link.
This configuration is shown in Figure 2.

Figure 2. Handling constraint violations with a Filter stage

For more complex constraint expressions or to annotate rejected records with the information
describing the reason why they were rejected, you could use the Transformer stage. For the
previous example, the Transformer stage would have one input link (for example, input_records)
with columns C1, C2 and C3 and would have three output links. All three output links would copy
the columns from the input link. The first output link would have the constraint
IsNull(input_records.C1) defined and an extra column to contain the constant literal value
"C1 value is null.". The second output link would have the constraint input_records.C1
<= 20000 Or input_records.C1 >= 70000 defined and an extra column to contain literal
value "C1 value: " : input_records.C1 : " is not between 20000 and 70000.". The
first and the second link would lead to the Funnel stage that would combine the records on them
and store them to a reject file using the Sequential File stage. The third output link would lead to
the Oracle Connector stage that would write them to the table. This configuration is shown in
Figure 3.

Figure 3. Handling constraint violations with a Transformer stage

The following are some examples of handling records with errors AFTER the stage has
completed moving the records to the database:

When bulk loading to the target table with constraints, load the records to a staging table
instead that doesn't have the constraints defined on it and then issue a follow up INSERT
statement to move the records to the target table. Specify LOG ERRORS INTO clause in the
insert statement so that the records in error are directed by to an alternate table. For more
information, refer to the earlier section that covered loading of records to a staging table.
When bulk loading to the target table with constraints, configure the stage to disable the
constraints before the load and enable them after the load, specify the exceptions table
but do not configure the stage to automatically process the exceptions. Instead, process
them manually after the job completes. Alternatively, perform custom handling of
constraints and indexes using a strategy such as the one suggested earlier in the article.

Back to top
Data types
In this section, we will look at how selection of data types for the link columns can affect the
connector performance. The default link column data types are provided by the connector when
it imports the table definition. They are also provided by the connector at runtime when the stage
has an output link without columns defined on it and the runtime column propagation feature is
enabled for the link.
The default link column data types represent the closest match for the corresponding table
column data types in terms of the ranges of values that they support. In some cases, you may
have an additional insight regarding the format and range of the actual data values processed by
the stage which may in turn allow you to select data types for the link columns that result in
better performance in the connector. In this section, a number of such scenarios are examined.
You should always start with the link columns of default data types. To determine the default
data types, use the connector to import the table that you will be accessing from the stage and
look at the column definitions in the imported table definition. If you configured the stage to
issue a complex SELECT statement that references multiple tables then you will not be able to
import a single table definition for that statement. What you can do in that case is create a
temporary view based on that statement and then import the view instead. When your stage is
configured to execute a PL/SQL anonymous block, you will not be able to import a table or a
view to determine the default link column data types. In this case, you can analyze the PL/SQL
block and for each bind parameter in the PL/SQL block (specified in :param_name or
ORCHESTRATE.param_name format) determine the object in the database to which it corresponds.
In some cases, this object will be a column in a table or a view, in some cases it will be a
parameter of a stored procedure, but in all cases that object will have an associated Oracle data
type. Create a temporary table with the columns of those same data types and import that table.

Refer to the Resources section for the link to the Information Server Information Center where
you can find the table that shows how Oracle Connector maps Oracle data types map to
DataStage data types by default.
Once you have placed columns of default data types on the link, run the job and measure the
performance. Then apply the information from the remainder of this section to further optimize
the job by changing link column data types in such a way so that the performance improves
while no loss of data occurs.
Representing CHAR, VARCHAR2, NCHAR and NVARCHAR2 table column data types
In this section, we will look at the CHAR, VARCHAR2, NCHAR and NVARCHAR2 Oracle data types.
When you use Oracle Connector to import tables with columns of these four data types, the
connector will model CHAR and VARCHAR2 table columns as DataStage Char and VarChar link
columns, and it will model NCHAR and NVARCHAR2 table columns as DataStage NChar and
NVarChar link columns. The exception to this rule is the case when CHAR and VARCHAR2 table
column definitions are based on the character length semantics and the database character set is
a multi-byte character set such as AL32UTF8. In that case, CHAR and VARCHAR2 columns are
modeled as NChar and NVarChar link columns.
If your database character set is a multi-byte character set such as AL32UTF8 and your table
columns use character length semantics, such as a column defined to be of VARCHAR2(10 CHAR)
type, and all the actual data values use only single-byte characters, then use the Char and
VarChar columns on the link (with Extended attribute left blank) instead of the NChar and
NVarChar columns.
Minimize the character set conversions in the stage by ensuring that the effective NLS_LANG
environment variable value for the job is compatible with the NLS map name specified for the
job. This is important not just for the performance reasons, but for the correct functioning of the
stage in regard to interpretation of character set encoding of the values. To determine the
effective NLS_LANG value for the job, run the job with the CC_MSG_LEVEL environment variable
set to 2 and inspect the job log. The connector will log the effective NLS_LANG environment
variable value. If you wish to use a different value, you will need to define NLS_LANG
environment variable for the job and set it to the desired value. Oracle Connector stage editor
does not have the NLS tab so the NLS map name selected for the job is used for the stage. You
can specify this value for the job by opening the job in the DataStage Designer, selecting the
menu option Edit and then clicking Job properties and the NLS page in the dialog and
providing the value in Default map for stages setting. Table 1 shows examples of compatible
NLS map name and NLS_LANG values. Note that you should set the language and category
portions of the NLS_LANG value to match the locale of your DataStage engine host.
Table 1. Compatible NLS map name and NLS_LANG values
NLS map name
Compatible NLS_LANG
UTF-8
AMERICAN_AMERICA.AL32UTF8

Windows-1252
ISO-8859-P1
ASCL_ASCII
Shift-JIS

AMERICAN_AMERICA.WE8MSWIN1252
GERMAN_GERMANY.WE8ISO8859P1
AMERICAN_AMERICA.US7ASCII
JAPANESE_JAPAN.JA16SJIS

If the values processed by the stage are based on only ASCII characters, set the NLS map name
to ASCL_ASCII and set NLS_LANG to AMERICAN_AMERICA.US7ASCII. Otherwise if possible set the
NLS_LANG value to match the database character set. For example, if the stage is writing records
to a database with the AL32UTF8 database character set, use the NLS_LANG value of
AMERICAN_AMERICA.AL32UTF8 and make sure the NLS map name for the job is set to UTF-8. To
determine the database character set you can connect to the database from SQL*Plus and issue
the statement shown in Listing 9.
Listing 9. Determining the database character set
SELECT VALUE FROM NLS_DATABASE_PARAMETERS WHERE PARAMETER='NLS_CHARACTERSET';

Always make sure that the values in the job represented by the Char and VarChar link columns
are actually encoded as indicated in the NLS map name and NLS_LANG settings and that the
values for NChar and NVarChar link columns are UTF-16 encoded. Note that the connector
handles Char and VarChar link columns which have the Extended attribute set to Unicode the
same way it handles NChar and NVarChar link columns.
Make sure that all Char, VarChar, NChar and NVarChar link columns have Length attribute set
to a concrete value. Leaving the Length attribute empty can have severe negative impact on the
performance of the stage. When the Length is not specified, the connector makes assumption that
the length is the maximum one allowed for the respective bind parameters, which is 4000 bytes.
The connector will therefore allocate the memory with the assumed column lengths, which may
result in large memory consumption. Suppose that the connector is configured to insert rows to a
database table with the default Array size of 5000, and the link contains 50 VarChar columns
that do not have the Length attribute set. The connector will allocate (in each player process)
approximately the total of 50 (columns) x 5000 (array size) x 4000 (maximum value length)
bytes which is close to 1 gigabyte. If the actual table columns in the database are defined as
VARCHAR2(50 BYTE) then by setting the Length attribute of the link columns to 50, the Array
size could be left set to 5000 and the size of the allocated memory per player process would drop
to 50 (columns) x 5000 (array size) x 50 (maximum value length) bytes which is about 12
megabytes.
Do not use LongVarBinary, LongVarChar and LongNVarChar link columns to represent CHAR,
VARCHAR2, NCHAR and NVARCHAR2 table columns. These link columns are appropriate for
handling table columns of LONG, LONG RAW, BLOB, CLOB, NCLOB and XMLType columns as
discussed later in the text. Even in those cases, if the actual values for those columns are less

than 4000 bytes in size, the Binary, VarBinary, Char, VarChar, NChar and NVarChar link
columns should be used instead of LongVarBinary, LongVarChar and LongNVarChar link
columns. When LongVarBinary, LongVarChar and LongNVarChar link columns are used, the
connector disables array processing and enforces the Array size value of 1, even if you specify
small Length attribute values for those link columns. Enforcing the Array size of 1 can have a
major negative impact on the performance.
Ensure that the connector is using optimal character set form for the Char, VarChar, NChar and
NVarChar link columns. Character set form is an Oracle database concept and can be one of two
values: IMPLICIT and NCHAR. The optimal character set form for the connector to use for CHAR
and VARCHAR2 table columns is IMPLICIT and for NCHAR and NVARCHAR2 table columns it is
NCHAR.
The connector will automatically use optimal character set form if you use Char and VarChar
link columns for CHAR and VARCHAR2 table columns and NChar and NVarChar link columns for
NCHAR and NVARCHAR2 table columns. As stated earlier, note that Char and VarChar link columns
with Extended attribute set to Unicode are treated as NChar and NVarChar link columns.
In some cases, you may end up using link columns that result in sub-optimal character set form
selection in the connector. For example, when you use the connector to import a table with
VARCHAR2 columns that are based on character length semantics and the database is based on
AL32UTF8 (multi-byte) character set, the connector will import that column as NVarChar link
column. By importing it this way the connector is able to set the Length attribute of the link
column to match the length specified in the table column definition while ensuring that data
truncation does not happen at runtime. At runtime the connector will choose the character set
form NCHAR for this link column which will not be the optimal choice for the VARCHAR2 target
table column. If this link column happens to be referenced in a WHERE clause in the statement
specified for the stage then any index defined on this table column will not be utilized. This
applies to SELECT statements used by the stage configured for sparse lookup mode, as well as
UPDATE and DELETE statements when the connector is configured to write records to the database
in Insert, Update, Delete, Insert then update, Update then insert or Delete then insert write
modes. The skipping of the index in this case can have significant performance implications. To
force the connector to use the optimal character set form in this case, you could apply one of the
following strategies:

Change the link column data type from NVarChar to VarChar. In this case, you will also
need to update the Length attribute for this link column by multiplying it by 4 because
UTF-8 characters can consume up to 4 bytes. As we mentioned before, if you know that
all the actual data values processed by the stage use single-byte characters, then you can
leave the Length attribute unchanged.
Change the data type of the table column to NVARCHAR2. In this case, the NCHAR character
set form selected by the connector will be the optimal choice. In many practical cases
though, making changes to the existing column definitions in the database tables is not an
option.

Utilize the CC_ORA_BIND_FOR_NCHARS environment variable. Set the value of this


environment variable for the job to contain comma-separated list of link column names
for which you wish the connector to use NCHAR character set form. If you specify the
special value (none) for this environment variable, then the connector will use IMPLICIT
character set form for all columns on the link. Another special value is (all) (the
parenthesis are part of the value) which will result in the connector using NCHAR character
set form for all the columns on the link. If none of the tables in your database have
NCHAR, NVARCHAR2 and NCLOB (discussed later) columns then you can define
CC_ORA_BIND_FOR_NCHARS environment variable at the DataStage project level and
specify (none) as its default value (again the parenthesis are the part of the value). That
way the connector will always use IMPLICIT form which will always be the optimal
character set form.

Representing NUMBER, FLOAT, BINARY_DOUBLE and BINARY_FLOAT table column data


types
In this section we will take a look at the NUMBER, FLOAT, BINARY_DOUBLE and
BINARY_FLOAT Oracle data types.
For NUMBER(p, s) table column where precision p and scale s are explicitly specified, the
default and at the same time the optimal link column data type is Decimal with Length attribute
set to p and Scale attribute set to s. The exception to this rule are the scenarios where s < 0 and
where s > p. In those cases, the values need to be provided to the stage and retrieved from the
stage in text format so the link column should be of VarChar type with the Length attribute value
sufficiently large value to accommodate all possible values for the respective column in the table.
For NUMBER(p) table column where precision p is explicitly specified but scale s is omitted, the
default link column data type is Decimal with the Length attribute set to p and Scale attribute
left empty. If the remaining stages in the job are operating on integer (TinyInt, SmallInt,
Integer or BigInt) values for these columns then you may be able to achieve better
performance if you preserve those integer data types for the columns on the link of the Oracle
Connector stage than if you change them to the Decimal data type. This is especially the case
when the connector stage is writing to a partitioned table and the corresponding table column is
used as part of the partition key of the table and you specified Oracle connector value for the
Partition type setting of the link. If you decide to replace Decimal(p) link columns with integer
link columns keep in mind the differences between these link data types. Although they all
represent integer values, the ranges of values that they cover differ.
For example, if a table column is defined as NUMBER(5) and you choose to replace the default
Decimal(5) link column with a SmallInt link column, the values supported by the NUMBER(5)
table column will be in the range [-99999, 99999] and the values supported by the SmallInt link
column will be in the range [-32768, 32767]. If you set the Extended attribute of the link column
to Unsigned, the values that are supported by the link column will be in the range [0, 65535]. In
both cases, the range of the link column will be a sub-range of the respective table column. So
you should only use this approach if you are sure that all the values that will be processed by the
stage belong to both ranges. In the previous example, you may choose to use SmallInt link

column (signed or unsigned) when writing values to the database, but to use Integer column
(signed) when reading values from the database because the range of the supported values for the
Integer (signed) link column is [-2147483648, 2147483647] and is therefore sufficient to
support any value supported by the NUMBER(5) table column. Note that if you choose to use
BigInt column on the link, the range of supported values for the link column will be [9223372036854775808, 9223372036854775807]. And if the Signed column attribute is set to
value Unsigned the range will be [0, 18446744073709551615]. However, the choice of BigInt
link column will likely result in smaller performance gains than if TinyInt, SmallInt or
Integer link column is used, and may even produce worse performance than the Decimal link
column. This is because the connector will exchange BigInt link columns values with the
database as text values which will involve additional data type conversions.
For the NUMBER table column where both precision p and scale s are omitted, the default link
column data type is Double. But if you are certain that the actual values in the table are all
integer values, you may choose to use link column of one of the integer data types (TinyInt,
SmallInt, Integer, BigInt) or you may choose to represent the values as decimal values with a
specific precision and scale by using a link column of Decimal data type. Keep in mind that the
supported ranges of these link column data types are not fully compatible so only use these link
column data types in cases when they will not result in the loss of data precision due to the
rounding or in cases when such a rounding is acceptable.
For FLOAT, BINARY_DOUBLE and BINARY_FLOAT table columns, the default and optimal column
types to use for the corresponding link columns are Float, Double and Float, respectively.
Representing DATE, TIMESTAMP and INTERVAL table column data types
In this section we will take a look at the DATE, TIMESTAMP and INTERVAL Oracle data types.
For DATE, TIMESTAMP(0), TIMESTAMP(0) WITH TIME ZONE and TIMESTAMP(0) WITH LOCAL
TIME ZONE table columns the default link column data type is Timestamp with Extended
attribute left empty. This is the optimal link column type to use in case when hour, minute and
second portions of the values need to be preserved. Otherwise, the Date link column will likely
result in better performance because the stage does not need to handle hours, minutes and
seconds in the values. Note that if you need to preserve time zone information in the values you
need to use character based link columns such as VarChar. This is covered in more detail later in
this section.
For TIMESTAMP(p), TIMESTAMP(p) WITH TIME ZONE and TIMESTAMP(p) WITH LOCAL TIME
ZONE table columns where precision p > 0, the default link column data type is Timestamp with
Extended attribute set to Microseconds. The processing of Timestamp link columns with
Extended attribute set to Microseconds takes considerably more time in the Oracle Connector
than the processing of Timestamp link columns which have the Extended attribute left empty.
This is especially the case for SELECT and DML (INSERT, UPDATE, DELETE) statements and to a
lesser extent bulk load operation. Avoid the use of Timestamp link columns with Extended
attribute set to Microseconds unless you absolutely need to support fractional seconds in the
timestamp values.

The connector models TIMESTAMP(p), TIMESTAMP(p) WITH TIME ZONE and TIMESTAMP(p)
WITH LOCAL TIME ZONE table columns with p > 6 as Timestamp link columns with Extended
attribute set to Microseconds. The connector will in this case perform truncation of fractional
seconds to the microsecond precision. To handle timestamp values with fractional second
precision greater than 6 up to and including 9, use the VarChar link column. In that case, the
connector will exchange timestamp values with Oracle client as text values. Ensure in this case
that the NLS_TIMESTAMP_FORMAT session parameter is set to match the format of the actual values
provided to the stage on its input link. The same format will be used by the stage for the values
that it provides on its output link. You can set this session parameter using the Before SQL (node)
statement property, by issuing an ALTER SESSION statement. For example, to handle timestamp
values as text values with 9 fractional digits, a statement like the one shown in Listing 10 could
be specified in the Before SQL (node) statement property.
Listing 10. Setting the timestamp format with 9 fractional seconds for the session
ALTER SESSION SET NLS_TIMESTAMP_FORMAT='YYYY-MM-DD HH24:MI:SS.FF9'

You can apply this same approach for TIMESTAMP(p) WITH TIME ZONE, INTERVAL YEAR (yp)
TO MONTH and INTERVAL DAY(dp) TO SECOND(sp) table columns. For TIMESTAMP(p) WITH
TIME ZONE table columns set the NLS_TIMESTAMP_TZ_FORMAT session parameter to the
appropriate format, which includes the time zone format if the time zone needs to be preserved in
the values. The INTERVAL(yp) TO MONTH and INTERVAL DAY (dp) to SECOND(sp) values
should be specified as interval literals in the format defined in the Oracle database. Refer to the
Resources section for the link to Oracle Database online documentation where you can find more
details and examples for INTERVAL Oracle data types and interval literals.
If you are using Timestamp (with Microseconds) link columns because you need to preserve
microseconds in timestamp values, you can again apply this same approach as it can potentially
lead to improved performance. For example if you are fetching values from a TIMESTAMP table
column and the downstream stage is capable of handling timestamp values in text format, then
replace the Timestamp (with Microseconds) column on the output link of Oracle Connector stage
with a VarChar column so that the connector provides timestamp values in text format.
Likewise, if you are inserting values to a TIMESTAMP table column and the upstream stage can
provide the timestamp values in text format, replace the Timestamp (with Microseconds)
column on the input link of Oracle Connector stage with a VarChar column so that the connector
accepts timestamp values in text format. The Length attribute of the VarChar column and the
NLS_TIMESTAMP_FORMAT session variable must be appropriate for the format of the timestamp
text values. For example, to use the default YYYY-MM-DD hh:mm:ss.ffffff format for the
timestamp values, set the Length attribute of the VarChar column to 26 and specify the statement
shown in Listing 11 in the Before SQL (node) statement connector property.
Listing 11. Setting the timestamp format with 6 fractional seconds for the session
ALTER SESSION SET NLS_TIMESTAMP_FORMAT='YYYY-MM-DD HH24:MI:SS.FF6'

Even in the cases when the remaining stages of the job are not capable of handling timestamp
values with microseconds in text format and require that the Timestamp (with Microseconds)
link column is used, you can choose to map this column to a VarChar column explicitly for the
Oracle Connector stage by using a Transformer stage as shown below:

If the link column is located on the output link of the Oracle Connector stage and the link
name is out_link, insert a Transformer stage between the Oracle Connector stage and the
next stage in the job flow and map the VarChar(26) column on the input link of the
Transformer to the Timestamp(with Microseconds) column on the output link of the
Transformer stage using the derivation shown in Listing 12.
Listing 12. Converting string to timestamp
TimestampToString(in_link.C1,"%yyyy-%mm-%dd %hh:%nn:%ss.6")

If the column is located on the input link of the Oracle Connector stage and the link name
is in_link, insert a Transformer stage between the previous stage in the job flow and the
Oracle Connector stage and map the Timestamp(with Microseconds) column on the input
link of the Transformer stage to the VarChar(26) column on the output link of the
transformer stage using the derivation shown in Listing 13.
Listing 13. Converting timestamp to string
TimestampToString(in_link.C1,"%yyyy-%mm-%dd %hh:%nn:%ss.6")

Representing RAW table data type


In this section we will take a look at the RAW Oracle data type.
Use VarBinary link columns to read and write RAW table column values in their native raw
binary form. For the Length attribute of the link columns specify the sizes of the respective RAW
table columns.
Use the VarChar link columns if you need to read or write binary values as sequences of
hexadecimal digit pairs. In this case, make sure to set the Length attribute of the links columns to
a value twice the size of the respective RAW table columns because each byte in a RAW table
column value will be represented by a pair of hexadecimal digit (0-9 and A-F) characters. The
use of VarChar link columns will allow you to read and write RAW table column values in
hexadecimal digit pair's form without a need to handle the conversions in another stage in the
job.

Representing LONG, LOB and XMLType table column data types


In this section we will take a look at the LONG, LONG RAW, BLOB, CLOB, NCLOB and XMLType
Oracle data types.
The preferred link column data types for these table column data types are shown in Table 2.
Table 2. Preferred link column data types for LONG, LOB and XMLType table column
data types
Table column data type Preferred link column data type
BLOB, LONG RAW
LongVarBinary
CLOB, LONG, XMLType LongVarChar
NCLOB
LongNVarChar
You can use other combinations of link and table column data types as well but doing that may
negatively affect the performance and will likely require you to use additional data type
conversion functions in your SQL statements in order to avoid data type mismatch errors.
Every time you have LongVarBinary, LongVarChar or LongNVarChar columns defined on the
link the connector enforces Array size value of 1 for that link which has negative effect on
performance. This is the main reason why LongVarBinary, LongVarChar and LongNVarChar
link columns should only be used when handling the table column data types covered in this
section.
When the values represented by the LongVarBinary, LongVarChar and LongNVarChar link
columns are 4000 bytes long or less, you should use VarBinary, VarChar and NVarChar link
columns instead, respectively, even when the corresponding table columns are of the data types
discussed in this section. When you do that the connector will not enforce the Array size value of
1 which will typically result in significant performance improvements. Also the guidelines
presented earlier for character and binary table column data types will apply in that case instead
of the guidelines presented in the remainder of this section.
When you configure the connector stage to fetch values from a table column represented by a
LongVarBinary, LongVarChar or LongNVarChar link column, you have two choices for how the
connector should pass the fetched values for that column to the output link. You can configure
the stage to pass the values inline which means that each value is passed in its original form to
the output link. Alternatively, you can configure the stage to pass the values by reference which
means that a locator string representing the location of each value is passed instead of the actual
value. The actual value represented by the locator string is then fetched by the last downstream
stage in the job that stores this value to its intended destination. The locator string contains
information that specifies how to retrieve the actual value from its source.
By default, the Oracle Connector stage passes values inline. To pass the values by reference you
need to set the Enable LOB references connector property to Yes, and then in the Columns for

LOB references connector property specify the names of the input link columns for which you
wish the connector to pass the values by reference. You can use the following guidelines when
deciding to pass values by reference or inline for the LongVarBinary, LongVarChar and
LongNVarChar link columns:

Do the values only need to be passed through the job or do they also need to be
interpreted (parsed or modified) by certain stages in the job? In the latter case, do not
pass the values by reference because it is not the values that are passed in that case but
the locator strings that represent them.
Which stage in the job is the consumer of the values? Only stages of connector-based
stage types such as WebSphere MQ Connector, DB2 Connector, ODBC Connector,
Teradata Connector and Oracle Connector are capable of recognizing locator strings and
resolving them to actual values. Stages of other stage types will treat locator strings as the
actual field values. For example, you can use Oracle Connector stage to pass CLOB value
by reference to a DB2 Connector stage to store the value to a CLOB column in the DB2
database. But if you pass that value by reference to a Sequential File stage, the locator
string will be stored in the target file and not the actual CLOB value. You can use the latter
scenario to check the size of the locator strings so that you can compare them to the
average size of the actual values and estimate the savings in terms of the amount of data
transferred through the job when the locator strings are passed instead of the actual
values. Typically the locator strings will be less than 2000 characters in length.
How large are the actual values? The larger the values are the more likely that passing
them by reference will result in performance gains. As a rule of thumb values that
measure about 100 kilobytes should be passed inline otherwise they should be passed by
reference. Understanding the specifics of your environment in combination with the
actual testing that you perform should help you with this decision. For example, if the
Oracle Connector stage that reads the value from the database runs on a DataStage engine
host which is different from the host in which the downstream stage consumes those
values, then passing values by reference may provide performance benefits even for
smaller actual values (as long as they are still larger than the locator strings) because they
will be need to transferred over the network as they get exchanged between the player
processes of the respective stages.

Note that DataStage imposes a limit on the size of records transferred between the player
processes in the job. The APT_DEFAULT_TRANSPORT_BLOCK_SIZE built-in DataStage
environment variable can be used to control this limit and by default it is set to 131072 bytes
(128 kilobytes). For example, if you pass link column values inline and the values consume
hundreds of kilobytes then you will need to override this environment variable and set it to a
sufficiently large value, otherwise you will receive error message at runtime indicating that the
record is too big to fit in a block. Set the value for this environment variable to accommodate the
largest record that will be passed through the job. Do not set it to a value larger than necessary
because the value will apply to all the player connections on all the links in the job so the
increase of this value may lead to a significant increase of the memory usage in the job.

Oracle Connector stage handles values represented by LongVarBinary, LongVarChar and


LongNVarChar link columns as follows:

For the read mode (fetching) it utilizes one of the following two Oracle mechanisms piecewise fetch or LOB locators. The connector inspects the source table column data
type and automatically selects the mechanism that is more suitable for that data type.
For the bulk load write mode it utilizes the Oracle direct path interface when loading
large values. Again the connector does this automatically and there are no additional
options to consider.
For the write modes other than bulk load it utilizes one of the following two Oracle
mechanisms - piecewise inserts/updates or OCI LOB locators. By default it uses
piecewise inserts/updates. To force the connector to use OCI LOB locators for some of
the link columns, specify those link column names in a comma-separated list and set that
list as the value of the CC_ORA_LOB_LOCATOR_COLUMNS environment variable for the job.
To use OCI locator string mechanism for all LongVarBinary, LongVarChar and
LongNVarChar link columns, specify the special value (all) for this environment
variable (the parenthesis are included with the value). Note that for the link columns that
represent LONG and LONG RAW target table columns the connector must be configured to
use piecewise insert/update mechanism. Also note that for the link columns that represent
XMLType target table columns the connector must be configured to use OCI locator string
mechanism if the values are larger than 4000 bytes. Do not confuse LOB locators with
the locator strings used when passing values by reference. The former is an Oracle
mechanism for accessing large values in the database, the latter is a DataStage
mechanism for propagating large values through the job.

The character set form and character set conversion considerations that affect the performance
and that were presented earlier in the section which covered CHAR, VARCHAR2, NCHAR and
NVARCHAR2 table column data types apply here as well. Simply assume the use of LongVarChar
and LongNVarChar link columns instead of the VarChar and NVarChar link columns as you take
those considerations into account.
As we saw in this section, to populate target table with records that contain LongVarBinary,
LongVarChar and LongNVarChar link columns you have three different options: bulk load mode,
insert mode with piecewise insert mechanism and insert mode with OCI LOB locator
mechanism. The three modes are available when the values arrive to the stage as inline values as
well as when they arrive as locator strings (values passed by reference). Typically, the bulk load
mode will have the best performance, followed by the insert mode with piecewise insert
mechanism and then the insert mode with OCI LOB locator mechanism. Depending on the
specifics of your environment this may not always be the case and one way to determine which
option provides the best performance is to run a series of tests and compare the results. The
exception is the case when the target table has columns of XMLType data type in which case bulk
load mode should not be considered if BINARY XML or OBJECT RELATIONAL storage type (as
opposed to CLOB storage type) is used for those XMLType columns or if the table is defined as a
XMLType object table.

When fetching, inserting or updating XMLType values, utilize the Oracle XMLType package
functions GETCLOBVAL, GETBLOBVAL and CREATEXML. Refer to the Resources section for the link
to the Information Server Information Center where you can find example statements for
accessing XMLType columns with the connector.
Back to top
Conclusion
In this article we covered various aspects of performance tuning of the Oracle Connector stage in
DataStage jobs. We discussed running Oracle Connector in bulk load mode taking into account
data volume considerations, handling table triggers, indexes, and constraints. We also looked at
reject link considerations for handling input records that have errors. Finally, we went over how
the data types for link columns can affect the connector performance.
In combination with the information listed in the Resources section and the information you
collected through your own research, this article should help you achieve optimal performance of
Oracle Connector stages in your DataStage jobs.
Back to top
Acknowledgments
Many thanks to the following contributors for their valuable input and review of this article:

Paul Stanley, InfoSphere Senior Architect


Tony Curcio, InfoSphere Product Manager

Fayaz Adam, InfoSphere Staff Software Engineer

Resources
Learn

Use an RSS feed to request notification for the upcoming articles in this series. (Find out
more about RSS feeds of developerWorks content.)

Explore the IBM InfoSphere Information Server Version 9.1 Information Center where
you will find technical documentation for Information Server tools and components such
as DataStage and Oracle Connector.

Visit IBM Information Management Redbooks page and download redbook titles such as
InfoSphere DataStage Parallel Framework Standard Practices and IBM InfoSphere
DataStage Data Flow and Job Design.

Find details about Oracle Database concepts mentioned in this article in the Oracle
Database Documentation Library 11g Release 2 (11.2).

Access a variety of technical resources for Information Management products in


developerWorks Information Management zone.

Stay up to date with IBM products and IT industry by attending developerWorks Events.

Follow developerWorks on Twitter.

Get products and technologies

Evaluate IBM software products in a way most convenient for you: by downloading trial
product versions, by trying products online for few hours in a SOA sandbox environment
or by accessing products in a cloud computing environment.

Discuss

Connect with other developerWorks users in developerWorks Community as you explore


developer oriented blogs, forums, groups, wikis and more.

About the author

Predrag Maksimovic is a member of the InfoSphere Connectivity team at IBM in Boca Raton,
Florida. He has 15 years of experience in design and development of connectivity components
and tools for enterprise data integration. His areas of interests include relational databases,
enterprise messaging systems and big data technologies.
Rate this article
Average rating (6 votes)
1 star

1 star

2 stars

2 stars

3 stars

3 stars

4 stars

4 stars

5 stars

5 stars

Comments

Add comment:
Sign in or register to leave a comment.
Note: HTML elements are not supported within comments.

Notify me when a comment is added1000 characters left

Be the first to add a comment


Back to top

Table of contents

Introduction
Bulk load

Reject link considerations

Data types

Conclusion

Acknowledgments

Resources

About the author

Comments

Dig deeper into Information Management on


developerWorks

Overview
New to Information Management

Products

Downloads

Technical library (articles, tutorials, and more)

Community and forums

Events

Try IBM PureSystems. No charge.

Experience today!
Get started with the cloud-based trial and pattern development kit.

Special offers

Trial software offers

Print this page

Share this page

Follow developerWorks

About

Help

Contact us

Submit content

Feeds

Report abuse

Terms of use

IBM privacy

IBM accessibility

Faculty

Students

Business Partners

You might also like