Troubleshooting Guide
Version 9 Release 1
SC19-3804-00
Note: Before using this information and the product that it supports, read the information in "Notices and trademarks."
Copyright IBM Corporation 2008, 2012. US Government Users Restricted Rights Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Troubleshooting InfoSphere DataStage
  Troubleshooting problems when starting an InfoSphere DataStage and QualityStage client
    Failure to connect to services tier: invalid host name
    Failure to connect to services tier: invalid port
    IBM WebSphere Application Server fails to start: AIX and Linux
    Cannot authenticate user
  Troubleshooting scheduled jobs
    Resolving scheduling problems on Windows engine tier hosts
    Resolving scheduling problems on UNIX and Linux servers
  Resolving job termination problems
  Resolving problems with database stages on 64-bit systems
  Resolving ODBC connection problems on UNIX and Linux systems
    Testing ODBC driver connectivity
    Checking the shared library environment
    Checking symbolic links
  Resolving configuration problems on UNIX systems
    Running out of file units
    Running out of memory on AIX computers
  Troubleshooting Designer client errors
    Handling exceptions in the Designer client
    Viewing log files and error reports
  Troubleshooting a failure to submit jobs when you run a column analysis
  Troubleshooting login failures
    Client side login failures
    Server-side login failures
    Server-rich client login failures
  Troubleshooting job design issues
    IBM InfoSphere DataStage Error: Job xxx is being accessed by another user
    DataStage Parameter Set - Parameter Set locked by non-existent user
    Cannot get exclusive access to the log for a job
  Troubleshooting problems when creating InfoSphere DataStage projects
  Troubleshooting job failures
    Low system resource issues
    Disk space issues
    Disk lookup issues
    Data processing failures
    DataStage timeout variables
  Troubleshooting Specific Stages
    DB2 Connector Stage
    Join Stage
    Lookup Stage
    Sequential File Stage
    Teradata Connector Stage
    Sort Stage
    Transformer Stage
    DataStage Parallel framework changes that require DataStage job modifications
  Troubleshooting for specific operating systems
    Troubleshooting slow jobs that use data sets in cluster environments
    Heap allocation errors with DataStage Parallel Jobs on the AIX platform
  Tuning engine parameters
    Using tunable parameters in the UVCONFIG file
  Enabling tracing for DataStage parallel jobs
Symptoms
When you attempt to start one of the InfoSphere DataStage and QualityStage clients, the following message is displayed:
Failed to authenticate the current user against the selected Domain: Server [servername] not found.
Causes
You might be specifying an incorrect name for the computer that is hosting the services tier.
If the application server has started, the login screen is displayed; otherwise an error message is displayed. You can test whether you specified the correct name for the services tier host by attempting to ping the computer that is hosting the services tier.
Symptoms
When you attempt to start one of the InfoSphere DataStage and QualityStage clients, the following message is displayed:
Failed to authenticate the current user against the selected Domain: Could not connect to server [servername] on port [portnumber].
Causes
The port number is incorrect or is unavailable.
Test whether the port is accessible from the client computer by typing at the command line:
telnet hostname port
If you get an error message, then the port is inactive. If you get no response, then the port is active. You can also test which ports are listening on the server computer by typing the following command:
netstat -a
Look for an entry in the form: isserver:port_number
You can check whether you are specifying the correct port number in the WebSphere Administrative Console. To look up the port number:
1. From the start menu, select IBM WebSphere > Application Server v6 > Profiles > default > Administrative console to start the WebSphere Administrative Console.
2. Log in using the WebSphere user name and password that was specified when IBM InfoSphere Information Server was installed.
3. In the left pane, select Servers > Application servers.
4. Click the server1 link.
5. Select Communications > Ports.
6. Look for the port number for WC_defaulthost. This is the port number that you should use when connecting to the application server.
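For example, you can filter the netstat output for the port that you expect (9080 here is only a placeholder; substitute the WC_defaulthost port that you looked up):

netstat -a | grep 9080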
You can also check whether there is a firewall between the client and the server. If there is a firewall, temporarily disable it to verify that all inbound and outbound ports are open.
Symptoms
The application server fails to start after the system is restarted. No messages are generated in the application server logs.
Causes
The Metadata server startup script fails to finish. You must issue the nohup command for the Metadata server startup script.
Environment
IBM AIX or Linux systems.
This command might return multiple files with various prefixes in the name. Some files might be links to other files and could reflect the change you made in the original file without needing to edit each file that was found. If you have multiple instances of WebSphere Application Server installed, unique files might exist for each WebSphere Application Server instance. You only have to modify the files that reference the instances of WebSphere Application Server that you have configured to start as non-root.
2. Identify the files that you need to modify. Typically, you must modify the following files:
Operating system   Files
AIX                /etc/rc#.d/S99ISFServer
                   The number symbol (#) can have the value of 0 through 6. For example:
                   /etc/rc0.d/S99ISFServer
                   /etc/rc2.d/S99ISFServer
                   /etc/rc5.d/S99ISFServer
Linux              /etc/init.d/ISFServer
3. In each file, change the following content. Locate the following text, where IS_install_path is the directory where you installed InfoSphere Information Server. The default installation path is /opt/IBM/InformationServer:
"IS_install_path/ASBServer/bin/MetadataServer.sh" "$@"
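The remainder of this step is not shown here. Based on the cause described above (the Metadata server startup script must be run with the nohup command), the edited line would likely take a form such as the following sketch; verify it against your installed startup script before making the change:

nohup "IS_install_path/ASBServer/bin/MetadataServer.sh" "$@"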
Symptoms
When you attempt to start one of the InfoSphere DataStage and QualityStage clients, the following message is displayed:
Failed to authenticate the current user against the selected Domain: Invalid user name (username) or password.
Causes
There are several possible causes of this problem:
v The user name is invalid.
v The password is invalid or has expired.
v The user has no suite user role.
v Credential mapping is required, but has not been defined for this user.
v The user has no DataStage role or has the incorrect DataStage role.
InfoSphere DataStage does not have its own separate scheduling program. Instead, whenever an InfoSphere DataStage user schedules a job, the underlying operating system controls the job. If scheduled jobs do not run correctly, the problem is usually with the operating system configuration on the engine.
Symptoms
Scheduled jobs do not run when expected.
Environment
This advice applies to the Windows environment.
Symptoms
Scheduled jobs do not run when expected.
Causes
The user ID used to run the schedule service has invalid user name or password details.
Environment
This advice applies to the Windows environment.
4. Click the Schedule tab.
5. Enter the user name and password to test.
6. Click Test.
7. Wait for the user name and password to be verified (this might take some time).
Symptoms
Scheduled jobs do not run when expected.
Causes
The user running the schedule service does not have sufficient user rights.
Environment
This advice applies to the Windows environment.
Symptoms
Scheduled jobs do not run when expected.
Causes
The AT command, which performs the Windows scheduling, only accepts day names in the local language.
Environment
This advice applies to the Windows environment.
You might have to experiment with which day names the local AT command will accept. If in doubt, enter the full name (for example, LUNDI, MARDI, and so on).
4. Repeat these steps for each of your projects.
You might find that you get an error message equivalent to "There are no entries in the list" when you use the scheduler on a non-English language system. This message is output by the AT command and passed on by the Director client. To prevent the Director client from passing on the message:
1. Identify a unique part of the message that the AT command is outputting (for example, "est vide" in French).
2. For each project, add the following line to its DSParams file:
NO ENTRIES=est vide
The AT command usually accepts other keywords besides days of the week in English. If your system does not accept other keywords, you can add localized versions of the additional keywords NEXT, EVERY, and DELETE to your projects by doing the following tasks:
1. Edit the DSParams file for each project.
2. For each keyword, add a line of the form:
KEYWORD=localized_keyword
For example:
NEXT=Proxima
If your scheduled job did not run, there are a number of steps that you can take to identify the cause.
Symptoms
Administrator cannot see all the jobs that the users have scheduled.
Environment
This advice applies to the UNIX environment.
Symptoms
Scheduled job does not run when expected.
Environment
This advice applies to the UNIX environment.
Symptoms
Scheduled jobs do not run.
Environment
This advice applies to AIX servers.
Symptoms
Jobs take too long to terminate.
Causes
Each InfoSphere DataStage project directory contains a &PH& directory. The &PH& directory contains information about active stages that is used for diagnostic purposes. Entries are added to the &PH& directory every time a job is run, so the directory needs to be cleared periodically.
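One possible cleanup approach is to remove old files from the &PH& directory at the operating-system level. The following sketch assumes the default project path, a 30-day age threshold, and that no jobs are running while you clean up:

cd /opt/IBM/InformationServer/Server/Projects/MyProject
find './&PH&' -type f -mtime +30 -exec rm -f {} \;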
Symptoms
Failure of the stage with symptoms such as a memory fault and corresponding core dump.
Causes
If you are running a 64-bit version of InfoSphere DataStage, you must ensure that any database clients you use are also 64-bit. If you are running a 32-bit version of InfoSphere DataStage, you must ensure that any database clients you use are also 32-bit. For example, Oracle database is available with both 32-bit and 64-bit clients. You must use the 32-bit client with 32-bit InfoSphere DataStage, and the 64-bit client with 64-bit InfoSphere DataStage.
Environment
Applies to 64-bit UNIX, Linux, or Windows environments.
Symptoms
If a job fails to connect to a data source using an ODBC connection, test the connection outside the job to see if the ODBC connection is the source of the problem.
Environment
The procedure applies to ODBC connections in a UNIX environment.
Where DSN specifies the connection that you want to test.
6. Enter the user name and password to connect to the required data source.
7. After you have connected to the data source, enter .Q to close the connection.
Symptoms
Cannot connect to database using ODBC connection.
Environment
This problem occurs when using ODBC connections in a UNIX environment.
Check that the ODBC driver shared library has been added to the environment variable that is used to locate shared libraries.
Table 2. Library path environment variables
Platform         Environment variable
Solaris          LD_LIBRARY_PATH
HP-UX            SHLIB_PATH
HP-UX Itanium    LD_LIBRARY_PATH
AIX              LIBPATH
Linux            LD_LIBRARY_PATH
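For example, a sketch of appending an ODBC driver library directory to the library path in the $DSHOME/dsenv file (Linux shown; use the variable from Table 2 for your platform, and note that the /opt/odbc/lib path is an assumption - substitute your driver directory):

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/odbc/lib
export LD_LIBRARY_PATH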
Symptoms
Cannot connect to database using ODBC connection.
Causes
If you have moved shared libraries to a new directory or have installed a new ODBC driver manager, you might have broken symbolic links that the engine uses to access the shared libraries for the source database.
Environment
This problem occurs when using ODBC connections in a UNIX environment.
$DSHOME is the home directory of the server engine. pathname is the full path name of the directory that contains the shared libraries.
To reset links for a new ODBC driver manager:
1. Install the ODBC driver manager according to the vendor's instructions.
2. Determine where the ODBC shared library libodbc.xx resides. For example, the library for the Intersolv driver is in $ODBCHOME/dlls, and the library for the Visigenics driver is in $ODBCHOME/libs.
3. Close all InfoSphere DataStage clients.
4. Run the relink.uvlibs command as described above.
5. Restart the InfoSphere DataStage clients.
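If you are not sure where libodbc.xx resides, a quick search such as the following sketch can help ($ODBCHOME is assumed to point at your driver manager installation directory):

find $ODBCHOME -name "libodbc*" -print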
Symptoms
Jobs fail because they run out of file units.
Environment
This advice applies to UNIX systems.
Ensure that you allow at least thirty seconds between executing stop and start commands.
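For example, a minimal sketch of restarting the engine with the recommended pause, assuming $DSHOME is set and the dsenv file has been sourced:

$DSHOME/bin/uv -admin -stop
sleep 30
$DSHOME/bin/uv -admin -start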
Symptoms
Jobs with large memory requirements cause "unable to locate memory" errors.
Environment
This advice applies to AIX systems.
4. Add the following line to the dsenv file (in the $DSHOME directory):
LDR_CNTRL=MAXDATA=0x30000000;export LDR_CNTRL
5. Source the dsenv file (. ./dsenv) to apply the new environment settings.
6. Restart the engine:
$DSHOME/bin/uv -admin -start
You can do the following actions on the Automatic error report message:
v Click ds_errorreport_YYMMDDHHmm.zip to view the directory containing the error reports using the Windows File Explorer.
v Click customized to open the Customize Report window where you can add a description of the scenario that caused the problem.
v Click More to display details of the exception and the client machine.
The ds_errorreport_YYMMDDHHmm.zip file contains the following information:
v the original error message
v the stack trace and exception details
v the client machine details
v the Client Version.xml file
v the associated dstage_wrapper_trace_NN.log file
v an optional user-defined description, entered on the Customize Report window
You can do the following actions on the Optional error report message:
v Click here to create an error report for the exception. The Customize Report window opens, where you can add a description of the scenario that caused the problem.
v Click More to display details of the exception and the client machine.
Stack Trace
The error indicates that your client cannot connect to the Information Server services tier (domain) server. There are many possible causes for this problem. It can be as simple as an invalid server name or port number. Click the More button to get a stack trace for the error.
javax.security.auth.login.LoginException: Could not connect to server [RMANIKON-2] on port [9081]. at com.ascential.acs.security.auth.client.AuthenticationService.getLoginException (AuthenticationService.java:991) at com.ascential.acs.security.auth.client.AuthenticationService.doLogin (AuthenticationService.java:370) Caused by: com.ascential.acs.registration.client.RegistrationContextManagerException: Caught an unexpected exception. at com.ascential.acs.registration.client.RegistrationContextManager.setContext (RegistrationContextManager.java:76) at com.ascential.acs.security.auth.client.AuthenticationService.doLogin (AuthenticationService.java:364) Caused by: com.ascential.acs.registration.client.RegistrationHelperException:
Caught an unexpected exception. at com.ascential.acs.registration.client.RegistrationHelper.getBindingProperties (RegistrationHelper.java:672) at com.ascential.acs.registration.client.RegistrationHelper.getBindingConfigProperties (RegistrationHelper.java:566) at com.ascential.acs.registration.client.RegistrationContextManager.setContext (RegistrationContextManager.java:173) at com.ascential.acs.registration.client.RegistrationContextManager.setContext (RegistrationContextManager.java:73) ... 1 more Caused by: java.net.ConnectException: Connection refused: connect at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:391) at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:252) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:239)
There are four important things to note in the stack trace. First, there is no text that states "Trace from Server", which means that this is a client-side issue. Second, the "Could not connect to server" message gives the host name and port number. Third, the stack trace shows that the error happens during the RegistrationHelper call. Last, the "Connection refused" message at the end indicates that the root cause is a socket connection error.
One issue might be that WebSphere Application Server is not running. On Windows, go to Services in the Control Panel and check that the service IBM WebSphere Application Server has a status of Started. On UNIX and Linux, use the command ps -ef | grep java and check that the WebSphere process is running. For example:
ps -ef | grep java
root 25468 1 0 May 02 ? 33:33 /u1/IBM/WebSphere/AppServer/java/bin/java ...
Another issue might be that the port is blocked by a firewall. You can do a quick test by trying to telnet to the host and port number. Use the command telnet <DataStage host> <port number>. If the telnet fails, then the port is most likely blocked. If you are on Linux, you might also use the nc command to see if the port is open. If the port is blocked, your administrator must open the port.
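For example, a quick sketch of the nc test mentioned above (the host name and port number are placeholders):

nc -zv dshost.example.com 9080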
This is a unique symptom, and the root cause is that WebSphere Application Server is not at the correct Java level. To resolve this issue, the WebSphere Java SDK must be upgraded to Java SDK 1.4.2 SR10.
This is caused by a known ORB defect that can be resolved by upgrading WebSphere Application Server with iFix PK76826.
In the last scenario, there is an example stack trace from the client with an error that is caused by a bad hosts file on the server. The first message indicates that the error is raised at login time by the authentication service, and the last message indicates that this happened during the server lookup.
Your /etc/hosts entry might use a single-line format, which can cause problems. The single-line format is similar to the following entry:
127.0.0.1 localhost.localdomain localhost machine_long_hostname machine_short_hostname
To resolve the problem, separate the /etc/hosts entry to use the following double-line format:
127.0.0.1 localhost.localdomain localhost
<real ip address> machine_long_hostname machine_short_hostname
Causes
v Client has an invalid entry in its host file
v Server listening port might be blocked by a firewall
v Server is down
v Update the host file on the client system so that the server host name can be resolved from the client.
v Make sure the WebSphere TCP/IP ports are opened by the firewall.
v Make sure that the WebSphere application server is running.
Failed to authenticate the current user against the selected Domain: CORBA MARSHAL 0x4942f89a No; nested exception is: org.omg.CORBA.MARSHAL: Trace from server: 1198777258 at host PURPLE1 >> org.omg.CORBA.MARSHAL: Unable to read value from underlying bridge : initial and forwarded IOR inaccessible:Forwarded IOR failed with: java.net.SocketException: Operation timed out: connect:could be due to invalid address:host=10.38.86.83, port=3953Initial IOR failed with: java.net.SocketException: Operation timed out: connect:could be due to invalid address:host=10.38.86.83,port=3953 vmcid: IBM minor code: 89A completed: No at com.ibm.rmi.iiop.CDRInputStream.read_value(CDRInputStream.java:1993) at com.ascential.acs.security.auth.server. _EJSRemoteStatelessAuthenticationService_e0d03809_Tie. login(_EJSRemoteStatelessAuthenticationService_e0d03809_Tie.java:146) at com.ascential.acs.security.auth.server. _EJSRemoteStatelessAuthenticationService_e0d03809_Tie. _invoke(_EJSRemoteStatelessAuthenticationService_e0d03809_Tie.java:92) at com.ibm.CORBA.iiop.ServerDelegate.dispatchInvokeHandler(ServerDelegate.java:614) at com.ibm.CORBA.iiop.ServerDelegate.dispatch(ServerDelegate.java:467) at com.ibm.rmi.iiop.ORB.process(ORB.java:439) at com.ibm.CORBA.iiop.ORB.process(ORB.java:1761) at com.ibm.rmi.iiop.Connection.respondTo(Connection.java:2376) at com.ibm.rmi.iiop.Connection.doWork(Connection.java:2221) at com.ibm.rmi.iiop.WorkUnitImpl.doWork(WorkUnitImpl.java:65) at com.ibm.ejs.oa.pool.PooledThread.run(ThreadPool.java:118) at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1475) << END server: 1198777258 at host PURPLE1 vmcid: IBM minor code: 89A completed: No javax.security.auth.login.LoginException: CORBA MARSHAL 0x4942f89a No; nested exception is: org.omg.CORBA.MARSHAL:
Causes
v The client IP address is listed in the stack trace, and is not reachable from the server
v The client port is blocked
IBM minor code: 89A completed: No at com.ibm.rmi.iiop.CDRInputStream.read_value(CDRInputStream.java:1993) at com.ascential.xmeta.shared.repository.core. _EJSRemoteStatefulSandboxRemoteStatefulService_ 4baa4bb1_Tie.executeQuery__CORBA_WStringValue __CORBA_WStringValue__com_ascential_ xmeta_crud_InternalQueryOptions__com_ascential _xmeta_crud_InternalQueryCompileOptions__ java_util_Map(Unknown Source) at com.ascential.xmeta.shared.repository.core. _EJSRemoteStatefulSandboxRemoteStatefulService_ 4baa4bb1_Tie._invoke(Unknown Source) at com.ibm.CORBA.iiop.ServerDelegate.dispatchInvokeHandler (ServerDelegate.java:614) at com.ibm.CORBA.iiop.ServerDelegate.dispatch (ServerDelegate.java:467) at com.ibm.rmi.iiop.ORB.process(ORB.java:439) at com.ibm.CORBA.iiop.ORB.process(ORB.java:1761) at com.ibm.rmi.iiop.Connection.respondTo(Connection.java:2376) at com.ibm.rmi.iiop.Connection.doWork(Connection.java:2221) at com.ibm.rmi.iiop.WorkUnitImpl.doWork(WorkUnitImpl.java:65) at com.ibm.ejs.oa.pool.PooledThread.run(ThreadPool.java:118) at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1475) <<END>>
Causes
The WebSphere Application Server SDK is outdated
Server callback failure with a WebSphere Application Server SDK that is outdated
Symptoms
The stack trace includes the following message:
Read beyond end of data. No fragments available
Trace from server: 1198777258 at host green.bocaraton.ibm.com >> org.omg.CORBA.MARSHAL: Unable to read value from underlying bridge : No available data: Request 18:read beyond end of data. No fragments available. vmcid: IBM minor code: 89A completed: No at com.ibm.rmi.iiop.CDRInputStream.read_value(CDRInputStream.java:1993) at com.ascential.acs.security.auth.server. _EJSRemoteStatelessAuthenticationService_e0d03809_ Tie.login(_EJSRemoteStatelessAuthenticationService_e0d03809_Tie.java:146) at com.ascential.acs.security.auth.server. _EJSRemoteStatelessAuthenticationService_e0d03809_ Tie._invoke(_EJSRemoteStatelessAuthenticationService_e0d03809_Tie.java:92) at com.ibm.CORBA.iiop.ServerDelegate.dispatchInvokeHandler(ServerDelegate.java:614) at com.ibm.CORBA.iiop.ServerDelegate.dispatch(ServerDelegate.java:467) at com.ibm.rmi.iiop.ORB.process(ORB.java:439) at com.ibm.CORBA.iiop.ORB.process(ORB.java:1761) at com.ibm.rmi.iiop.Connection.respondTo(Connection.java:2376) at com.ibm.rmi.iiop.Connection.doWork(Connection.java:2221) at com.ibm.rmi.iiop.WorkUnitImpl.doWork(WorkUnitImpl.java:65) at com.ibm.ejs.oa.pool.PooledThread.run(ThreadPool.java:118) at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1475) << END server: 1198777258 at host green.bocaraton.ibm.com
Causes
The ORB data is fragmented, which is a known issue.
Causes
v Some Linux computers automatically configure the host file with the following entry:
  127.0.0.1 localhost.localdomain localhost machine_long_hostname machine_short_hostname
v The server has more than one IP address, and one IP address is not accessible from the client
Ensure that the host name of each endpoint in WebSphere Application Server resolves to a client-accessible IP address. The WebSphere Application Server endpoint configuration can be found in the WebSphere administrative console: Servers -> Application servers -> server1 -> Ports. The server name specified on the endpoint must resolve to a client-accessible IP address. The IP addresses 127.0.0.1 and 192.168.x.x are normally not accessible from client systems.
Troubleshooting job design issues
IBM InfoSphere DataStage Error: Job xxx is being accessed by another user
A job can be accessed only by one user at a time.
Symptoms
You are unable to view a job, and receive the following error message.
Error: Job xxx is being accessed by another user
Causes
The job that you are trying to view is currently being accessed by another user.
l. Check Enable job administration in Director
m. Click OK
n. Click Close
o. Exit DataStage Director and relaunch
p. Repeat steps C through I.
q. Log in to the server as the dsadm user
r. cd to the DSEngine directory
s. Enter . ./dsenv to source the dsenv file
t. Enter ./bin/uvsh to get into the DataStage prompt
u. At the ">" DataStage engine prompt, enter LOGTO project_name
v. Run LIST.READU EVERY to list all the locks
w. Check active record locks under the "Item Id" column for the job name or RT_CONFIG# or RT_LOG# (# matches the job description number)
x. Write down the Inode numbers and user numbers associated with these locks
y. Enter LOGTO UV. If the LOGTO command is disabled, enter the following command:
CHDIR path_to_the_DSEngine_folder
The UNLOCK command lives in the UV account.
z. Enter UNLOCK INODE inode# USER user# ALL
aa. You can use Q to get out of the DataStage engine
3. Use the cleanup_abandoned_locks utility to clear any abandoned locks. The cleanup_abandoned_locks utility deletes session locks from the Information Server repository that were left over from some usage of an Information Server suite application such as DataStage. Log in to the domain layer as either the root or Administrator user.
cd /opt/IBM/InformationServer/ASBServer/bin
./cleanup_abandoned_locks.sh (on UNIX or Linux)
./cleanup_abandoned_locks.bat (on Windows)

usage: cleanup_abandoned_locks
 -P,--password   Password
 -U,--user       User name
 -h,--help       Print this message.
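For example, a sketch of running the utility on UNIX or Linux (the user name shown is only a placeholder for a suite administrator account):

cd /opt/IBM/InformationServer/ASBServer/bin
./cleanup_abandoned_locks.sh -U isadmin -P mypassword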
Causes
The user was no longer connected to DataStage. No session found in the Web Console for the user.
Symptoms
The user deletes a job from the DataStage Designer, and receives the following error:
Unable to delete the item(s). Delete object for \<path>\<jobname> failed. Cannot get exclusive access to log for job <jobname>
Causes
A lock remains on the RT_LOG file for the job.
Causes
v Incorrectly configured repository database
v Leftover metadata in repository database from a previously failed project creation
v Unable to create log file on the DataStage Server
v Incorrectly configured locale on the DataStage Server
v Failed to load JVM into the DataStage Server process (dsapi_slave)
v Firewall configuration
v Trusted authentication between DataStage Server system and the Domain system failed
v DataStage was not installed on the Domain system
v Locale or regional settings customized on the Client system
v Disk or partition full or user quota reached on DataStage Server system
v Project creation fails at "Initializing demo files..." within the Administrator client
v Stack Execution Disable (SED) is enabled (AIX only)
v Unable to increase the table space for the metadata repository (XMETA) v Error updating secondary indexes
These commands attach to the running dsrpcd process and record all of the system calls that are made by that process and its children during subsequent client-server sessions. For example, a call to create a project from the Administrator client or the dsadmin command line is recorded. To produce extra diagnostic information for the JVM initialization after all of its libraries are successfully loaded you can enable JVM startup tracing. Add the following lines to /opt/IBM/InformationServer/Server/DSEngine/dsenv:
On Windows these tracing options can be set as System Environment Variables by using the System Control Panel. Remember to restart the DataStage Server engine processes after adding these variables, and to remove these environment variables after they are no longer needed.
Enable repository database tracing. To enable tracing of the code that populates the repository database, follow these steps:
1. Create a file on the DataStage Server system in /opt/IBM/InformationServer/ASBNode/conf/ called NewRepos.debug.properties. The file name is case sensitive.
2. In the file, add the following three lines:
log4j.logger.com.ascential.dstage=DEBUG
log4j.logger.com.ibm.datastage=DEBUG
NewRepos.spy.trace=true
The dstage_wrapper_trace_N.log file will then contain extra tracing information the next time a project creation is attempted. Ensure that you delete the NewRepos.debug.properties file when finished. In addition, spy trace files, such as dstage_wrapper_spy_N.log, are produced in the same directory as the log files. These files contain a detailed record of low-level method calls and can grow large.
Running project creation manually. The project creation code runs in the context of a dsapi_slave process, which does not have any console output. Locate the full "RUN BP DSR_QUICKADD.B" command line from the domain installation log files in /opt/IBM/InformationServer/logs/. Use the following commands to run the project creation code so that you can view the console output:
Linux and UNIX
1. cd /opt/IBM/InformationServer/Server/DSEngine
2. . ./dsenv
3. bin/uvsh
4. RUN BP DSR_QUICKADD.B <arguments from log file>
5. QUIT
Windows
1. cd /opt/IBM/InformationServer/Server/DSEngine
2. bin/uvsh
3. RUN BP DSR_QUICKADD.B <arguments from log file> <newProjectName> C:\IBM\InformationServer\Server\Projects\<newProjectName> CREATE
4. QUIT
8.1 and later message: "DSR.ADMIN: Error creating DR elements, Error was Unique constraint violation."
These types of errors usually occur because the repository database returns an error when attempting to make an update. The dstage_wrapper_trace_N.log file may contain more specific details about the exact database error. There might also be a database log, depending on what type of database the repository is running in, which contains more information. For example, DB2 has the db2diag tool which can be run to find out the exact reason why an update failed. Typical failures are out of disk space, memory configuration problems, and so on.
For repository database errors it is important to confirm that the database was created with the scripts supplied on the installation media. These scripts configure important database parameters which, if missed, might cause project creation problems. It is also important that the database was created with the correct character set, as per the database creation script documentation on the installation media (typically UTF16/32). If a different character set was used, some of the metadata stored can become corrupted or might cause unexpected primary key violations. If the wrong character set was used, the product needs to be reinstalled.
For errors at this level the WebSphere Application Server logs might contain additional information. The files SystemOut.log and SystemErr.log can be found in the following directory: .../WebSphere/AppServer/profiles/_profile_name_/logs/server1/
Leftover metadata in repository database from a previously failed project creation
8.0.x message: "Error creating DR elements, Error was -1"
This problem occurs only on 8.0.x systems and can be identified by looking in the dstage_wrapper_trace_N.log file for a "unique constraint violation" error. This can occur when a project creation failed and did not remove all of its metadata from the repository. Even though the project cannot be seen in DataStage, attempting to create a project of the same name results in this error. To work around this problem, you can simply create a project with a different name. Alternatively, IBM support can provide a tool and instructions for how to remove the leftover data from the repository.
Unable to create log file on the DataStage Server
8.0.x message: "Error creating DR elements, Error was -1"
8.1 and later message: "DSR.ADMIN: Error creating DR elements, Error was log4j:ERROR setFile(null,true) call failed."
Just before the metadata repository is populated with the default project contents, a log file is created on the DataStage Server system in /home/_Credential_Mapped_Username_/ds_logs/. If this log file cannot be created, the project creation will fail. On Windows computers, the user home directory is usually C:\Documents and Settings\_Credential_Mapped_UserName_
The usual reasons why this log was not created are either that the user has no home directory at all or that they do not have appropriate permissions on it.
Incorrectly configured locale on the DataStage Server
8.0.x message: "Error creating DR elements, Error was -1"
8.1 and later message: "DSR.ADMIN: Error creating DR elements, Error was Unmatched quotation marks"
This problem is ultimately caused by bad locale configuration on the DataStage Server system. This manifests itself because the "hostname" command is run during project creation, and instead of returning the correct host name it returns a string such as "couldn't set locale correctly".
Failed to load JVM into the DataStage Server process (dsapi_slave)
8.0.x message: "(The connection is broken (81002))"
The JVM (Java Virtual Machine) can fail to load for several reasons. If it does fail to load, the dsapi_slave process is terminated, which results in broken connection errors on the client such as error 81002. A core file might be produced which can be used to determine what caused the process to be terminated. Possible causes of this problem are:
v The LIBPATH (or equivalent) is too long and caused a buffer overflow. This can be confirmed by using the Administrator client to run the "env" command with the Command button. If the contents of LIBPATH are duplicated, then it is probable that dsenv is sourced twice. The dsenv file does not need to be sourced when starting the DataStage Server engine processes with the uv -admin -start command.
v Incompatible or missing patches on the Client, Server, and Domain systems. By looking in the version.xml file of each system you can confirm what patches are installed. Ensure that patches are installed on all appropriate systems.
v Environment variables such as LDR_CNTRL were added or modified in the IBM/InformationServer/Server/DSEngine/dsenv file. Generally speaking, LDR_CNTRL settings in dsenv must not be modified unless otherwise directed by IBM.
v Incompatible operating system kernel parameters.
Firewall configuration
8.0.x message: "Error creating DR elements, Error was -1"
8.1 and later message: "DSR.ADMIN: Error creating DR elements, Error was com.ascential.xmeta.exception.ServiceException"
The DataStage Server system needs to communicate with the domain system, which means that certain ports need to be open between these systems if they are on separate machines. This problem can be confirmed by looking in the dstage_wrapper_N.log file for the following error: Connection refused:host=<hostname>,port=2809. Ensure that the firewall is correctly configured and use telnet <hostname> <port> from the DataStage Server machine to confirm that the port is accessible. The necessary firewall configuration can be found in the installation guide.
Trusted authentication between DataStage Server system and the Domain system failed
8.0.x message: "Error creating DR elements, Error was -1"
8.1 and later message: "DSR.ADMIN: Error creating DR elements, Error was Mapping failed to copy attributes: MetaTable -> DSTableDefinition (EObject: null, MetaTable)"
The DataStage Server system authenticates with the Domain system by a process called trusted authentication. This process uses a secure certificate exchange rather than explicit user name and password authentication. If the process fails, the project creation fails. Trusted authentication failure is identified by multiple exceptions in the DataStage Server ds_logs that say "Null session". This can fail for a number of reasons:
v If the DataStage Server is installed onto a Windows system (say C:\IBM\InformationServer), installing the clients into a different directory (say C:\IBM\InformationServer2) causes the certificate exchange to fail, ultimately causing the project creation to fail. See technote #1409412 and APAR JR34441 for more information.
v The number of trusted sessions reaches a maximum limit, so a new session cannot be started. This is identified by an entry in the WebSphere logs that says the limit is exceeded. If so, restarting WebSphere Application Server clears everything so that new sessions can be created and project creation can succeed.
DataStage was not installed on the Domain system
8.0.x message: "Error creating DR elements, Error was -1"
8.1 and later message: "DSR.ADMIN: Error creating DR elements, Error was Mapping failed to copy attributes: MetaTable -> DSTableDefinition (EObject: null, MetaTable)"
When installing the Domain and DataStage Server onto different physical systems, the installation of DataStage Server fails to create the projects specified in the installer if DataStage is not installed onto the Domain. These errors can be found in the installation logs. Furthermore, attempting to create projects by using the Administrator client or command line fails. In both these cases, the exceptions state that The package with URI "http:///1.1/DataStageX.ecore" is not registered. DataStage can be added to the Domain system by rerunning the installer, selecting DataStage and clearing the other components.
Locale or regional settings customized on the Client system
8.0.x message: "Error creating DR elements, Error was -1"
8.1 and later message: "Invalid Node Name %1"
If the regional language settings are modified to use a customized short date format (for example "ddd dd/MM/yyyy"), it can cause the DataStage Administrator client to send the wrong date information to the DataStage Server, causing project creation to fail. A patch for this issue is available under APAR JR34770.
Disk or partition full or user quota reached on DataStage Server system
8.0.x message: "Error creating DR elements, Error was -1"
8.1 and later message: "DSR.ADMIN: Error creating DR elements, error was log4j: ERROR failed to flush writer."
The project creation operation creates a log file on the DataStage Server system, called dstage_wrapper_trace_N.log, in the path indicated at the beginning of this document. The log creation fails when the disk or partition is full or the user to which credentials are mapped reaches their disk quota. Free up space as necessary and try the operation again.
Project creation fails at "Initializing demo files..." within the Administrator client
8.5 message: "Errors were detected during project creation that might render project <name> unstable. Caused by: DSR.ADMIN: Error creating DR elements, Error was <date timestamp> java.utils.prefs.FileSystemPreferences$2 run."
This error states that there was a problem with being able to write Java preference data. One of the following items causes these problems:
v SE (Security Enhanced) Linux is enabled.
v The user ID that is trying to create the project does not have a local home directory to write to.
If SELinux is enabled, disable it. To determine if SELinux is installed and in enforcing mode, you can do one of the following actions:
v Check the /etc/sysconfig/selinux file
v Run the sestatus command
v Check the /var/log/messages file for SELinux notices (Notice format might differ between RHEL 4 and RHEL 5.)
To disable SELinux, you can do one of the following actions:
v Set it in permissive mode and run the setenforce 0 command as a superuser
v Modify /etc/sysconfig/selinux and reboot the machine
If there is no home directory for the user ID, create a local home directory with write permissions (766) and have the group as part of the local dstage group.
Stack Execution Disable (SED) is enabled (AIX only)
8.0.x message: "Error creating DR elements, Error was -1"
8.1 and later message: "Unable to confirm the JVM can be loaded into the DataStage server process 'DSR_CREATE.PROJECT.B TestJVM' failed"
If Stack Execution Disable (SED) is enabled in AIX, the JIT compiler fails when trying to run code it generated in the process data area. This occurs with all of the DataStage executable items that have embedded JVMs. The solution to this problem is to turn off SED at the system level and reboot the machine. To turn off SED, use the command: sedmgr -m off
Unable to increase the table space for the metadata repository (XMETA)
8.0.x message: "Error creating DR elements, Error was -1"
8.1 and later message: "DSR.ADMIN: Error creating DR elements, Error was unable to save"
If DB2 is used for the metadata repository (XMETA), look in <db2instance_home>/sqllib/db2dump/db2diag.log for errors. To resolve this problem, increase the table space, and try the operation again. It might be necessary to manually delete any partially created project, which can be done by following the material here: http://www-01.ibm.com/support/docview.wss?uid=swg27021312
Error updating secondary indexes
Error message: "DSR.ADMIN: Error updating secondary indexes. Status code = -135 DSJE_ADDPROJECTFAILED"
A known cause for the error updating secondary indexes is one or more missing I_* directories in the /opt/IBM/InformationServer/Server/Template directory. If there is another DataStage engine installation (of the same version and patch level) available, it is possible to copy the Template directory from the working engine and use it to replace the Template directory on the broken engine. However, be careful to back up the existing Template directory first. If a Template directory is taken from a working engine of a different patch level, some of the patches on the broken engine might be rendered ineffective.
Symptoms
You receive the following message
DataStage parallel job fails with fork() failed, Resource temporarily unavailable
Causes
This error occurs when the operating system is unable to create all of the processes that are needed for the job at run time. Unfortunately, the exact reason for the failure is not available. This problem occurs on UNIX and Linux platforms for the following reasons:
v The maximum process limit is reached
v The kernel or the maximum open file limit is reached
v The swap space allocation or pre-allocation is exceeded
environment and you need to scale back the job run time. This might reduce performance, but allow the job to complete. To reduce the number of processes, use the following methods:
v Reduce the number of logical nodes in the APT_CONFIG_FILE
v Ensure that APT_DISABLE_COMBINATION is not set
The following command can be used on the AIX platform to see the current value of maxuproc:
lsattr -E -l sys0 | grep maxuproc
A reasonable setting for environments that are running large jobs would be maxuproc = 1000. To optimize this value, you can monitor the number of processes that are running daily over time, and then set the value appropriately. Here is some sample shell script code that you can use to monitor the number of processes that belong to the dsadm user. The script loops 360 times and takes a measurement every 5 seconds.
#!/bin/sh
# Sample script: count the processes that belong to the dsadm user.
# Takes a measurement every 5 seconds, 360 times (about 30 minutes).
COUNTER=360
rm -f dsadm_count.txt
until [ $COUNTER -lt 0 ]; do
    let COUNTER-=1
    sleep 5
    date >> dsadm_count.txt
    # Exclude the grep process itself from the count.
    ps -ef | grep dsadm | grep -v grep | wc -l >> dsadm_count.txt
done
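If you decide to raise maxuproc on AIX, a sketch of the command is shown below (run as root; the value 1000 follows the guidance above):

chdev -l sys0 -a maxuproc=1000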
There are special considerations for the Windows platform. For tuning Windows environments for large jobs, read the related technote, Tuning Windows Environments.
Symptoms
The DataStage job aborts with the following message:
main_program: The section leader on xxx died.
Causes
This "section leader xxx died" error is related to temporary resource unavailability. The conductor process times out because it did not receive an acknowledgment from the player process that it started successfully.
Set the APT_PM_NODE_TIMEOUT environment variable to 300. This might resolve the error message, but might not resolve the availability of resources on the system. See this note on the environment variable from the Parallel Job Advanced Developers Guide:
APT_PM_NODE_TIMEOUT: The need for long timeouts in the job startup process shows that the engine tier hardware is approaching overload. It is better to run fewer concurrent jobs in order to keep startup times low.
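For example, a sketch of setting the variable for all projects by adding it to the $DSHOME/dsenv file (restart the engine afterward so that the change takes effect):

APT_PM_NODE_TIMEOUT=300
export APT_PM_NODE_TIMEOUT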
DataStage jobs fail with error message: Message: Error setting up internal communications (fifo RT_SCTEMP/jobName.fifo)
Symptoms
DataStage jobs fail with the following error message:
Message: Error setting up internal communications (fifo RT_SCTEMP/jobName.fifo)
DataStage is unable to create, delete, read, or write a temporary fifo file for a job to the RT_SCTEMP directory within the project that owns the job.
Causes
The error occurs for the following reasons:
v DataStage cannot process the files because they are locked.
v DataStage cannot process the files because of inadequate file permissions or other file system problems.
v A virus scan or backup program interferes with writing fifo files to temp or scratch directories, which is described in the following technote: https://www-304.ibm.com/support/docview.wss?uid=swg21445893&wv=1
In that situation, read the following technote for instructions on how to clear locks for a job: https://www-304.ibm.com/support/docview.wss?uid=swg21438482
If the issue is not caused by locks, then review the following checklist to resolve other common causes for this error:
v Check the file limits at job run time, especially if all jobs run under a common user ID such as DSADM. You can check the limits used at DataStage job run time, even if you cannot run jobs, by running the command through the DataStage Administrator client. Log in to the DataStage Administrator client, select the failing project, click the Command button, and then enter the command: sh ulimit -a
  If the open file limit is under 2048, consider increasing it. On busy systems it may need to be higher. In this situation, you can add a command to set the limit to the $DSHOME/dsenv script, such as: ulimit -n 10240
  After you make the change, you must stop and restart DataStage and then perform the test again to ensure that the new limit is in effect.
v Check available space on the volume that contains the RT_SCTEMP directory. If the project that contains failing jobs is named "MyProject", then the path to RT_SCTEMP is similar to the following path: /opt/IBM/InformationServer/Server/Projects/MyProject/RT_SCTEMP
v Check permissions for the RT_SCTEMP directory and the files inside it. Ensure that the user ID running jobs, which is listed on event messages in the job log, has read and write permission to the directory and the files within it, either via owner ID, group membership, or public permissions. A quick test to confirm if permissions are the problem is to set directory permissions temporarily to 777 so that all users can write to it.
v Confirm that DSADM or the user ID that runs the failing jobs can create a file in this directory by using the following steps:
  1. Log in to the server operating system as the user ID who runs the failing DataStage jobs.
  2. Change directory to the InformationServer/Server/Projects/projectname/RT_SCTEMP directory.
  3. Enter the following command: touch test.fifo
  If the above command fails, then the user ID is unable to create a file at that location, and that issue must be resolved before DataStage jobs can run correctly.
If this issue is not due to locks, then the DataStage error occurs due to an inability to correctly create, read, write, or delete the temporary fifo files. If the above tests do not isolate the cause of the file system I/O problem, then it may be necessary to contact Information Server support for assistance in performing a system trace (truss or strace) of the dsapi process that launches the failing jobs, to track down the actual OS operations that are failing.
Information Server or DataStage job fails with "Could not map table file"
Jobs start to fail when memory is fragmented or the amount of data that is used in lookups exceeds the limit.
Symptoms
Information Server or DataStage job fails with the following message:
Could not map table file
DataStage fails when trying to load lookup data into memory or create a lookup file.
Causes
There might be other applications running concurrently, leaving resources that are no longer available to DataStage. Available memory might be reconfigured from creating or moving LPARs. DataStage is limited in the amount of memory that can be allocated for a lookup. A single Lookup stage in the Designer can have multiple lookup inputs; these result in a corresponding number of Lookup operators in the generated osh script. When operator compatibility is optimal, each Lookup operator has one physical process for each partition that is defined by the configuration file. Each physical process can address only up to 2 GB of memory because it is a 32-bit application. The Windows version of the DataStage Parallel Engine is only available with 32-bit pointers. Each lookup requires contiguous memory allocation. Each process is limited to the ulimit setting of the DataStage environment, which can be limited by LDR_CNTRL on AIX.** Each lookup data set uses the entire partitioning method by default. With the entire partitioning method, one memory segment is used and shared across all partitions for a physical server. The method is defined by the fastname option in the configuration file.
** The LDR_CNTRL environment setting on AIX might limit the ulimit -d (data) setting even if you have the hard limit set higher.
For MPP environments, each server has an individual copy of a memory segment. If you use a method other than entire or auto partitioning, each partition uses its own copy of the data in memory, and each copy can use only up to 2 GB or ulimit -d (data)**. This method is the most constraining. All lookup data is processed to a file in scratch and then loaded into an mmap structure in memory. The mmap function is a C++ function. Allocation of this structure requires contiguous memory, and happens before any source data is processed for the lookup.
Parallel startup failed for job runs on multiple nodes across multiple servers
Parallel jobs can fail from configuration errors.
Symptoms
A parallel DataStage job with a configuration file that is set up to run multiple nodes on a single server fails with the following error:
Message: main_program: **** Parallel startup failed ****
Causes
The full text for this parallel startup failed error provides some additional information about possible causes of the problem. The problem is often caused by one of the following configuration errors:
v The Orchestrate installation directory is not properly mounted on all nodes.
v The rsh permissions are not set correctly with /etc/hosts.equiv or .rhosts.
v The job runs from a directory that is not mounted on all nodes.
The messages in the server log that precede the startup failed message contain more information about the cause of the failure. For the situation where a site is attempting to run multiple nodes on multiple server machines, the above statement is correct. More information about setting up ssh/rsh and parallel processing can be found in the following topics:
v Configuring remote and secure shells
v Configuring a parallel processing environment
In the case where all nodes are running on a single server machine, the "Parallel startup failed" message is usually an indication that the fastname defined in the configuration file does not match the name output by the "hostname" command on the server. In a typical node configuration file, the server name where each node runs is indicated by the fastname in /opt/IBM/InformationServer/Server/Configurations/default.apt:
{
  node "node1"
  {
    fastname "server1"
    pools ""
    resource disk "/opt/resource/node1/Datasets" {pools ""}
    resource scratchdisk "/opt/resource/node1/Scratch" {pools ""}
  }
  node "node2"
  {
    fastname "server1"
    pools ""
    resource disk "/opt/resource/node2/Datasets" {pools ""}
    resource scratchdisk "/opt/resource/node2/Scratch" {pools ""}
  }
}
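A quick check for the single-server case described above is to compare the fastname entries with the server host name (the default.apt path is the one shown above):

hostname
grep fastname /opt/IBM/InformationServer/Server/Configurations/default.apt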
Symptoms
The DataStage job log contains the following unrecoverable error:
Item #: 13
Event ID: 1960
Timestamp: 2011-09-01 06:30:44
Type: Fatal
User Name: dsadm
Message Id: IIS-DSEE-TFPM-00154
Message: main_program: APT_PMConnectionRecord::start: Reading connection message returned 28, expected 40, Error 0

Item #: 14
Event ID: 1961
Timestamp: 2011-09-01 06:30:44
Type: Fatal
User Name: dsadm
Message Id: IIS-DSEE-TFPM-00356
Message: main_program: **** Parallel startup failed **** This is usually due to a configuration error, such as not having the Orchestrate install directory properly mounted on all nodes, rsh permissions not correctly set (via /etc/hosts.equiv or .rhosts), or running from a directory that is not mounted on all nodes. Look for error messages in the preceding output.
Causes
The framework that is used by DataStage to start all the parallel processes uses TCP/IP connections during the startup phase, even on single-host configurations. The processes are listening for very specific responses on these ports to coordinate the startup. This error means that one of the processes received an unexpected response and terminated. These ports are used for low-level coordination between the specific DataStage processes, not for user requests, so there is very little error handling capability. When this error occurs, it is an indicator that some process other than DataStage has connected to one or more of the ports and put invalid data there. The DataStage process that receives this unauthorized connection has no alternative except to print the error and exit. The default port ranges used by DataStage for this communication are 10,000 - 11,000 and 11,000 and up (there is no upper bound, but reasonably it will not be more than a few thousand ports). These are not common port ranges for other software applications to use, so when this problem occurs it usually means that port scanning software, network monitoring software, or intrusion detection software might be the cause.
You can set these environment variables at the job or project level for testing. When you find suitable values and the problem does not recur, set and export these variables in the /opt/IBM/InformationServer/Server/DSEngine/dsenv file so that they take effect for all projects.
DataStage Parallel job failed to start because of a process fork failure in Solaris.
Symptoms
A parallel job terminates with the following message:
Fatal Error: Unable to start ORCHESTRATE process on node node1 (sun01): APT_PMPlayer::APT_PMPlayer: fork() failed, Not enough space
Causes
This error indicates that the fork() system call failed with an ENOMEM error returned, which means there is not enough swap space to support the virtual memory required by the call. The ENOMEM error for fork() can occur in other operating systems like AIX, Linux, or HP-UX, but it is seen more frequently in Solaris, because Solaris requires much more virtual memory when using fork(): it does not have a memory overcommit feature.
Linux, AIX, and HP-UX operating systems have a feature called memory overcommit, or lazy swap allocation. In memory overcommit mode, malloc() does not reserve swap space and always returns a non-NULL pointer, regardless of whether there is enough virtual memory on the system to support it. The swap space is made available only when the memory is referenced.
In contrast, under the Solaris OS, when the application calls malloc() and it internally calls sbrk(2) to get more memory from the system, the kernel goes through its free memory lists and finds the requested amount of virtual memory. If it finds the requested amount of virtual memory, the kernel returns a pointer to that memory and reserves the swap space for it such that no other process can use it until the owner releases it. If the requested amount of virtual memory is not found, malloc() fails with an ENOMEM error and returns a NULL pointer.
For a large-memory process in Solaris, the fork() system call can fail because of an inadequate amount of virtual memory, because the fork() call requires twice the amount of the parent memory. This can happen even when the fork() call is immediately followed by an exec() call that would release most of that extra memory.
DataStage Jobs fail to start or perform poorly when temporary directories are large
DataStage jobs write multiple files to temporary directories which are not automatically cleaned.
Symptoms
When the number of temporary files grows large, DataStage jobs have slower performance and can hang. At sites that have run DataStage for a year or more without cleaning up the temporary directories, these directories can contain 100,000 or more files.
Causes
Normally temporary files are cleaned up when a job ends. However, terminated jobs can leave behind files.
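The following sketch shows one way to identify old temporary files for cleanup. The directory paths and the seven-day age threshold are assumptions; adjust them for your installation, and run the commands only when no DataStage jobs are active.

  # List candidate files first; review the list before deleting anything.
  find /tmp /opt/IBM/InformationServer/Server/Scratch -type f -mtime +7 -print
  # After reviewing the list, remove the old files (only while no jobs are running):
  # find /tmp /opt/IBM/InformationServer/Server/Scratch -type f -mtime +7 -exec rm -f {} \;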
Symptoms
An InfoSphere DataStage job with a Join stage terminates with the following error: Unexpected terminated by UNIX Signal 11 (SIGSEGV)
Causes
If the size of the record is larger than the default setting of 20 MB, the sort that is inserted for the join fails.
Causes
The following items commonly cause job corruption:
v Full disk space in the /tmp (UVTEMP) directory or the DataStage project directory
v A 32-bit hash file becoming larger than 2 GB
v Power outages
v System crashes
v Rebooting the server while a job is running
v A virus checker or scanner running while a job is running
v A backup running while a job is running
v Failures on the system
Unable to view data or run a Parallel job with a Sequential File stage
In a DataStage parallel job with a Sequential File stage, you cannot view data or run the job.
Symptoms
The following error message is generated:
File archive: Trouble creating file
Parallel job with Sequential File stage plug-in. View data results in an error like:
IIS-DSEE-TFAR-00015 00:10:13 <main_program> File archive: Trouble creating file "/tmp/...."
Run time results in errors like:
Message Id: IIS-DSEE-TFAR-00015
Message: main_program: File archive: Trouble creating file "/tmp/...."
Message Id: IIS-DSEE-TFPX-00002
Message: main_program: Fatal Error: Null archive.
Causes
The program is searching for a relative path called "tmp" that does not exist. This occurs on Windows installations when the project is not on the same drive as the engine.
Procedure
1. Source your dsenv file in $DSHOME (. ./dsenv).
2. Go to your project directory (../InformationServer/Project/<project name>).
3. List all files and redirect the output to a file (ls > myfiles.txt). This file provides the list of files for uvbackup.
4. Run uvbackup and redirect its output to null with this command: "$DSHOME/bin/uvbackup -V -f -cmdfil myfiles.txt -s uvbackupout.txt -t /dev/null 2>&1 > testing123.txt"
5. grep "WARNING:" uvbackupout.txt
Results
The output file uvbackupout.txt helps to identify whether there are any corrupted files in the project.
Example
Here is an example of what you might see in the uvbackupout.txt file:
WARNING: Unable to open file RT_STATUS3 for reading. File not saved!
The uvbackup command verifies the integrity of the files and does not back up any files that are corrupted.
Causes
This error indicates that the job is running out of scratch, temporary, or swap space.
Symptoms
The resource directories and common space contain files with names that are similar to the following file name: lookuptable.20091210.513biba
Causes
When a job aborts, it leaves the temporary files in the resource directories for postmortem review. Temporary files are left in scratch, but lookup files are created in the resource directories. Lookup file sets are not removed. A lookup file set is similar to the following file set:
/opt/IBM/InformationServer/Server/Datasets/export.dsadm.abcdefg.P000000_F0000
A lookup file has a structure that is similar to the following file:
/opt/IBM/InformationServer/Server/Datasets/lookuptable.20091210.513biba
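A sketch for locating leftover lookup files in the resource directory is shown below. The path matches the example above, but the two-day age threshold is an assumption, and you must confirm that no running job still owns the files before removing them.

  # List lookup table files older than two days in the resource directory
  find /opt/IBM/InformationServer/Server/Datasets -name 'lookuptable.*' -mtime +2 -ls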
Causes
The lookup table is too large to fit in available memory.
Causes
Normally, when a DataStage job fails with the logged error message "Could not map table file", the message ends with "not enough space". In that situation, the issue is either insufficient disk space or a table that is too large to map into a single process, and it can be resolved by the steps described in the following related technote: Information Server or DataStage job fails with "Could not map table file". For the special case where the error message ends with "invalid argument", the cause is typically an I/O error that is unrelated to a full disk. The most common cause of the error is that one of the directories where DataStage is writing is mapped to a volume that was mounted with the CIO (concurrent I/O) option.
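On AIX, one way to check whether the directory in question is on a volume mounted with concurrent I/O is to inspect the mount options. This is a sketch and assumes the standard AIX mount command output.

  # Show all mounted file systems and highlight any mounted with the cio option
  mount | grep -i cio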
Symptoms
Jobs that process nulls in transformer stages show different behavior when migrated from 8.1 to 8.5, even when legacy null handling is set.
Causes
In InfoSphere Information Server version 8.1 and prior versions, the job design had to explicitly handle null column values in the Transformer stage. If the parallel engine encountered null column values outside of specific contexts, the entire row containing the null was dropped, or sent to a reject link if the Transformer stage had a reject link. Note: This topic refers to the SQL value NULL, not the character with value 0x00, and not an empty string. Explicit null handling made Transformer stage coding too complex and allowed inconsistent behavior. In InfoSphere Information Server version 8.5, the default behaviors were changed and explicit null handling is no longer required. It was recognized that some customers would want to retain the original null-handling behavior, so an environment variable, APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING, was introduced. This environment variable, when defined, preserves compatibility with the behavior of pre-version 8.5 InfoSphere Information Server.
Since version 8.5 shipped, differences in the default null handling behavior of version 8.5, and problems with the implementation of the InfoSphere Information Server 8.1 compatibility mode, have been discovered. There have been issues with null handling in InfoSphere Information Server 8.5 with backward compatibility enabled, and also with null handling in InfoSphere Information Server 8.1 and earlier versions. Most of these issues were due to a lack of clear explanation of how null values should be handled in the Transformer stage.
Check for NULL value
To test whether a value is NULL in a logical expression, use one of these two functions:
v IsNotNull()
v IsNull()
For example:
DSLink3.OrderCount + 1
--> If DSLink3.OrderCount is NULL, the record is dropped or rejected. This expression can be changed to:
If IsNotNull(DSLink3.OrderCount) Then DSLink3.OrderCount + 1 Else 1
--> If DSLink3.OrderCount is NULL, the target field is set to the integer 1.
Each nullable column in a given expression must be properly NULL checked, or the NULL value must be converted to a concrete value.
IF-ELSE operations on NULL values
Handling NULL values in IF-ELSE conditions can be complex. Consider the following examples to get familiar with using NULL checks in IF-ELSE statements.
Example 1: Simple IF ELSE statement
If (DSLink1.Col1 > 0) Then xxx Else yyy
In InfoSphere Information Server 8.5, code is generated to drop records if DSLink1.Col1 is NULL.
The condition that contains the nullable column should be pre-"AND"ed with an IsNotNull() check or pre-"OR"ed with an IsNull() check; use parentheses wherever needed so that the evaluation order is clearly specified.
Example 3: IF ELSE statement in which the nullable field is used multiple times
If ((DSLink1.Col1 = 3) or (DSLink1.Col1 = 5)) Then xxx Else yyy
Records are dropped if Col1 is NULL.
Each instance of the nullable field must be pre-"AND"ed or pre-"OR"ed with a NULL check.
Example 4: Using 2 nullable columns in a condition
If (DSLink1.Col1 = DSLink1.Col2) Then xxx Else yyy
Both the columns must be NULL checked or NULL conversion functions should be used on both the columns.
If (IsNotNull(DSLink1.Col1) and (IsNotNull(DSLink1.Col2) and (DSLink1.Col1 = DSLink1.Col2))) Then xxx Else yyy
InfoSphere Information Server 8.5 NULL handling: Explicit handling not required
In InfoSphere Information Server 8.5, a NULL value in an input column does NOT cause the row to be dropped, nor is it sent to the reject link. A NULL value in the input column is handled by the Transformer stage, following specific logic, so the job designer can skip explicit NULL handling. For further details, see the following technote: https://www.ibm.com/support/docview.wss?uid=swg21514921
The environment variable APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING can be used if the designer wants to have 8.1 behavior in 8.5. Old NULL handling can be enabled in the following three ways:
v Setting APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING at the project level
v Setting APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING at the job level
v Checking "Legacy null processing" in the DataStage Designer for individual Transformer stages in a given job
IBM has recently discovered a previously undocumented difference in behavior between InfoSphere Information Server 8.1 and InfoSphere Information Server 8.5 with old NULL handling enabled. InfoSphere Information Server 8.1 allowed the three "NullToxxxx()" functions to be used as NULL tests. The following is an example of such an IF-ELSE condition:
If ((NullToZero(DSLink1.Col1) = 0) or (DSLink1.Col1 > 0)) Then xxx Else yyy
The (NullToZero(DSLink1.Col1) = 0) test was treated as a NULL check, and records were not dropped or sent to the reject link, because of an inconsistency in the generated code. In InfoSphere Information Server 8.5 this code
inconsistency was eliminated, and only IsNull() and IsNotNull() can be used as NULL checks. InfoSphere Information Server 8.1 jobs that used NullToZero(), NullToValue(), or NullToEmpty() for NULL checking must be changed to use IsNull() or IsNotNull(). Note: !IsNull() can be used instead of IsNotNull(), and !IsNotNull() can be used instead of IsNull(), in NULL checking.
DataStage jobs that compiled on previous versions have Transformer compile errors on version 8.1
Symptoms
Your DataStage version 7 jobs that have a Transformer stage where a stage variable is set to null compile successfully. However, the same job on Information Server DataStage version 8.1 fails with the following compile error:
<transform> Error when checking composite operator: Setting null to this non-nullable field: StageVar0_myStageVariable
Causes
The warning occurs when a field comes in with Nullable=Yes, the output field is set to the value of that field, and the output field is Nullable=No. Setting a constraint in the database does not allow the incoming value to be NULL, but it does not eliminate the warning.
InfoSphere DataStage: Problems when running multiple instances of a job from a job sequence, or from a script that uses dsjob.
IBM InfoSphere DataStage: A number of related problems can occur when you run multiple instances of a job, either from a job sequence or from a script that sequences jobs by using the dsjob command.
Symptoms
v Multiple instances of a job are run from a sequence or a script, and the sequence reports status=99 for one or more of the job instances.
v Multiple instances of a job are run from a sequence or a script, and the job instances take a long time to start and to finish.
v More than 25 instances of a job are run from a sequence (or a script), and the sequence reports status=99 for one or more of the job instances.
v The system does not have enough resources because of a heavy work load, and the sequence reports an error code=-99 for a parallel job.
v On Intel RedHat and Suse platforms, jobs can hang despite having successfully run the underlying osh code.
v Some jobs run with missing parameters, or with parameters erroneously set to default values.
60). For example, if a value of 120 is required for DSWaitResetStartup, then ensure that DSWaitStartup is also set to a minimum of 120.)
v Environment variable: DS_NO_INSTANCE_PURGING
If the system is under extreme load, it might be necessary to use the DS_NO_INSTANCE_PURGING environment variable if Status=99 errors still occur when running many multi-instance jobs and auto-purge is enabled. This environment variable must be set to 1. This setting stops auto-purge from deleting the status records for the job instance, allowing the controlling job to read its status when system resources become available. (In other situations, you might want clean logs with no persistent instance entries, so the default behavior is to purge instance entries.)
v Client change, and environment variable DSJobStartedMax:
The number of recorded Instance Identifiers increased from 25 to 100, to prevent status records from being purged when more than 25 instances are being run simultaneously. If you are using N-Instance auto-purging and running more than 25 simultaneous instances, then the N-Instance auto-purge limit must be set to more than 25 in the Director or Administrator. If more than 100 instances are to be run simultaneously, then the environment variable DSJobStartedMax must be set to the required value, up to a maximum of 9999. The APAR number for this issue is JR30015.
Parallel job that runs on a remote node fails with broken pipe in IBM InfoSphere Information Server
Symptoms
You run a parallel job in a cluster environment with a remote node and receive the following message in the job log:
main_program: Fatal Error: Service table transmission failed for node2 (<node name>-svc:Broken pipe.
use the resource tracker to gather machine statistics from the machine where jobs are running. These statistics have no effect on the actual job that is running, and are unrelated to the information that is captured in the DataStage job monitor or job log. If you are not currently using the resource tracker functionality, it can easily be turned off to avoid this problem. To turn off the resource tracker, add the following APT_DISABLE_TRACKER_STARTUP environment variable at the project level, and set the default value to 1.
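On UNIX or Linux engine tiers, an alternative sketch is to export the variable from the dsenv file so that it applies to all projects; choosing a global setting rather than the project-level default described above is an assumption about what is appropriate for your site.

  # Disable the resource tracker for all jobs started from this engine
  APT_DISABLE_TRACKER_STARTUP=1; export APT_DISABLE_TRACKER_STARTUP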
Symptoms
A parallel job fails with one of the following error messages:
ds_ipcopen() - call to OpenFileMapping() failed - The system cannot find the file specified ds_ipcput() - timeout waiting for mutex
Causes
On systems with a heavy load, these timeouts can cause the job to fail.
Causes
The error usually indicates that a resource issue is the cause of the problem.
process is 30 seconds. The default for loading a score is 120 seconds. Set the following environment variable at the project level: APT_PM_NODE_TIMEOUT=300 You can increase the value of this environment variable to 600 if 300 does not resolve the problem. If adding the APT_PM_NODE_TIMEOUT environment variable does not correct the issue, monitor processor, disk space, memory, and swap when the job is running. Check with your network administrator to see if the nodes are on a SAN or NFS mount.
You receive an error similar to SQL0443N Routine "SYSIBM.SQLCOLUMNS" (specific name "COLUMNS") has returned an error SQLSTATE with diagnostic text "SYSIBM:CLI:-805". SQLSTATE=38553 when you run a job.
The connector cannot find any available nodes in the APT configuration file
Symptoms
When you run a job, you receive the following error: The connector could not find any available nodes in the APT configuration file. This usually occurs when a node pool constraint is specified for a connector stage but is not defined in the APT configuration file (CC_DB2Configuration::validateEnvironment, file CC_DB2NodeNegotiation.cpp, line 787).
Join Stage
Join stage exits with a heap allocation error
Symptoms
The join stage exits with a heap allocation error.
Causes
The Join stage processes the data by using the primary link, or driving data set, first. It gets a row from this link and then retrieves all rows with matching values from the secondary link (also called the reference link). These rows are temporarily stored in memory. If there is a large number of rows in the secondary link that match the current record in the primary link, then a large amount of memory is allocated to hold the result. If there is not enough heap memory, the job fails. Because of this, the amount of memory used by the Join stage depends, among other factors, on the cardinality of the reference side. The lower the cardinality of the reference side, the more memory the stage uses for each row. Note: Cardinality here means the uniqueness of the data values in a column. A column with all unique values is said to have high cardinality, while a column with repeated values is said to have low cardinality.
Changing this link order does not impact the outcome of the Join because inner joins and full outer joins are symmetric; in other words, the sides are interchangeable and the order does not impact the outcome of the join. If you are working with non-symmetric joins such as left outer and right outer joins, then changing the link order does have an impact on the output of the data, and therefore you should not change the link order unless you understand the consequences of this change. For a more detailed explanation of the differences between the types of joins this stage can perform, refer to the "Parallel Job Developer Guide".
Causes
If the size of the record is larger than the default setting of 20 MB, the sort that is inserted for the join fails.
Lookup Stage
Information Server DataStage Lookup stage fails on Linux
Symptoms
The Information Server DataStage Lookup stage fails on the Linux operating system.
Causes
The Lookup stage creates files in the resource disk area that use the C++ mmap function. When those files are on an NFS or shared mount, the mmap function might fail. This is a known issue on Linux and is due to the C++ libraries, not DataStage.
Parallel job with lookup aborts with a File too large error
Symptoms
A DataStage parallel job that contains a lookup fails with the following error:
Lookup_107,0: Error writing table file "/d01/Ascential/DataStage/Datasets/lookuptable.20100217.abcde": File too large
Causes
The lookup table is too large to fit in available memory.
Causes
The error message "Error finalizing / saving table" when writing to a data set generally occurs for one of several reasons:
v The user ID running the DataStage job does not have permission to write to the directory shown in the error. The user ID is identified in each event message in the job log for the failing job.
v The volume containing the output directory stated in the error message does not have enough free space to write the file.
v An "out of memory" error precedes the data set write error in the log in DataStage 8.1. (In DataStage 7.5.x, the only error is the space error; there is no additional memory-related error message even when that is the cause of the failure to write the lookup data set.)
v Temporary data sets might be written to the directory specified by the UVTEMP setting in the uvconfig file. Parallel jobs can also write output to the directory specified by the TMPDIR environment variable. Ensure that those directories have sufficient space for the file.
A quick way to check these locations is shown after this list.
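As a sketch, the following commands check the free space of the locations named above; the dataset path is an example from this guide, and the uvconfig location assumes a default installation with $DSHOME already set (source dsenv first).

  # Check free space on the dataset directory and /tmp (paths are examples)
  df -k /opt/IBM/InformationServer/Server/Datasets /tmp
  # Confirm where UVTEMP points so that its volume can be checked as well
  grep UVTEMP $DSHOME/uvconfig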
This memory error typically occurs when building large data sets because the data set must first be loaded into memory. The Lookup stage uses memory-mapped files: you must have enough system memory available to store the entire contents of the file AND enough disk space to shadow the file in memory. On a 32-bit system, there is a 2 GB limit on the size. Refer to the next troubleshooting tip for more information about dealing with the "could not map table file" error:
DataStage job fails with "Could not map table file"
Information Server or DataStage job fails with "Could not map table file"
Symptoms
Information Server or DataStage job fails with the following error: Could not map table file. DataStage fails when it tries to load lookup data into memory or to create the lookup file.
Causes
DataStage is limited in the amount of memory that can be allocated for a lookup.
As the amount of lookup data increases, you can add more nodes to the configuration file to further distribute data across more processes, and thus more memory segments. Many factors can lead to a job that used to run successfully starting to fail: the length of time since the server was rebooted (memory fragmentation), the amount of data used in the lookups growing to just over the limit, other applications running concurrently and using resources that used to be available to DataStage, or a reconfiguration of available memory by creating or moving LPARs. The LDR_CNTRL environment setting on AIX may limit the ulimit -d (data) setting even if you have the hard limit set higher.
Lookup file does not span across multiple scratch resource disks
Symptoms
The lookup file generated by a DataStage lookup does not span across multiple scratch resource disks that are defined per node when scratch fills up.
Causes
DataStage lookup files are memory mapped files, so there can only be one file per lookup process.
Symptoms
The following error message is generated:
File archive: Trouble creating file
Parallel job with Sequential File stage plug-in. View data results in an error like:
IIS-DSEE-TFAR-00015 00:10:13 <main_program> File archive: Trouble creating file "/tmp/...."
Run time results in errors like:
Message Id: IIS-DSEE-TFAR-00015
Message: main_program: File archive: Trouble creating file "/tmp/...."
Message Id: IIS-DSEE-TFPX-00002
Message: main_program: Fatal Error: Null archive.
Causes
The program is searching for a relative path called "tmp" that does not exist. This occurs on Windows installations when the project is not on the same drive as the engine.
Create a directory called "tmp" at the root of the drive where the DataStage project is located. For example, if the DataStage projects are on the D: drive, create the directory D:\tmp. If the directory already exists, check the remaining disk space on your drives to ensure that limited disk space is not the cause of the problem.
Causes
The issue is due to the incorrect settings of the Teradata COPLIB and COPERR environment variables.
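A minimal check, assuming the variables are set in dsenv, is to source the engine environment and verify that COPLIB and COPERR point at readable directories; the exact contents expected in those directories depend on your Teradata client installation.

  cd $DSHOME && . ./dsenv            # pick up the engine environment
  echo "COPLIB=$COPLIB"; ls -ld "$COPLIB"
  echo "COPERR=$COPERR"; ls -ld "$COPERR"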
Causes
This problem is caused when the ASB agent running on the engine machine is unable to find the connector library or the Teradata libraries. The agent is started by using the /local/IBM/InformationServer/ASBNode/bin/NodeAgents script. This script sources the /local/IBM/InformationServer/ASBNode/bin/NodeAgents_env_DS.sh file for any DataStage-specific environment, which in turn sources /local/IBM/InformationServer/Server/DSEngine/dsenv. The script eventually invokes the Agent.sh script to start the agent.
Causes
When DataStage runs write operations in parallel mode through immediate-mode or through the stream-operator, row-hash collisions might occur. These collisions can cause blocking and deadlocks.
Any updates for a particular row must come from the same partition to avoid a deadlock. Each partition uses a separate connection to the database. If multiple updates for a row do not come from the same connection, they can cause blocking or deadlocks.
Sort Stage
DataStage outputs a warning message about a partition key: Sort key "CO_ID" no longer exists in dataset schema
Symptoms
DataStage outputs a warning message about a partition key:
main_program: Sort key "CO_ID" no longer exists in dataset schema. It will be dropped from the inserted sortmerge collector.
main_program: There are no sort keys in the dataset schema. No parallel sortmerge operator will be inserted.
Transformer Stage
Checking composite operator errors
Symptoms
DataStage jobs that contain a Transformer stage normally call an external C++ compiler during DataStage job compilation and compile correctly. Under some conditions, the compile might fail with a large number of composite operator errors that are similar to the following errors:
v ##E IIS-DSEE-TBLD-00076 15:20:18(000) <main_program> Error when checking composite operator: Subprocess command failed with exit status 256.
v ##W IIS-DSEE-TFTM-00012 15:20:18(002) <transform> Error when checking composite operator: The number of reject datasets "0" is less than the number of input datasets "1".
v ##W IIS-DSEE-TBLD-00000 15:20:18(007) <main_program> Error when checking composite operator: Output from subprocess: Error 8: "/usr/include/machine/sys/_types.h", line 65 # Invalid type specifier combination in declaration: "short double".
Causes
If a large number of errors occur when "checking composite operator", that is often an indication that the compiler that is used with DataStage is incompatible or unsupported, or that incorrect compiler or linker options are being used.
DataStage jobs with transformer stage fail to compile on AIX due to many missing include files
Symptoms
When compiling a DataStage job that contains a Transformer stage on AIX, the compile fails with the following errors:
v ##W IIS-DSEE-TBLD-00000 17:52:00(010) <main_program> Error when checking composite operator: Output from subprocess: "/opt/IBM/InformationServer/Server/PXEngine/include/apt_components/transformop/transformbasehdrs.h", line 41.10: 1540-0836 (S) The #include file <map> is not found.
v "/opt/IBM/InformationServer/Server/PXEngine/include/apt_framework/operator.h", line 70.10: 1540-0836 (S) The #include file <vector> is not found.
v "/opt/IBM/InformationServer/Server/PXEngine/include/apt_util/custreport.h", line 36.10: 1540-0836 (S) The #include file <string> is not found.
v "/opt/IBM/InformationServer/Server/PXEngine/include/apt_util/iostream_s.h", line 23.10: 1540-0836 (S) The #include file <iostream.h> is not found.
v The #include file <vector> is not found.
v The #include file <string> is not found.
v The #include file <iostream.h> is not found.
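Because missing standard headers such as <vector> and <string> usually mean that the C++ compiler or its runtime is not fully installed, a first check on AIX is to confirm which XL C/C++ filesets are present and which compiler is found on the DataStage user's PATH. This is a sketch, and the fileset names returned vary by compiler version.

  # List the installed IBM XL C/C++ compiler filesets
  lslpp -l | grep -i xlc
  # Show which C++ compiler is found first on the PATH of the DataStage user
  which xlC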
DataStage error when importing a job with Transformer stages into a different system
Symptoms
The following error message is received when you import a job with Transformer stages into a different system:
RT_BP123.O/V0S11_TESTJOB1_Transformer.C is of unknown format
Error processing file RT_123.O/V0S11_TESTJOB1_Transformer.C. file not modified
Command CATALOG RT_BP123 V0S11_TESTJOB1_Transformer.C V0S11_TESTJOB1_Transformer.C LOCAL FORCE error:
Program V0S11_TESTJOB1_Transformer.C was not compiled with a supported version of the BASIC compiler. It must be recompiled.
Causes
This issue is related to having the environment variable APT_TRANSFORM_OPERATOR_DEBUG set. When this environment variable is set, the file with the "C" code is kept in the BP.O directory. When you then export the job with job executables, the .C file is included in the export. Because this binary section is included, you always get this error on import.
Causes
NLS settings allow non-US-ASCII characters to be used in Transformer stage names. However, the file system (for example, NFS) or the operating system might not recognize non-US-ASCII characters in module names.
Jobs with Transformer stage that use remote nodes abort with fatal error
Symptoms
A DataStage parallel job with a Transformer stage that uses remote nodes fails with the following error:
Item #: 19
Event ID: 126
Timestamp: 2010-06-23 13:37:03
Type: Fatal
User Name: t2etl01
Message: trn: Failed to distribute the shared library "/datastage/DataStage/Projects/ProjectName/RT_BP123.O/V10S0_xxxxxxxx_trn.o" to node "nodeName". [transform/transform.C:1827]
Causes
This error occurs when the projects directory either does not exist or is not accessible on the remote node.
Director logs are not showing warning messages for all records dropped by the Transformer stage
Symptoms
DataStage Director logs are not showing warning messages for all records dropped by the Transformer stage.
Causes
DataStage displays only 50 warning messages per node in the Director log when records are dropped by the Transformer stage. When more than 50 records are dropped by a Transformer stage, the Director log shows up to 50 warning messages per node and then goes into silent mode with the warning message "Warning, all other rejected records will be silent".
4. Set "Maximum log reject messages" to the required value or set to -1 for unlimited. If no value is specified, Maximum log reject messages defaults to 50 messages per node.
Causes
This error may be caused by a non-breaking space being entered into the column definition of the Transformer. This is typically caused by copying and pasting from a Microsoft Word or Excel document. The DataStage Designer prohibits inserting a normal space, but it does not check for non-breaking spaces.
Causes
DataStage 8.x is coded to maintain backwards compatibility for how nulls are handled in the Transformer stage. Jobs from previous releases that use a Transformer stage with smallint and bigint data types produced an error message. This behavior is controlled by the Transformer property called "Legacy null processing", which is automatically checked (set) when the job is imported into DataStage 8.x. DataStage jobs that are created with the Transformer stage in the current release, rather than imported from a previous release, do not have the "Legacy null processing" option set.
message if its producing stage is hash partitioned: "Sequential operator cannot preserve the partitioning of the parallel data set on input port 0."
v These issues can be worked around by setting the environment job parameters APT_NO_PART_INSERTION=True and APT_NO_SORT_INSERTION=True and then modifying the job to ensure that the partitioning and sorting requirements are met by explicit insertion.
Default Decimal Separator
Information Server releases affected: 8.0.1 Fix Pack 1 and higher, 8.1 Fix Pack 1 and higher, 8.5 GA
Prior to Information Server Version 8.0.1 Fix Pack 1, the default decimal separator specified via Job Properties > Defaults was not recognized by the APT_Decimal class in the parallel framework. This caused problems for the DB2 API stage, where decimals with a comma decimal point could not be processed correctly. This issue was fixed in release 8.0.1 Fix Pack 1, as APAR JR31597. The default decimal separator can be specified via a job parameter (for example, #SEPARATOR#). However, if the job parameter does not contain any value, '#' is taken as the decimal separator. This can cause the following error if the actual decimal separator is not '#':
Fatal Error: APT_Decimal::assignFromString: invalid format for the source string.
If you encounter this problem after upgrading, make sure the job parameter representing the default decimal separator contains the actual decimal separator character used by the input data. If changing the job parameter is not an option, you can set the environment variable APT_FORCE_DECIMAL_SEPARATOR. The value of APT_FORCE_DECIMAL_SEPARATOR overrides the value set for the "Decimal separator" property. If more than one character is set for this environment variable, the decimal separator defaults to a dot character (.).
Embedded Nulls in Unicode Strings
Information Server releases affected: 8.1 Fix Pack 1 and higher, 8.5 GA
Prior to Information Server 8.1 Fix Pack 1, nulls embedded in Unicode strings were not treated as data; rather, they were treated as string terminators. This caused data after the first null to be truncated. The issue was fixed in Fix Pack 1, as APAR JR33408, for Unicode strings that were converted to or from UTF-8 strings. As a result of this change, you may observe a change in job behavior where a bounded-length string is padded with trailing nulls. These extra nulls can change the comparison result of two string fields, generate duplicate records, make data conversion fail, and so on, depending on the job logic. To solve this problem, modify the job to set APT_STRING_PADCHAR=0x20 and call Trim() in the Transformer stage if needed.
Null Handling at column level
Information Server releases affected: 8.1 GA and higher, 8.5 GA
In parallel jobs, nullability is checked at runtime. It is possible for the user to set a column as nullable in the DataStage Designer, but at runtime the column is actually mapped as non-nullable (to match the actual database table, for example). Prior to 8.1 GA, the parallel framework issued a warning for this mismatch, but the job could potentially crash with a segmentation violation as a result. The warning was changed to a fatal error in 8.1 GA, as ECASE 124987, to prevent the job from aborting with
SIGSEGV. After this change, jobs that used to run with this warning present now abort with a fatal error. For example, this problem is often seen in the Lookup stage. To solve the problem, modify the job to make sure that the nullability of each input field of the Lookup stage matches the nullability of the same output field of the stage that is upstream of the Lookup stage.
Transformer Stage: Run-Time Column Propagation (RCP)
DataStage releases affected: 7.5 and higher
Information Server releases affected: 8.0 GA and higher, 8.1 GA and higher, 8.5 GA
When RCP is enabled at any DataStage 7.x release prior to 7.5, for an input field "A" that is mapped to an output field "B", both "A" and "B" are present in the output record. Starting with DataStage 7.5, it appears that "A" is simply being renamed to "B", so that only "B" appears in the output. In order to improve transform performance, a straight assignment like "B=A" is considered as renaming "A" to "B". Prior to the change, the straight assignment was considered as creating an additional field by copying "A" to "B". With this change in place, you now need to explicitly specify both "A" and "B" in the output schema in order to prevent "A" from being renamed to "B" and to create a new field "B"; ensure that both columns are defined on the output link so that both values are propagated.
Transformer Stage: Decimal Assignment
Information Server releases affected: 8.0 GA and higher, 8.1 GA and higher, 8.5 GA
The parallel framework used to issue a warning if the target decimal had smaller precision and scale than the source decimal. The warning was changed to an error in Information Server 8.0 GA, and as a result the input record is dropped if a reject link is not present. This behavior change was necessary to catch the error earlier and avoid data corruption. Modify the job to make sure that the target decimal is big enough to hold the decimal value. Alternatively, add a reject link to prevent records from being dropped. Important: This change in behavior does not apply to any Linux platforms (Redhat, Suse, or zLinux). The parallel framework does not enable exception handling on Linux platforms, so the behavior remains the same as it was prior to 8.0 GA.
Transformer Stage: Data Conversion
Information Server releases affected: 8.0 GA and higher, 8.1 GA and higher, 8.5 GA
Prior to Information Server 8.0 GA, an invalid data conversion in the Transformer resulted in the following behavior:
v A warning message is issued to the DataStage job log
v A default value was assigned to the destination field according to its data type
v The record was written to the output link
v If a reject link was present, nothing was sent to the reject link
The behavior changed in the 8.0 GA release when a reject link is present. Instead of the record being written to the output link with a default value, it is written to the reject link. This may lead to data loss if the job expects those records to be passed through to the output. To get the original behavior of passing the records through, modify the job to remove the reject link. An environment variable was added along with this change, to add the capability of aborting the job. To use this option, ensure that there is no reject link and then set the environment variable APT_TRANSFORM_ABORT_ON_CONVERSION_ERROR=True. The job then aborts on an invalid data conversion.
Surrogate Key Generator
Information Server releases affected: 8.0.1 Fix Pack 1 and higher, 8.1 Fix Pack 1 and higher, 8.5 GA
The surrogate key stage reserves keys in blocks. Prior to Information Server 8.1 Fix Pack 1, if only one record was generated (suppose it was value 5, because an initial value was set), the surrogate key generator would use values beginning with 6 and greater as available keys for incoming records. The surrogate key generator was changed in 8.1 Fix Pack 1, as APAR JR29667. With this change, DataStage now considers values 1 to 4, as well as any value 6 and greater, as available keys. This behavior change may cause the SCD stage to produce incorrect results in the database or generate the wrong surrogate keys for the new records of the dimension. If required, the job can be modified to revert to the old behavior (start generating keys from the highest key value last used) by setting the option 'Generate Key From Last Highest Value' to Yes. This approach, however, may result in gaps in used keys. It is recommended that you understand how the key file is initialized and decide whether it is necessary to modify the job based on business logic.
Sequential File Format on Windows
Information Server releases affected: (Windows platforms) 8.1 GA and higher, 8.5 GA
Prior to Information Server 8.1 GA, the default format for sequential files was UNIX format, which requires a newline character as the delimiter of a record. The default format for the Sequential File stage was changed to Windows format in the Information Server 8.1 GA release. Due to this change, data files previously created with UNIX format will not import properly. To solve this issue, set the environment variable APT_USE_CRLF=FALSE at the DataStage project level or within the system environment variables (requires a Windows reboot).
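On UNIX or Linux engine tiers, one way to apply several of the environment variables discussed in this section to every project is to export them from the dsenv file, as in the sketch below. The values shown are examples only, and setting them globally rather than per project or per job is an assumption about what your jobs require.

  # Examples only: review each value before applying it site-wide.
  APT_FORCE_DECIMAL_SEPARATOR=.; export APT_FORCE_DECIMAL_SEPARATOR
  APT_STRING_PADCHAR=0x20; export APT_STRING_PADCHAR
  APT_TRANSFORM_ABORT_ON_CONVERSION_ERROR=True; export APT_TRANSFORM_ABORT_ON_CONVERSION_ERROR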
Symptoms
DataStage jobs that use data sets become very slow over a period of time.
Causes
Data sets use a sync() call but, according to Solaris, they should be making an fsync() call.
Environment
This problem occurs only in cluster environments. The same jobs that are experiencing bottlenecks run normally in a non-cluster environment.
Heap allocation errors with DataStage Parallel Jobs on the AIX platform
Symptoms
A DataStage parallel job ends with the following error message:
APT_BadAlloc: Heap allocation failed.
Causes
AIX divides memory address space into segments. If the DataStage job needs to allocate more memory than exists in the number of available segments, the job ends with a heap allocation error or a failure to allocate memory.
1. Create a new parallel job.
2. Add an External Source stage (under File on the palette) connected to a Peek stage (under Development/Debug on the palette).
3. Access the advanced properties of the External Source stage, and make sure that it is running in parallel mode.
4. In the External Source stage, enter 'ulimit -a; ulimit -aH' (without the quotation marks) in the Source Program property, and define a column as VarChar with a length of 255. Use a configuration file that includes at least one node for each fast name (host) in your cluster or grid.
5. Compile the job, run it, and look in the Director log. The log contains the soft limits and the hard limits for each node in the configuration file. If the hard limit for data is too low, you need to contact your AIX administrator to increase that value. This value can be set in the file /etc/security/limits.
6. After you increase the hard limit settings, you can set the ulimit settings for the user in the ds.rc file located under $DSHOME/sample. You can add a line such as ulimit -d unlimited at the beginning of the file, after the umask settings. The ds.rc file is owned by root, and writable only by root, so your system administrator must change the file permissions. For security reasons, DO NOT change the owner or grant write permission to any non-root user. Important: do not set the number of file descriptors (ulimit -n) to unlimited. That setting causes a problem with DataStage. Ensure that the value for this limit is set sufficiently high; ulimit -n 100000 is a safe value in nearly all situations.
DataStage Version 7.5.x
The DataStage software is a 32-bit application for all 7.5.x releases, even when installed onto an AIX server with a 64-bit kernel. To obtain the maximum amount of process address space for your parallel job processes, set the LDR_CNTRL variable with the value MAXDATA=0x80000000@DSA as the default value at the project level (for all jobs in a project) or within specific jobs. Do not add LDR_CNTRL to your dsenv file. That setting might interfere with the memory model used by the Server Engine.
DataStage Version 8.0.x
In DataStage Version 8.0.x, the DataStage software is a 32-bit application for all 8.0.x releases, even when installed onto an AIX server with a 64-bit kernel. Starting with the Information Server 8.0 GA release, DataStage starts Java components to integrate with the services tier. For these Java components to function properly, the LDR_CNTRL=MAXDATA=0x60000000@USERREGS environment variable is added to the dsenv file. It is important that this variable is not removed or modified, to ensure the proper operation of the Java components. For parallel jobs that require more than 1.5 GB of memory per process, the LDR_CNTRL variable can be set to a larger value. This variable must be given a default value at the project level if you want it to take effect for all jobs in the project, or you can leave the project default value blank and assign a value to specific jobs only. As stated previously, DO NOT alter LDR_CNTRL within the dsenv file. To obtain the maximum amount of process address space for your job processes, set the LDR_CNTRL variable with the value MAXDATA=0x80000000@DSA in your job or as the project default.
DataStage Version 8.1.x
Starting with the 8.1 GA release, DataStage is a 64-bit application and requires a 64-bit AIX kernel. The osh executable is compiled with the MAXDATA=0x80000000 property, so the amount of memory address space available to the parallel job process is limited to 2 GB in the default configuration.
The improvement of being a 64-bit application allows for the allocation of more segments and a larger private memory address space. For situations where large
amounts of heap memory are required for each process, set LDR_CNTRL to the value MAXDATA=0x0000001000000000. This value allocates up to 64 GB of private data for each process. Set this large value at the job level rather than at the project level, to avoid large consumption of memory by jobs where you did not intentionally want this behavior.
DataStage Version 8.5
In DataStage Version 8.5, DataStage is a 64-bit application and requires a 64-bit AIX kernel, just like release 8.1. A significant improvement in this release is that the MAXDATA parameter has been removed from the executable. With this change, DataStage is able to access all of the available memory address segments in the default configuration. Any jobs or projects that had LDR_CNTRL specified with the MAXDATA parameter should be modified to remove this parameter after you upgrade to 8.5 so that you are able to access all of the segments. Important: the LDR_CNTRL=USERREGS environment variable MUST NOT be removed from the dsenv file; it is required for proper operation of the Java components loaded by DataStage processes. The USERREGS property does not impact the memory utilization of DataStage jobs.
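One way to confirm which MAXDATA value is compiled into the osh executable on AIX is to inspect its loader header with the dump command. This is a sketch: the PXEngine path assumes a default installation, and the exact dump options and field names can vary between AIX levels, so verify them on your system.

  # AIX only: show the maxdata value recorded in the osh executable header
  dump -ov /opt/IBM/InformationServer/Server/PXEngine/bin/osh | grep -i maxdata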
Note: The OS parameter nofiles must be set higher than MFILES. Ideally, nofiles should be at least 512. This allows the DataStage process to open up to 512 - (MFILES + 8) files. On most UNIX systems, the proc file system can be used to monitor the file handles opened by a given process; for example:
ps -ef | grep dsrpcd
root 23978 1 0 Jul08 ? 00:00:00 /opt/ds753/Ascential/DataStage/DSEngine/bin/accdsrpcd
ls -l /proc/23978/fd
lrwx------ 1 root dstage 64 Sep 25 08:24 0 -> /dev/pts/1 (deleted)
l-wx------ 1 root dstage 64 Sep 25 08:24 1 -> /dev/null
l-wx------ 1 root dstage 64 Sep 25 08:24 2 -> /dev/null
lrwx------ 1 root dstage 64 Sep 25 08:24 3 -> socket:[12928306]
The dsrpcd process (23978) has four files open.
T30FILE
This parameter determines the maximum number of dynamic hash files that can be opened system-wide on the DataStage system. If this value is too low, expect to find an error message similar to 'T30FILE table full'. The following engine command, executed from $DSHOME, shows the number of dynamic files in use:
echo "`bin/smat -d|wc -l` - 3"|bc
Use this command to assist with tuning the T30FILE parameter. See the following technote: https://www-304.ibm.com/support/docview.wss?uid=swg21390117
Every running DataStage job requires at least 3 slots in this table (RT_CONFIG, RT_LOG, RT_STATUS). Note, however, that multi-instance jobs share slots for these files, because although each job run instance creates a separate file handle, this just increments a usage counter in the table if the file is already open to another instance. Note that on AIX the T30FILE value should not be set higher than the system setting ulimit -n.
GLTABSZ
This parameter defines the size of a row in the group lock table. Tune this value if the number of group locks in a given slot is getting close to the value defined. Use the LIST.READU EVERY command from the server engine shell to assist with monitoring this value. LIST.READU lists the active file and record locks; the EVERY keyword lists the active group locks in addition. For example, with a Designer client and a Director client both logged in to a project named dstage0:
Active Group Locks:                                       Record Group Group Group
Device.... Inode..... Netnode Userno Lmode G-Address.      Locks ...RD ...SH ...EX
 838222719 2039334646       0   5620 62 IN        800          1     0     0     0

Active Record Locks:
Device.... Inode..... Netnode Userno Lmode  PID Item-ID.....................
 838222719 2039334646       0  64332 62 RL 1204 dstage0&!DS.ADMIN!&
 838222719 2039334646       0  62412 62 RL 3124 dstage0&!DS.ADMIN!&
Device
A number that identifies the logical partition of the disk where the file system is located
Inode
A number that identifies the file that is being accessed
Netnode
A number that identifies the host from which the lock originated. 0 indicates a lock on the local machine, which will usually be the case for DataStage. If other than 0, then on UNIX it is the last part of the TCP/IP host number specified in the /etc/hosts file; on Windows it is either the last part of the TCP/IP host number or the LAN Manager node name, depending on the network transport used by the connection.
Userno
The phantom process that set the lock
Pid
A number that identifies the controlling process
Item-ID
The record ID of the locked record
Lmode
The number assigned to the lock, and a code that describes its use
G-Address
Logical disk address of the group, or its offset in bytes from the start of the file, in hex
Record Locks
The number of locked records in the group
Group RD
Number of readers in the group
Group SH
Number of shared group locks
Group EX
Number of exclusive group locks
When the report describes file locks, it contains the following Lmode codes:
FS, IX, CR
Shared file locks
FX, XU, XR
Exclusive file locks
When the report describes group locks, it contains the following Lmode codes:
EX
Exclusive lock
SH
Shared lock
RD
Read lock
WR
Write lock
IN
System information lock
When the report describes record locks, it contains the following Lmode codes:
RL
Shared record lock
RU
Update record lock
RLTABSZ
This parameter defines the size of a row in the record lock table. From a DataStage job point of view, this value affects the number of concurrent DataStage jobs that can be executed, and the number of DataStage Clients that can connect. Use the LIST.READU command from the DSEngine shell to monitor the number of record locks in a given slot. With one Director client logged in to a project named dstage0, and two instances of a job in that project that are running, the active record locks are similar to the following example:
Active Record Locks:
Device.... Inode..... Netnode Userno Lmode  Pid Item-ID.............
 838222719 2039334646       0  64332 62 RL 1204 dstage0&!DS.ADMIN!&
 838222719 2039334646       0  62128 62 RL 3408 dstage0&!DS.ADMIN!&
 838222719 2039334646       0  65252 62 RL  284 dstage0&!DS.ADMIN!&
 304877956  328255620       0  62128 62 RL 3408 RT_CONFIG456
 304877956  328255620       0  65252 62 RL  284 RT_CONFIG456
In the above report, Item-ID=RT_CONFIG456 identifies that the running job is an instance of job number 456, whose compiled job file is locked while the instance is running so that, for example, it cannot be recompiled in that time. A job's number within its project can be seen in the Director job status view, in the detail dialog for a particular job. The unnamed column between Userno and Lmode relates to a row number within the record lock table. Each row can hold RLTABSZ locks. In the above example, 3 slots out of 75 (the default value for RLTABSZ) have been used for row 62. When the number of entries for a given row gets close to the RLTABSZ value, it is time to consider re-tuning the system. Jobs can fail to start, or generate -14 errors, if RLTABSZ is being reached. DataStage clients may see an error message similar to 'DataStage Project locked by Administrator' when attempting to connect. Note that the error message can be misleading: it means in this case that a lock cannot be acquired because the lock table is full, not that another user already has the lock.
MAXRLOCK
This parameter must always be set to the value of RLTABSZ - 1. Each DSD.RUN process takes a record lock on a key name <project>&!DS.ADMIN!& of the UV.ACCOUNT file in $DSHOME (as seen in the examples above). Each DataStage client connection (for example, Designer, Director, Administrator, or the dsjob command) takes this record lock as well. This is the mechanism by which DataStage determines whether operations such as project deletion are safe; such operations cannot proceed while a project lock is held by any process. MAXRLOCK needs to be set to accommodate the maximum number of jobs and sequences, plus client connections, that will be used at any given time, and RLTABSZ needs to be set to MAXRLOCK + 1. Keep in mind that changing RLTABSZ greatly increases the amount of memory needed by the disk shared memory segment.
Customer Support has reported that settings of 130/130/129 (for RLTABSZ/GLTABSZ/MAXRLOCK, respectively) work successfully on most customer installations. There have been reports of high-end customers using settings of 300/300/299, so the appropriate values are environment specific. If sequencers or multi-instance jobs are used, start with the recommended settings of 130/130/129, and increase to 300/300/299 if necessary.

Prior to DataStage v8.5, the following settings were pre-defined:
v MFILES = 150
v T30FILE = 200
v GLTABSZ = 75
v RLTABSZ = 75
v MAXRLOCK = 74 (75-1)

DataStage v8.5 has the following settings pre-defined:
v MFILES = 150
v T30FILE = 512
v GLTABSZ = 75
v RLTABSZ = 150
v MAXRLOCK = 149 (150-1)

These are the lowest suggested values to accommodate all system configurations, so tuning of these values is often necessary.

DMEMOFF, PMEMOFF, CMEMOFF, NMEMOFF
These are the shared memory address offset values for each of the four DataStage shared memory segments (Disk, Printer, Catalog, NLS). Depending on the platform, PMEMOFF, CMEMOFF, and NMEMOFF might need to be increased to allow a large disk shared memory segment to be used. Where these values are set to 0x0 (on AIX, for example), the operating system manages the offsets. Otherwise, PMEMOFF minus DMEMOFF determines the largest disk shared memory segment size. Additionally, on Solaris for example, these values might need to be increased to allow a greater heap size for the running DataStage job. Note that when running the shmtest utility, take care when interpreting its output. The utility tests the availability of memory that it can allocate at the time it runs, which is affected by the current uvconfig settings, by how much shared memory is already in use, and by other activity on the machine at the time.
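The outline below is a minimal sketch of how these tunable parameters are typically changed. It assumes that $DSHOME points at the engine directory, that no jobs or client sessions are active, and that you keep a backup of the current configuration; verify the exact procedure for your release before applying it.

# Minimal sketch, assuming $DSHOME is the engine directory and the engine is idle.
cd $DSHOME
. ./dsenv                        # source the engine environment
ipcs -m                          # optional: review the existing shared memory segments first
cp uvconfig uvconfig.bak         # keep a backup of the current settings
# Edit uvconfig and raise RLTABSZ, GLTABSZ, and MAXRLOCK (for example 130/130/129),
# keeping MAXRLOCK = RLTABSZ - 1.
./bin/uv -admin -stop            # stop the engine
./bin/uvregen                    # regenerate the engine configuration from uvconfig
./bin/uv -admin -start           # restart the engine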
Enabling tracing for DataStage parallel jobs

Procedure
1. Enable the following reporting environment variables, at the project level in the Administrator client or at the job level, and set them to true:
v APT_DUMP_SCORE
v APT_PM_SHOWRSH
v APT_PM_SHOW_PIDS
v OSH_EXPLAIN
v APT_DISABLE_COMBINATION
2. Add a new user-defined environment variable called DS_PXDEBUG in DS Administrator. The value must be undefined for the project: leave the value blank or set it to 0 at the project level. Add this environment variable at the job level and set the value to 1. The DS_PXDEBUG variable causes the job to report debugging information.
Results
Debug information is collected under a new project-level directory called Debugging. Subdirectories are created on a per-job basis and are named after the job. For multi-instance jobs that run with a non-empty invocation ID, the directory name includes both the job name and the invocation ID.
What to do next
Execute the job. Send an export of the job, together with the detailed job log and the <project path>/Debugging/<job name> directory, to IBM Support.
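As a sketch of these steps from the command line, the commands below assume the dsjob client in the engine environment and use a hypothetical project name (myproject), job name (myjob), and project path; substitute your own values.

# Minimal sketch; project, job, and paths are hypothetical examples.
. $DSHOME/dsenv                                              # engine environment ($DSHOME assumed to be set)
$DSHOME/bin/dsjob -run -jobstatus myproject myjob            # run the job and wait for it to finish
$DSHOME/bin/dsjob -logsum myproject myjob > myjob_log.txt    # capture a summary of the job log

# Package the per-job debug output for IBM Support, together with the job export and log.
cd /opt/IBM/InformationServer/Server/Projects/myproject      # assumed project path
tar -cvf myjob_debug.tar Debugging/myjob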
Contacting IBM
You can contact IBM for customer support, software services, product information, and general information. You also can provide feedback to IBM about products and documentation. The following table lists resources for customer support, software services, training, and product and solutions information.
Table 5. IBM resources

IBM Support Portal
   You can customize support information by choosing the products and the topics that interest you at www.ibm.com/support/entry/portal/Software/Information_Management/InfoSphere_Information_Server

Software services
   You can find information about software, IT, and business consulting services on the solutions site at www.ibm.com/businesssolutions/

My IBM
   You can manage links to IBM Web sites and information that meet your specific technical support needs by creating an account on the My IBM site at www.ibm.com/account/

Training and certification
   You can learn about technical training and education services designed for individuals, companies, and public organizations to acquire, maintain, and optimize their IT skills at http://www.ibm.com/software/swtraining/

IBM representatives
   You can contact an IBM representative to learn about solutions at www.ibm.com/connect/ibm/us/en/
Providing feedback
The following table describes how to provide feedback to IBM about products and product documentation.
Table 6. Providing feedback to IBM

Product feedback
   You can provide general product feedback through the Consumability Survey at www.ibm.com/software/data/info/consumability-survey
Documentation feedback
   To comment on the information center, click the Feedback link on the top right side of any topic in the information center. You can also send comments about PDF file books, the information center, or any other documentation in the following ways:
   v Online reader comment form: www.ibm.com/software/data/rcf/
   v E-mail: comments@us.ibm.com
Product accessibility
You can get information about the accessibility status of IBM products.

The IBM InfoSphere Information Server product modules and user interfaces are not fully accessible. The installation program installs the following product modules and components:
v IBM InfoSphere Business Glossary
v IBM InfoSphere Business Glossary Anywhere
v IBM InfoSphere DataStage
v IBM InfoSphere FastTrack
v IBM InfoSphere Information Analyzer
v IBM InfoSphere Information Services Director
v IBM InfoSphere Metadata Workbench
v IBM InfoSphere QualityStage
For information about the accessibility status of IBM products, see the IBM product accessibility information at http://www.ibm.com/able/product_accessibility/index.html.
Accessible documentation
Accessible documentation for InfoSphere Information Server products is provided in an information center. The information center presents the documentation in XHTML 1.0 format, which is viewable in most Web browsers. XHTML allows you to set display preferences in your browser. It also allows you to use screen readers and other assistive technologies to access the documentation.
Notices
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing IBM Corporation North Castle Drive Armonk, NY 10504-1785 U.S.A. For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to: Intellectual Property Licensing Legal and Intellectual Property Law IBM Japan Ltd. 1623-14, Shimotsuruma, Yamato-shi Kanagawa 242-8502 Japan The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web
sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact: IBM Corporation J46A/G4 555 Bailey Avenue San Jose, CA 95141-1003 U.S.A. Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee. The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us. Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only. This information is for planning purposes only. The information herein is subject to change before the products described become available. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to
IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs. Each copy or any portion of these sample programs or any derivative work, must include a copyright notice as follows: (your company name) (year). Portions of this code are derived from IBM Corp. Sample Programs. Copyright IBM Corp. _enter the year or years_. All rights reserved. If you are viewing this information softcopy, the photographs and color illustrations may not appear.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at www.ibm.com/legal/copytrade.shtml. The following terms are trademarks or registered trademarks of other companies: Adobe is a registered trademark of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office UNIX is a registered trademark of The Open Group in the United States and other countries. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. The United States Postal Service owns the following trademarks: CASS, CASS Certified, DPV, LACSLink, ZIP, ZIP + 4, ZIP Code, Post Office, Postal Service, USPS and United States Postal Service. IBM Corporation is a non-exclusive DPV and LACSLink licensee of the United States Postal Service. Other company, product or service names may be trademarks or service marks of others.
Index

A
at command 9
authentication errors 5

C
cron command 9
customer support
   contacting 85

D
Designer client
   handling exceptions 14
   viewing error reports 15
   viewing log files 15

F
Failed to authenticate 1, 2, 5
failure to authenticate user 5
failure to connect 1, 2

J
job termination problems 10

L
legal notices 91

O
ODBC connections
   checking symbolic links 12
   shared library environment 11
   UNIX and Linux systems 10
ODBC drivers
   UNIX and Linux systems 10

P
product accessibility
   accessibility 89
product documentation
   accessing 87

R
running out of file units 12
running out of memory on AIX computers 13

S
schedule log
   dsr_sched.log 6
   viewing 6
scheduled jobs 5
   AIX servers 9
   checking user rights 7
   localizing days of week 7
   testing user name and password
   UNIX and Linux servers 9
scheduling
   Windows servers 6
software services
   contacting 85
support
   customer 85

T
trademarks
   list of 91

U
UNIX and Linux
   configuration problems 12, 13

V
viewing scheduled jobs 9

W
WebSphere application server
   fails to start 3
   on AIX and Linux 3