Designer Client Guide
Version 8 Release 1
LC18-9893-01
Note: Before using this information and the product that it supports, read the information in Notices on page 245.
© Ascential Software Corporation 1997, 2005. © Copyright International Business Machines Corporation 2006, 2008. All rights reserved. US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents

Chapter 1. Your first job . . . . . 1
    Setting up the exercise
    Starting the Designer client
        Lesson checkpoint
    Setting up your project
        Lesson checkpoint
    Creating a new job
        Lesson checkpoint
    Adding stages and links to your job
        Adding stages
        Adding links
        Renaming stages and links
        Lesson checkpoint
    Configuring your job
        Configuring the data source stage
        Configuring the Transformer stage
        Configuring the target file stage
        Lesson checkpoint
    Compiling your job
        Lesson checkpoint
    Running your job and viewing results
        Running the job
        Viewing results
        Lesson checkpoint

    The Job Run Options dialog box
        Parameters page
        Limits page
        General Page
    Creating jobs by using Assistants

    Environment variables
    Specifying a job parameter for server jobs
        Using job parameters in server jobs
        Environment variables
    Creating a parameter set
        Parameter Set dialog box - General page
        Parameter Set dialog box - Parameters page
        Parameter Set dialog box - Values page
    Using parameter sets in job designs
        Adding a parameter set to a job design
        Viewing a parameter set
        Using a parameter from a parameter set in a job
        Using parameter sets in shared containers
    Specifying parameter values at run time
        Running a job from the Designer or Director clients
        Running a job from the command line
        Running a job from within a job sequence

    Setting National Language Support properties
    Optimizing job performance
    Configuring Mainframe jobs
        Specifying general options
        Specifying a job parameter in a mainframe job
        Controlling code generation
        Supplying extension variable values
        Configuring operational metadata

    Triggers
    Expressions
    Job sequence properties
        General page
        Parameters page
        Job Control page
        Dependencies page
    Activity properties
        Job activity properties
        Routine activity properties
        Email Notification activity properties
        Wait-For-File activity properties
        ExecCommand activity properties
        Exception activity properties
        Nested Condition activity properties
        Sequencer activity properties
        Terminator activity properties
        Start Loop activity properties
        End Loop activity properties
        User Variables activity properties
    Compiling the job sequence
    Restarting job sequences

    Managing data sets
        Structure of data sets
        Starting the Data Set Manager
        Data set viewer
    Creating and editing configuration files
    Message Handler Manager
        Using the Message Handler Manager
        Message Handler file format
    JCL templates

How to read syntax diagrams . . . . . 241
Product accessibility . . . . . 243
Notices . . . . . 245
Trademarks . . . . . 247
Index . . . . . 249


Chapter 1. Your first job
The job that you create will perform the following tasks:
1. Extract the data from the file.
2. Convert (transform) the data in the DATE column from a complete date (YYYY-MM-DD) to a year and month (YYYY, MM) stored as two columns.
3. Write the transformed data to a new text file that is created when you run the job.
The following table shows a sample of the source data that the job reads.
The following table shows the same data after it has been transformed by the job.
Learning objectives
As you work through the exercise, you will learn how to do the following tasks:
v Set up your project.
v Create a new job.
v Develop the job by adding stages and links and editing them.
v Compile the job.
v Run the job.
Time required
This exercise takes approximately 60 minutes to finish. If you explore other concepts related to this exercise, it could take longer to complete.
Audience
New user of IBM Information Server.
System requirements
The exercise requires the following hardware and software:
v IBM WebSphere DataStage clients installed on a Windows XP platform.
v Connection to a WebSphere DataStage server on a Windows or UNIX platform (Windows servers can be on the same computer as the clients).
Prerequisites
Complete the following tasks before starting the exercise:
v Obtain DataStage developer privileges from the WebSphere DataStage administrator.
v Find out the name of the project that the administrator has created for you to work in.
v Set up the exercise data as described in the first lesson.
3. Select your project from the Project list, and then click OK.
4. Click Cancel to close the New window. (You will create your job later in this exercise.)
The Designer client is now ready for you to start work. The following figure shows the Designer client.
Lesson checkpoint
In this lesson, you started the Designer client. You learned the following tasks:
v How to enter your user name and password in the Attach window.
v How to select the project to open.
Before you create your job, you must set up your project by entering information about your data. This information includes the name and location of the tables or files that contain your data, and a definition of the columns that the tables or files contain. The information, also referred to as metadata, is stored in table definitions in the repository. The easiest way to enter a table definition is to import it directly from the source data. In this exercise you will define the table definition by importing details about the data directly from the data file.
To define your table definition:
1. In the Designer client, select Import > Table Definitions > Sequential File Definitions.
2. In the Import Metadata (Sequential) window, do the following steps:
   a. In the Directory field, type or browse for the exercise directory name.
   b. Click in the Files section.
   c. In the Files section, select Example1.txt.
   d. Click Import.
3. In the Define Sequential Metadata window, do the following tasks:
   a. In the Format page, select the First line is column names option.
   b. Click the Define tab.
   c. In the Define page, examine the column definitions. This is the metadata that will populate your table definition.
   d. Click OK.
4. In the Import Metadata (Sequential) window, click Close.
5. In the repository tree, open the Table Definitions\Sequential\Root folder.
6. Double-click the table definition object named Example1.txt to open it.
7. In the Table Definition window, click the Columns tab.
8. Examine the column definitions in the Columns page. Note that these are the same as the column definitions that you looked at in the Define Sequential Metadata window. The following figure shows the column definitions. Compare these to the columns shown in Figure 1 on page 2.
Lesson checkpoint
In this lesson you defined a table definition. You learned the following tasks:
v How to import metadata from a data file to create a table definition object in the repository.
v How to open the table definition that you created and examine it.
2. In the New window, select the Jobs folder in the left pane, and then select the parallel job icon in the right pane.
3. Click OK to open a new empty job design window in the design area.
4. Select File > Save.
5. In the Save Parallel Job As window, right-click the Jobs folder and select New Folder from the menu.
6. Type a name for the folder, for example, My Jobs, and then move the cursor to the Item name field.
7. Type the name of the job in the Item name field. Name the job Exercise.
8. Confirm that the Folder path field contains the path \Jobs\My Jobs, and then click Save.
You have created a new parallel job named Exercise and saved it in the folder Jobs\My Jobs in the repository.
Lesson checkpoint
In this lesson you created a job and saved it to a specified place in the repository. You learned the following tasks:
v How to create a job in the Designer client.
v How to name the job and save it to a folder in the repository tree.
Adding stages
To add stages to your job:
1. In the Designer client palette area, click the File bar to open the file section of the palette.
2. In the file section of the palette, select the Sequential File stage icon and drag the stage to your open job. Position the stage on the right side of the job window.
The figure shows the file section of the palette.
3. In the file section of the palette, select another Sequential File stage icon and drag the stage to your open job. Position the stage on the left side of the job window.
4. In the Designer client palette area, click the Processing bar to open the Processing section of the palette.
5. In the Processing section of the palette, select the Transformer stage icon and drag the stage to your open job. Position the stage between the two Sequential File stages.
The figure shows the Processing section of the palette.
Adding links
To add links to your job:
1. Right-click on the Sequential File stage on the left of your job and hold the right button down. A target is displayed next to the mouse pointer to indicate that you are adding a link.
2. Drag the target to the Transformer stage and release the mouse button. A black line, which represents the link, joins the two stages.
   Note: If the link is displayed as a red line, it means that it is not connected to the Transformer stage. Select the end of the link and drag it to the Transformer stage and release the link when it turns black.
3. Repeat steps 1 and 2 to connect the Transformer stage to the second Sequential File stage.
4. Select File > Save to save the job.
1. Select each stage or link.
2. Right-click and select Rename.
3. Type the new name:
Stage or link                  Suggested name
Left Sequential File Stage     Data_source
Transformer Stage              Transform
Right Sequential File Stage    Data_target
Left link                      data_in
Right link                     data_out
Your job should look like the one in the following diagram:
Lesson checkpoint
You have now designed your first job. You learned the following tasks:
v How to add stages to your job.
v How to link the stages together.
v How to give the stages and links meaningful names.
13. Click Close to close the Data Browser and OK to close the stage editor.
6. In the bottom left pane of the Transformer stage editor, add a new column to the data_out link by doing the following tasks:
   a. Double-click in the Column name field beneath the QTY column to add a new row.
   b. In the empty Column name field, type YEAR.
   c. In the SQL type field, select Integer from the list.
   d. In the Length field, type 10.
   e. Repeat these steps to add another new column named MONTH, also with an SQL type of Integer and a Length of 10.
   The two new columns named YEAR and MONTH are displayed in red in the data_out link in the top right pane. They are red because you have not yet defined where the data to write into them will come from.
7. To define the source data for the YEAR column, do the following tasks:
   a. Double-click the Derivation field to the left of YEAR in the data_out link to open the expression editor.
   b. In the expression editor, type YearFromDate(data_in.DATE).
   c. Click outside the expression editor to close it.
   You have specified that the YEAR column is populated by taking the data from the DATE column and using the predefined function YearFromDate to strip the year from it. The YEAR column is now black to indicate that it has a valid derivation.
8. To define the source data for the MONTH column, do the following tasks:
   a. Double-click the Derivation field to the left of MONTH in the data_out link to open the expression editor.
   b. In the expression editor, type MonthFromDate(data_in.DATE).
   c. Click outside the expression editor to close it.
   You have specified that the MONTH column is populated by taking the data from the DATE column and using the predefined function MonthFromDate to strip the month from it. The MONTH column is now black to indicate that it has a valid derivation.
9. Click OK to close the Transformer stage editor.
You have configured the Transformer stage to read the data passed to it from the Sequential File stage, transform the data by splitting it into separate month and year fields, and then pass the data to the target Sequential File stage.
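For example, using a hypothetical input value: if the DATE column of an incoming row contains 2008-06-19, the two derivations evaluate as follows:

    YearFromDate(data_in.DATE)   returns 2008
    MonthFromDate(data_in.DATE)  returns 6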
You have configured the Sequential File stage to write the data passed to it from the Transformer stage to a new text file.
Lesson checkpoint
In this lesson, you configured your job. You learned the following tasks:
v How to edit a Sequential File stage.
v How to import metadata into a stage.
v How to edit a Transformer stage.
Lesson checkpoint
In this lesson you compiled your job.
2. Select your job in the right pane of the Director client, and select Job > Run Now.
3. In the Job Run Options window, click Run.
4. When the job status changes to Finished, select View > Log.
5. Examine the job log to see the type of information that the Director client reports as it runs a job. The messages that you see are either control or information type. Jobs can also have Fatal and Warning messages.
The following figure shows the log view of the job.
Viewing results
To view the results of your job:
1. In the job in the Designer client, double-click the Sequential File stage named Data_target to open the stage editor.
2. In the stage editor, click View Data.
3. Click OK in the Data Browser window to accept the default settings. A window opens that shows up to 100 rows of the data written to the data set (if you want to view more than 100 rows in a data browser, change the default settings before you click OK).
4. Examine the data and observe that there are now five columns in the data named CODE, PRODUCT, QTY, MONTH, and YEAR.
5. Click Close to close the Data Browser window.
6. Click OK to close the Sequential File stage.
Lesson checkpoint
In this lesson you ran your job and looked at the results. You learned the following tasks:
v How to start the Director client from the Designer client.
v How to run a job and look at the log file.
v How to view the data written by the job.
Chapter 2. Sketching your job designs
Creating a job
To create a job:
1. Choose File > New from the Designer menu. The New dialog box appears.
2. Choose the Jobs folder in the left pane.
3. Select one of the icons, depending on the type of job or shared container you want to create.
4. Click OK.
The Diagram window appears, in the right pane of the Designer, along with the palette for the chosen type of job. You can now save the job and give it a name.
Otherwise, to open a job, do one of the following:
v Choose File > Open... .
v Click the Open button on the toolbar.
The Open dialog box is displayed. This allows you to open a job (or any other object) currently stored in the repository.
To open a job:
1. Select the folder containing the job (this might be the Job folder, but you can store a job in any folder you like).
2. Select the job in the tree.
3. Click OK.
You can also find the job in the Repository tree and double-click it, or select it and choose Edit from its shortcut menu, or drag it onto the background to open it. The updated Designer window displays the chosen job in a Diagram window.
Saving a job
To save the job:
1. Choose File > Save. The Save job as dialog box appears.
2. Enter the name of the job in the Item name field.
3. Select a folder in which to store the job from the tree structure by clicking it. It appears in the Folder path box. By default jobs are saved in the pre-configured Job folder, but you can store it in any folder you choose.
4. Click OK. If the job name is unique, the job is created and saved in the Repository. If the job name is not unique, a message box appears. You must acknowledge this message before you can enter an alternative name (a job name must be unique within the entire repository, not just the selected folder).
To save an existing job with a different name choose File > Save As... and fill in the Save job as dialog box, specifying the new name and the folder in which the job is to be saved.
Organizing your jobs into folders gives faster operation of the WebSphere DataStage Director when displaying job status.
Naming a job
The following rules apply to the names that you can give WebSphere DataStage jobs:
v Job names can be any length.
v They must begin with an alphabetic character.
v They can contain alphanumeric characters and underscores.
Job folder names can be any length and consist of any characters, including spaces.
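As an illustration of these rules (the names here are hypothetical examples):

    Valid job names:    Exercise, Load_Sales_2008
    Invalid job names:  2008_Load (begins with a digit), Load-Sales (contains a hyphen)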
Stages
A job consists of stages linked together which describe the flow of data from a data source to a data target (for example, a final data warehouse).
A stage usually has at least one data input or one data output. However, some stages can accept more than one data input, and output to more than one stage. The different types of job have different stage types. The stages that are available in the Designer depend on the type of job that is currently open in the Designer.
Stages and links can be grouped in a shared container. Instances of the shared container can then be reused in different parallel jobs. You can also define a local container within a job; this groups stages and links into a single unit, but can only be used within the job in which it is defined. Each stage type has a set of predefined and editable properties. These properties are viewed or edited using stage editors. A stage editor exists for each stage type.
Links
Links join the various stages in a job together and are used to specify how data flows when the job is run.
There are rules covering how links are used, depending on whether the link is an input or an output and what type of stages are being linked. WebSphere DataStage parallel jobs support three types of link:
v Stream. A link representing the flow of data. This is the principal type of link, and is used by all stage types.
v Reference. A link representing a table lookup. Reference links can only be input to Lookup stages, and they can only be output from certain types of stage.
v Reject. Some parallel job stages allow you to output records that have been rejected for some reason onto an output link. Note that reject links derive their metadata from the associated output link and this cannot be edited.
You can usually have either an input stream link or an output stream link on a File or Database stage, but you can't have both together.
The three link types are displayed differently in the Designer Diagram window: stream links are represented by solid lines, reference links by dotted lines, and reject links by dashed lines.
Link marking
For parallel jobs, metadata is associated with a link, not a stage. If you have link marking enabled, a small icon attaches to the link to indicate if metadata is currently associated with it. Link marking also shows you how data is partitioned or collected between stages, and whether data is sorted. The following diagram shows the different types of link marking. If you double-click a partitioning or collecting marker, the stage editor for the stage that the link is input to opens on the Partitioning tab.
(Figure: examples of a repartitioning marker and a collection marker.)
Link marking is enabled by default. To disable it, click on the link mark icon in the Designer toolbar, or deselect it in the Diagram menu, or the Diagram shortcut menu.
Unattached links
You can add links that are only attached to a stage at one end, although they will need to be attached to a second stage before the job can successfully compile and run. Unattached links are shown in a special color (red by default - but you can change this using the Options dialog). By default, when you delete a stage, any attached links and their metadata are left behind, with the link shown in red. You can choose Delete including links from the Edit or shortcut menus to delete a selected stage along with its connected links.
Input links connected to the stage generally carry data to be written to the underlying data target. Output links carry data read from the underlying data source. The column definitions on an input link define the data that will be written to a data target. The column definitions on an output link define the data to be read from a data source.
An important point to note about linking stages in server jobs is that column definitions actually belong to, and travel with, the links as opposed to the stages. When you define column definitions for a stage's output link, those same column definitions will appear at the other end of the link where it is input to another stage. If you move either end of a link to another stage, the column definitions will appear on the new stage. If you change the details of a column definition at one end of a link, those changes will appear in the column definitions at the other end of the link.
There are rules covering how links are used, depending on whether the link is an input or an output and what type of stages are being linked. WebSphere DataStage server jobs support two types of input link:
v Stream. A link representing the flow of data. This is the principal type of link, and is used by both active and passive stages.
v Reference. A link representing a table lookup. Reference links are only used by active stages. They are used to provide information that might affect the way data is changed, but do not supply the data to be changed.
The two link types are displayed differently in the Designer Diagram window: stream links are represented by solid lines and reference links by dotted lines. There is only one type of output link, although some stages permit an output link to be used as a reference input to the next stage and some do not.
Link marking
For server jobs, metadata is associated with a link, not a stage. If you have link marking enabled, a small icon attaches to the link to indicate if metadata is currently associated with it. Link marking is enabled by default. To disable it, click on the link mark icon in the Designer toolbar, or deselect it in the Diagram menu, or the Diagram shortcut menu.
Unattached links
You can add links that are only attached to a stage at one end, although they will need to be attached to a second stage before the job can successfully compile and run. Unattached links are shown in a special color (red by default - but you can change this using the Options dialog). By default, when you delete a stage, any attached links and their metadata are left behind, with the link shown in red. You can choose Delete including links from the Edit or shortcut menus to delete a selected stage along with its connected links.
a target. The read/write link to the data source is represented by the stage itself, and connection details are given on the Stage general tabs. Links to and from source and target stages are used to carry data to or from a processing or post-processing stage. For source and target stage types, column definitions are associated with stages rather than with links. You decide what appears on the output link of a stage by selecting column definitions on the Selection page. You can set the Column Push Option to specify that stage column definitions be automatically mapped to output columns (this happens if you set the option, define the stage columns, and then click OK to leave the stage without visiting the Selection page). There are rules covering how links are used, depending on whether the link is an input or an output and what type of stages are being linked. Mainframe stages have only one type of link, which is shown as a solid line. (A table lookup function is supplied by the Lookup stage; the input link to this stage that acts as a reference is shown with a dotted line to illustrate its function.)
Link marking
For mainframe jobs, metadata is associated with the stage and flows down the links. If you have link marking enabled, a small icon attaches to the link to indicate if metadata is currently associated with it. Link marking is enabled by default. To disable it, click on the link mark icon in the Designer toolbar, or deselect it in the Diagram menu, or the Diagram shortcut menu.
Unattached links
Unlike server and parallel jobs, you cannot have unattached links in a mainframe job; both ends of a link must be attached to a stage. If you delete a stage, the attached links are automatically deleted too.
Link ordering
The Transformer stage in server jobs and various processing stages in parallel jobs allow you to specify the execution order of links coming into or going out from the stage. When looking at a job design in WebSphere DataStage, there are two ways to look at the link execution order: v Place the mouse pointer over a link that is an input to or an output from a Transformer stage. A ToolTip appears displaying the message:
Input execution order = n
for input links, or:
Output execution order = n
for output links. In both cases n gives the link's place in the execution order. If an input link is no. 1, then it is the primary link. Where a link is an output from the Transformer stage and an input to another Transformer stage, then the output link information is shown when you rest the pointer over it.
v Select a stage and right-click to display the shortcut menu. Choose Input Links or Output Links to list all the input and output links for that Transformer stage and their order of execution.
Naming links
The following rules apply to the names that you can give WebSphere DataStage links:
v Link names can be any length.
v They must begin with an alphabetic character.
v They can contain alphanumeric characters and underscores.
Adding Stages
There is no limit to the number of stages you can add to a job. We recommend you position the stages as follows in the Diagram window:
v Parallel jobs
  Data sources on the left
  Data targets on the right
  Processing stages in the middle of the diagram
v Server jobs
  Data sources on the left
  Data targets on the right
  Transformer or Aggregator stages in the middle of the diagram
v Mainframe jobs
  Source stages on the left
  Processing stages in the middle
  Target stages on the right
There are a number of ways in which you can add a stage:
v Click the stage icon on the tool palette. Click in the Diagram window where you want to position the stage. The stage appears in the Diagram window.
v Click the stage icon on the tool palette. Drag it onto the Diagram window.
v Select the desired stage type in the repository tree and drag it to the Diagram window.
When you insert a stage by clicking (as opposed to dragging) you can draw a rectangle as you click on the Diagram window to specify the size and shape of the stage you are inserting as well as its location. Each stage is given a default name which you can change if required.
If you want to add more than one stage of a particular type, press Shift after clicking the button on the tool palette and before clicking on the Diagram window. You can continue to click the Diagram window without having to reselect the button. Release the Shift key when you have added the stages you need; press Esc if you change your mind.
Moving stages
After they are positioned, stages can be moved by clicking and dragging them to a new location in the Diagram window. If you have the Snap to Grid option activated, the stage is attached to the nearest grid position when you release the mouse button. If stages are linked together, the link is maintained when you move a stage.
Renaming stages
There are a number of ways to rename a stage:
v You can change its name in its stage editor.
v You can select the stage in the Diagram window, press Ctrl-R, choose Rename from its shortcut menu, or choose Edit > Rename from the main menu and type a new name in the text box that appears beneath the stage.
v Select the stage in the diagram window and start typing.
v You can select the stage in the Diagram window and then edit the name in the Property Browser (if you are displaying it).
Deleting stages
Stages can be deleted from the Diagram window. Choose one or more stages and do one of the following:
v Press the Delete key.
v Choose Edit > Delete.
v Choose Delete from the shortcut menu.
A message box appears. Click Yes to delete the stage or stages and remove them from the Diagram window. (This confirmation prompting can be turned off if required.)
When you delete stages in mainframe jobs, attached links are also deleted. When you delete stages in server or parallel jobs, the links are left behind, unless you choose Delete including links from the Edit or shortcut menu.
Linking stages
You can link stages in three ways:
v Using the Link button. Choose the Link button from the tool palette. Click the first stage and drag the link to the second stage. The link is made when you release the mouse button.
v Using the mouse. Select the first stage. Position the mouse cursor on the edge of a stage until the mouse cursor changes to a circle. Click and drag the mouse to the other stage. The link is made when you release the mouse button.
v Using the mouse. Point at the first stage and right-click, then drag the link to the second stage and release it.
Moving links
Once positioned, a link can be moved to a new location in the Diagram window. You can choose a new source or destination for the link, but not both. To move a link:
1. Click the link to move in the Diagram window. The link is highlighted.
2. Click in the box at the end you want to move and drag the end to its new location.
In server and parallel jobs you can move one end of a link without reattaching it to another stage. In mainframe jobs both ends must be attached to a stage.
Deleting links
Links can be deleted from the Diagram window. Choose the link and do one of the following:
v Press the Delete key.
v Choose Edit > Delete.
v Choose Delete from the shortcut menu.
A message box appears. Click Yes to delete the link. The link is removed from the Diagram window.
Note: For server jobs, metadata is associated with a link, not a stage. If you delete a link, the associated metadata is deleted too. If you want to retain the metadata you have defined, do not delete the link; move it instead.
Renaming links
There are a number of ways to rename a link:
v You can select it and start typing in a name in the text box that appears.
v You can select the link in the Diagram window and then edit the name in the Property Browser.
v You can select the link in the Diagram window, press Ctrl-R, choose Rename from its shortcut menu, or choose Edit > Rename from the main menu and type a new name in the text box that appears beneath the link.
v Select the link in the diagram window and start typing.
Editing stages
After you add the stages and links to the Diagram window, you must edit the stages to specify the data you want to use and any aggregations or conversions required.
Data arrives into a stage on an input link and is output from a stage on an output link. The properties of the stage and the data on each input and output link are specified using a stage editor. To edit a stage, do one of the following:
v Double-click the stage in the Diagram window.
v Select the stage and choose Properties... from the shortcut menu.
v Select the stage and choose Edit > Properties.
A dialog box appears. The content of this dialog box depends on the type of stage you are editing. See the individual stage descriptions for details.
The data on a link is specified using column definitions. The column definitions for a link are specified by editing a stage at either end of the link. Column definitions are entered and edited identically for each stage type.
After you define the new row, you can right-click on it and drag it to a new position in the grid.
Naming columns
The rules for naming columns depend on the type of job the table definition will be used in:
Mainframe Jobs
Column names can be any length. They must begin with an alphabetic character and can contain alphanumeric, underscore, #, @, and $ characters.
With WebSphere DataStage Release 8.0, the table definition can be located anywhere in the repository that you choose. For example, you might want a top level folder called Tutorial that contains all the jobs and table definitions concerned with the server job tutorial.
To save the column definitions:
1. Click Save... . The Save Table Definition dialog box appears.
2. Enter a folder name or path in the Data source type field. The name entered here determines how the definition will be stored in the repository. By default, this field contains Saved.
3. Enter a name in the Data source name field. This forms the second part of the table definition identifier and is the name of the branch created under the data source type branch. By default, this field contains the name of the stage you are editing.
4. Enter a name in the Table/file name field. This is the last part of the table definition identifier and is the name of the leaf created under the data source name branch. By default, this field contains the name of the link you are editing.
5. Optionally enter a brief description of the table definition in the Short description field. By default, this field contains the date and time you clicked Save... . The format of the date and time depend on your Windows setup.
6. Optionally enter a more detailed description of the table definition in the Long description field.
7. Click OK. The column definitions are saved under the specified branches in the Repository.
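For example (a hypothetical case that assumes the defaults described above, for a stage named Data_target with an output link named data_out), the three parts of the identifier would be Data source type = Saved, Data source name = Data_target, and Table/file name = data_out.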
5. Click OK. One of two things happens, depending on the type of stage you are editing:
   v If the stage type does not support selective metadata loading, all the column definitions from the chosen table definition are copied into the Columns grid.
   v If the stage type does support selective metadata loading, the Select Columns dialog box appears, allowing you to specify which column definitions you want to load. Use the arrow keys to move columns back and forth between the Available columns list and the Selected columns list. The single arrow buttons move highlighted columns, the double arrow buttons move all items. By default all columns are selected for loading. Click Find... to open a dialog box which lets you search for a particular column. The shortcut menu also gives access to Find... and Find Next. Click OK when you are happy with your selection. This closes the Select Columns dialog box and loads the selected columns into the stage.
   For mainframe stages and certain parallel stages where the column definitions derive from a CFD file, the Select Columns dialog box can also contain a Create Filler check box. This happens when the table definition the columns are being loaded from represents a fixed-width table. Select this to cause sequences of unselected columns to be collapsed into filler items. Filler columns are sized appropriately, their data type set to character, and name set to FILLER_XX_YY where XX is the start offset and YY the end offset. Using fillers results in a smaller set of columns, saving space and processing time and making the column set easier to understand.
   If you are importing column definitions that have been derived from a CFD file into server or parallel job stages, you are warned if any of the selected columns redefine other selected columns. You can choose to carry on with the load or go back and select columns again.
6. Click OK to proceed. If the stage you are loading already has column definitions of the same name, you are prompted to confirm that you want to overwrite them. The Merge Column Metadata check box is selected by default and specifies that, if you confirm the overwrite, the Derivation, Description, Display Size and Field Position from the existing definition will be preserved (these contain information that is not necessarily part of the table definition and that you have possibly added manually). Note that the behavior of the merge is affected by the settings of the Metadata options in the Designer Options dialog box.
7. Click Yes or Yes to All to confirm the load. Changes are saved when you save your job design.
Pre-configured stages
There is a special feature that you can use to paste components into a shared container and add the shared container to the palette.
This feature allows you to have pre-configured stages ready to drop into a job. To paste a stage into a new shared container, select Edit > Paste Special > Into new Shared Container. The Paste Special into new Shared Container dialog box appears. This allows you to select a folder and name for the new shared container, enter a description and optionally add a shortcut to the palette.
If you want to cut or copy metadata along with the stages, you should select source and destination stages, which will automatically select links and associated metadata. These can then be cut or copied and pasted as a group.
v You can select any row or column, or any cell within a row or column, and press CTRL-C to copy it.
v You can select the whole of a very wide row by selecting the first cell and then pressing SHIFT+END.
v If a cell contains multiple lines, you can expand it by left-clicking while holding down the SHIFT key. Repeat this to shrink it again.
You can view a row containing a specific data item using the Find... button. The Find dialog box will reposition the view to the row containing the data you are interested in. The search is started from the current row.
The Display... button invokes the Column Display dialog box. This allows you to simplify the data displayed by the Data Browser by choosing to hide some of the columns. For server jobs, it also allows you to normalize multivalued data to provide a 1NF view in the Data Browser. This dialog box lists all the columns in the display, all of which are initially selected. To hide a column, clear it.
For server jobs, the Normalize on drop-down list box allows you to select an association or an unassociated multivalued column on which to normalize the data. The default is Un-normalized, and choosing Un-normalized will display the data in NF2 form with each row shown on a single line. Alternatively you can select Un-Normalized (formatted), which displays multivalued rows split over several lines.
In the example, the Data Browser would display all columns except STARTDATE. The view would be normalized on the association PRICES.
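As a purely conceptual illustration of normalizing (the column names and values here are hypothetical, and the exact on-screen formatting differs): suppose a row has CUSTOMER = C001 and a multivalued PRICE column holding the three values 10, 12, and 14. Un-normalized, the Data Browser shows the row on a single line with all three PRICE values; normalized on PRICE, it shows three rows in first normal form:

    CUSTOMER  PRICE
    C001      10
    C001      12
    C001      14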
Appearance branch to view the default colors and change them if required. You can also set the refresh interval at which the monitor updates the information while the job is running.
Parameters page
The Parameters page lists any parameters or parameter sets that have been defined for the job. If default values have been specified, these are displayed too. You can enter a value in the Value column, edit the default, or accept the default as it is. Click Set to Default to set a parameter to its default value, or click All to Default to set all parameters to their default values. Click Property Help to display any help text that has been defined for the selected parameter (this button is disabled if no help has been defined). Click OK when you are satisfied with the values for the parameters.
When setting a value for an environment variable, you can specify one of the following special values:
v $ENV. Instructs WebSphere DataStage to use the current setting for the environment variable.
v $PROJDEF. The current setting for the environment variable is retrieved and set in the job's environment (so that value is used wherever in the job the environment variable is used). If the value of that environment variable is subsequently changed in the Administrator client, the job will pick up the new value without the need for recompiling.
v $UNSET. Instructs WebSphere DataStage to explicitly unset the environment variable.
Note that you cannot use these special values when viewing data on Parallel jobs. You will be warned if you try to do this.
Limits page
The Limits page allows you to specify whether stages in the job should be limited in how many rows they process and whether run-time error warnings should be ignored.
To specify that the job should abort after a certain number of warnings:
1. Click the Abort job after option button.
2. Select the number of warnings from the drop-down list box.
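The same run options can also be supplied when a job is started from the command line with the dsjob tool. The following sketch assumes a project named dstage1, a job named Exercise, and a job parameter named SourceDir, and it omits the authentication options that your installation might require; check the dsjob documentation for the exact syntax available in your release:

    dsjob -run -param SourceDir=/data/exercise -warn 50 -rows 1000 dstage1 Exercise

This runs the job with a parameter value, stops the run after 50 warnings, and limits the number of rows that stages process to 1000.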
General Page
Use the General page to specify that the job should generate operational metadata. You can also disable any message handlers that have been specified for this job run.
leave out some parameters; for example, you might not want to specify a password as part of the object. In this case you can specify the missing property when you design a job using this object.
You can create data connection objects associated with the following types of stage:
v Connector stages. You can create data connection objects associated with any of the connector stages.
v Parallel job stages. You can create data connection objects associated with any of the following types of parallel job stages:
  DB2/UDB Enterprise stage
  Oracle Enterprise stage
  Informix Enterprise stage
  Teradata Enterprise stage
v Server job stages. You can create data connection objects associated with any of the following types of server job stages:
  ODBC stage
  UniData 6 and UniData stages
  UniVerse stage
v Supplementary stages. You can create data connection objects associated with any of the following types of supplementary stages:
  DRS stage
  DB2/UDB API stage
  Informix CLI stage
  MS OLEDB stage
  Oracle OCI 9i stage
  Sybase OC stage
  Teradata API stage
To specify the data connection object parameters:
1. Choose the type of stage that the object relates to in the Connect using Stage Type field by clicking the browse button and selecting the stage type object from the repository tree. The Connection parameters list is populated with the parameters that the connector stage requires in order to make a connection.
2. For each of the parameters, choose whether you are going to supply a value as part of the object, and if so, supply that value.
You can also specify that the values for the parameters will be supplied via a parameter set object. To specify a parameter set in this way, step 2 is as follows:
v Click the arrow button next to the Parameter set field, then choose Create from the menu. The Parameter Set window opens. For more details about parameter set objects, see Parameter Sets.
v Import via connectors
v Import via supplementary stages (Import > Table Definitions > Plug-in Metadata Definitions)
v Import via ODBC definitions (Import > Table Definitions > ODBC Table Definitions)
v Import from UniVerse table (Import > Table Definitions > UniVerse Table Definitions)
v Import from UniData 6 table (Import > Table Definitions > UniData 6 Table Definitions)
v Import from Orchestrate Schema (Import > Table Definitions > Orchestrate Schema Definitions)
In some of these cases WebSphere DataStage provides a wizard to guide you through the import, in others the import is via a simple dialog box. The method for creating a data connection object from the import varies according to the import type.
3. Fill in the remaining details such as name and description, and folder to store the object in.
4. Click OK to close the Data Connection dialog box, and continue with your job design.
2. Choose from the data connection objects that you have currently defined for this type of connector stage.
3. Click OK. The data connection details are loaded.
If the data connection object only supplies some of the connection properties required you will have to supply the rest yourself. For example, the password might be deliberately left out of the data connection object, in which case you can supply it in the job design or specify a job parameter and specify it at run time.
The appropriate import wizard opens and guides you through the process of importing metadata.
General page
The General page contains general information about the table definition. The following fields are on this page:
v Data source type. The type of data source, for example, UniVerse.
v Data source name. If you imported the table definition, this contains a reference to where the original data is found. For UniVerse and ODBC data sources, this is the data source name. For hashed file data sources, this is an account name. For sequential file sources, this is the last component of the directory path where the sequential file is found.
v Table definition. The name of the table definition.
v Mainframe platform type. The type of mainframe platform that the table definition applies to. Where the table definition does not apply to a mainframe data source, it displays <Not applicable>.
v Mainframe access type. Where the table definition has been imported from a mainframe or is applicable to a mainframe, this specifies the type of database. If it is not a mainframe-type table definition, the field is set to <Not applicable>.
v Metadata supports Multi-valued fields. Select this check box if the metadata supports multivalued data. If the check box is selected, three extra grid columns used for multivalued data support will appear on the Columns page. The check box is disabled for ODBC, mainframe, and stored procedure table definitions.
v Fully Qualified Table Name. This read-only field shows the fully qualified table name, as derived from the locator (see Locator Page).
v ODBC quote character. Allows you to specify what character an ODBC data source uses as a quote character. Specify 000 to suppress the quote character.
v Short description. A brief description of the data.
v Long description. A full description of the data.
The combination of the data source type, data source name, and table or file name forms a unique identifier for the table definition. The entire identifier is shown at the top of the General page. No two table definitions can have the same identifier.
The table definition can be located anywhere in the repository that you choose. For example, you might want a top level folder called Tutorial that contains all the jobs and table definitions concerned with the server job tutorial.
Columns page
The Columns page contains a grid displaying the column definitions for each column in the table definition. The grid has these columns:
v Column name. The name of the column.
v Name alias. This field appears if it is enabled in the Grid Properties window (it is not visible by default). It displays the name alias for the column, if one has been defined. Name aliases are only available in table definitions that are linked to tables in the shared repository.
v Key. Indicates whether the column is part of the primary key.
v SQL type. The SQL data type.
v Length. The data precision. This is the length for CHAR data and the maximum length for VARCHAR data.
v Scale. The data scale factor.
v Nullable. Specifies whether the column can contain null values. This is set to indicate whether the column is subject to a NOT NULL constraint. It does not itself enforce a NOT NULL constraint.
v Display. The maximum number of characters required to display the column data.
v Data element. The type of data in the column.
v Description. A text description of the column.
The following columns appear if you selected the Meta data supports Multi-valued fields check box on the General page:
v Association. The name of the association (if any) that the column belongs to.
v Position. The field number.
v Type. The nesting type, which can be S, M, MV, or MS.
The following column might appear if NLS is enabled:
v NLS Map. This property is visible only if NLS is enabled and Allow per-column mapping has been selected on the NLS page of the Table Definition dialog box. It allows you to specify a separate character set map for a column (which overrides the map set for the project or table).
The following column appears if the table definition is derived from a COBOL file definition mainframe data source:
v Level number. The COBOL level number.
Mainframe table definitions also have the following columns, but due to space considerations, these are not displayed on the columns page. To view them, choose Edit Row... from the Columns page shortcut menu; the Edit Column Metadata dialog box appears, displaying the following fields in the COBOL tab:
v Occurs. The COBOL OCCURS clause.
v Sign indicator. Indicates whether the column can be signed or not.
v Sign option. If the column is signed, gives the location of the sign in the data.
v Sync indicator. Indicates whether this is a COBOL-synchronized clause or not.
v Usage. The COBOL USAGE clause.
v Redefined field. The COBOL REDEFINED clause.
v Depending on. A COBOL OCCURS-DEPENDING-ON clause.
v Storage length. Gives the storage length in bytes of the column as defined.
v Picture. The COBOL PICTURE clause.
The Columns page for each link also contains a Clear All and a Load... button. The Clear All button deletes all the column definitions. The Load... button loads (copies) the column definitions from a table definition elsewhere in the Repository.
A shortcut menu available in grids allows you to edit a cell, delete a row, or add a row.
Format page
The Format page contains file format parameters for sequential files used in server jobs. These fields are automatically set when you import a table definition from a sequential file.
There are three check boxes on this page:
v Fixed-width columns. Specifies whether the sequential file contains fixed-width fields. This check box is cleared by default, that is, the file does not contain fixed-width fields. When this check box is selected, the Spaces between columns field is enabled.
v First line is column names. Specifies whether the first line in the file contains the column names. This check box is cleared by default, that is, the first row in the file does not contain the column names.
v Omit last new-line. Specifies whether the last newline character in the file is ignored. By default this check box is cleared, that is, if a newline character exists in the file, it is used.
The rest of this page contains five fields. The available fields depend on the settings for the check boxes.
v Spaces between columns. Specifies the number of spaces used between the columns in the file. This field appears when you select Fixed-width columns.
v Delimiter. Contains the delimiter that separates the data fields. By default this field contains a comma. You can enter a single printable character or a decimal or hexadecimal number to represent the ASCII code for the character you want to use. Valid ASCII codes are in the range 1 to 253. Decimal values 1 through 9 must be preceded with a zero. Hexadecimal values must be prefixed with &h. Enter 000 to suppress the delimiter.
v Quote character. Contains the character used to enclose strings. By default this field contains a double quotation mark. You can enter a single printable character or a decimal or hexadecimal number to represent the ASCII code for the character you want to use. Valid ASCII codes are in the range 1 to 253. Decimal values 1 through 9 must be preceded with a zero. Hexadecimal values must be prefixed with &h. Enter 000 to suppress the quote character.
v NULL string. Contains characters that are written to the file when a column contains SQL null values.
v Padding character. Contains the character used to pad missing columns. This is # by default.
The Sync Parallel button is only visible if your system supports parallel jobs. It causes the properties set on the Parallel tab to mirror the properties set on this page when the button is pressed. A dialog box appears asking you to confirm this action; if you do, the Parallel tab appears and lets you view the settings.
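For example, a small delimited file that matches the default Delimiter and Quote character settings, with First line is column names selected, might look like the following; the column names and values are purely illustrative:

    CODE,PRODUCT,QTY
    "A001","Widget",100
    "A002","Gadget",250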
NLS page
If NLS is enabled, this page contains the name of the map to use for the table definitions. The map should match the character set used in the definitions. By default, the list box shows all the maps that are loaded and ready to use with server jobs. Show all Server maps lists all the maps that are shipped with WebSphere DataStage. Show all Parallel maps lists the maps that are available for use with parallel jobs.
Note: You cannot use a server map unless it is loaded into WebSphere DataStage. You can load different maps using the Administrator client.
Select Allow per-column mapping if you want to assign different character set maps to individual columns.
Relationships page
The Relationships page shows you details of any relationships this table definition has with other tables, and allows you to define new relationships. The page contains two grids:
v Foreign Keys. This shows which columns in the table definition are foreign keys and which columns and tables they reference. You can define foreign keys manually by entering the information yourself. The table you reference does not
have to exist in the WebSphere DataStage Repository, but you will be informed if it doesn't. Referencing and referenced table do have to be in the same category.
v Tables which reference this table. This gives details of where other table definitions in the Repository reference this one using a foreign key. You cannot edit the contents of this grid.
Parallel page
This page is used when table definitions are used in parallel jobs and gives detailed format information for the defined metadata. The information given here is the same as on the Format tab in one of the following parallel job stages:
v Sequential File Stage
v File Set Stage
v External Source Stage
v External Target Stage
v Column Import Stage
v Column Export Stage
The Defaults button gives access to a shortcut menu offering the choice of:
v Save current as default. Saves the settings you have made in this dialog box as the default ones for your table definition.
v Reset defaults from factory settings. Resets to the defaults that WebSphere DataStage came with.
v Set current from default. Set the current settings to the default (this could be the factory default, or your own default if you have set one up).
Click the Show schema button to open a window showing how the current table definition is generated into an OSH schema. This shows how WebSphere DataStage will interpret the column definitions and format properties of the table definition in the context of a parallel job stage.
Layout page
The Layout page displays the schema format of the column definitions in a table. Select a button to view the data representation in one of three formats:
v Parallel. Displays the OSH record schema. You can right-click to save the layout as a text file in *.osh format.
v COBOL. Displays the COBOL representation, including the COBOL picture clause, starting and ending offsets, and column storage length. You can right-click to save the file view layout as an HTML file.
v Standard. Displays the SQL representation, including SQL type, length, and scale.
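As a rough sketch of the Parallel (OSH) representation, a record schema looks something like the following; the field names and types here are hypothetical, and the schema that the Designer actually generates depends on your column definitions and format properties:

    record
    (
      CODE: int32;
      PRODUCT: string[max=30];
      QTY: int32;
    )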
Locator page
Use the Locator page to view and edit the data resource locator associated with the table definition.
The data resource locator is a property of the table definition that describes the real world object that the table definition was imported from. Note the following points: v Table definitions are only visible in the SQL Builder if they have a locator defined. v When capturing process metadata, you define a table containing the locator information in the source or target database. This table provides some of the information displayed in the Locator page. v Locators are completed when table definitions are imported using metadata import, and locators are changed when table definitions are copied, renamed, or moved. The fields can be edited in the Locator page. v Locators are used by the Shared Table Creation wizard when comparing table definitions in the DataStage repository with tables in the shared repository. The labels and contents of the fields in this page vary according to the type of data source/target the locator originates from. If the import data connection details were saved in a data connection object when the table definition was created, then the data connection object is identified by the Data Connection field. If the table definition is related to a shared table, the name of the shared table is given in the Created from Data Collection field. If the table definition is related to a shared table with a Name Alias, then the alias is listed in the Name alias field.
v UniVerse files
v UniVerse tables
v Web services WSDL definitions
v XML table definitions
WebSphere DataStage connects to the specified data source and extracts the required table definition metadata. You can use the Data Browser to view the actual data in data sources from which you are importing table definitions.
To import table definitions in this way:
1. Choose Import > Table Definitions > Data Source Type from the main menu. For most data source types, a dialog box appears enabling you to connect to the data source (for some sources, a wizard appears and guides you through the process).
2. Fill in the required connection details and click OK. Once a connection to the data source has been made successfully, the updated dialog box gives details of the table definitions available for import.
3. Select the required table definitions and click OK. The table definition metadata is imported into the repository.
Specific information about importing from particular types of data source is in WebSphere DataStage Developer's Help.
You can view a row containing a specific data item using the Find... button. The Find dialog box repositions the view to the row containing the data you are interested in. The search is started from the current row.
The Display... button opens the Column Display dialog box. It allows you to simplify the data displayed by the Data Browser by choosing to hide some of the columns. It also allows you to normalize multivalued data to provide a 1NF view in the Data Browser. This dialog box lists all the columns in the display, and initially these are all selected. To hide a column, clear it. The Normalize on drop-down list box allows you to select an association or an unassociated multivalued column on which to normalize the data. The default is Un-Normalized, and choosing Un-Normalized will display the data in NF2 form with each row shown on a single line. Alternatively you can select Un-Normalized (formatted), which displays multivalued rows split over several lines.
Shared metadata
You can share metadata between the local project repository and the suite-wide shared repository.
When you are working in a project repository, the metadata that is displayed in the repository tree is local to that project. The metadata cannot be used by another project or another suite component unless you make the metadata available as a table in the shared repository.
You can share metadata across the suite in a number of ways:
v You can import metadata by using connectors and store it in the shared repository where it can be shared by projects and suite components. This metadata can be made available as table definitions within the projects' local repositories. You can use the advanced find facilities to find all the table definitions in a project that are derived from the same source.
v You can create table definitions in projects from metadata held in the shared repository.
v You can make table definitions in your project tree available as metadata in the shared repository.
You can manage shared metadata using a tool in the Designer client. The shared metadata is stored in a hierarchy of objects that reflect the data sources from which the metadata was derived. The hierarchy has one of the following structures:
5. On the Data source location page, select the host name and database to identify where you want to store the metadata in the shared repository. If the lists are not populated, click New location to start the Shared Metadata Management tool so that you can create host and database objects in the repository that correspond to the data source that you are importing metadata from. 6. Click Next. 7. Confirm the import details and click Import. 8. Browse the repository tree and select the location in the project repository for the table definition that you are creating, and then click OK.
Table definitions can be linked for any of the following reasons: v Because the table definition was used to create a table in the shared repository. v Because the table definition was imported using a connector and a table in the shared repository was created automatically at the same time. v Because the table definition was created from a table in the shared repository. Table definitions that are local to the project are identified by the following icon:
To create a table definition in your project from a table in the shared repository:
1. In the Designer client, select Repository > Metadata Sharing > Create Table Definition from Table from the main menu.
2. Browse the tree in the Create Table Definition from Table window and select the tables from which you want to build table definitions in your project. You can select individual tables or select a schema, database, or host that is higher in the tree to select all the contained tables.
3. In the Folder in which to create Table Definitions field, specify the folder in your project repository where you want to store the table definitions. 4. Click Create.
Synchronizing metadata
You can check that the table definition in your project repository is synchronized with the table in the shared repository. A table definition is in the synchronized state when its modification time and date match the modification time and date of the table in the shared repository to which it is linked. The synchronization state is checked whenever the project repository view is refreshed. You can also check the synchronization state manually to ensure that no changes have occurred since the last repository refresh. A table definition that is no longer synchronized is identified by the following icon:
You can check the synchronization state manually to ensure that no changes have occurred since the last repository refresh:
1. Select one or more table definitions in the project repository tree.
2. Select Repository > Metadata Sharing > Update table definition from shared table from the main menu.
3. If any of the table definitions are not synchronized with the tables, you can do one of the following actions for that table definition. You can perform these actions on multiple tables if required:
v Click Update or Update All to update the table definition or table definitions from the table or tables.
v Click Remove or Remove All to remove the link between the table definition or table definitions and the table or tables.
4. If the table definitions are synchronized with the tables, you can either close the window or click Remove to remove the link between the table definition and the table.
The new host system object is shown in the tree in the Shared Metadata Management tool. The details that you enter are shown in the right pane of the Shared Metadata Management tool whenever this host system object is selected in the tree.
To add a new host system to the tree:
1. Select Repository > Metadata Sharing > Management from the main menu to open the Shared Metadata Management tool.
2. Click the repository icon at the top of the tree.
3. Select Add > Add new host system.
4. In the Add new host system window, specify information about your host system. The Name and Network Node fields are mandatory; the other fields are optional.
5. Click OK.
4. Where the Data source type specifies a relational database, type the name of the database owner in Owner. 5. If you are entering a mainframe table definition, choose the platform type from the Mainframe platform type drop-down list, and the access type from the Mainframe access type drop-down list. Otherwise leave both of these items set to <Not applicable>. 6. Select the Metadata supports Multi-valued fields check box if the metadata supports multivalued data. 7. If required, specify what character an ODBC data source uses as a quote character in ODBC quote character. 8. Enter a brief description of the data in the Short description field. This is an optional field. 9. Enter a more detailed description of the data in the Long description field. This is an optional field. 10. Click the Columns tab. The Columns page appears at the front of the Table Definition dialog box. You can now enter or load column definitions for your data.
v Nullable. Select Yes or No from the drop-down list. This is set to indicate whether the column is subject to a NOT NULL constraint. It does not itself enforce a NOT NULL constraint. v Date format. Choose the date format that the column uses from the drop-down list of available formats. v Description. Type in a description of the column.
DISPLAY-1 - double-byte zone decimal, used with Graphic_G or Graphic_N
v Sign indicator. Choose Signed or blank from the drop-down list to specify whether the column can be signed or not. The default is blank.
v Sign option. If the column is signed, choose the location of the sign in the data from the drop-down list. Choose from the following:
LEADING - the sign is the first byte of storage
TRAILING - the sign is the last byte of storage
LEADING SEPARATE - the sign is in a separate byte that has been added to the beginning of storage
TRAILING SEPARATE - the sign is in a separate byte that has been added to the end of storage
Selecting either LEADING SEPARATE or TRAILING SEPARATE will increase the storage length of the column by one byte.
v Sync indicator. Choose SYNC or blank from the drop-down list to indicate whether this is a COBOL-synchronized clause or not.
v Redefined field. Optionally specify a COBOL REDEFINES clause. This allows you to describe data in the same storage area using a different data description. The redefining column must be the same length, or smaller, than the column it redefines. Both columns must have the same level, and a column can only redefine the immediately preceding column with that level.
v Depending on. Optionally choose a COBOL OCCURS-DEPENDING ON clause from the drop-down list.
v Storage length. Gives the storage length in bytes of the column as defined. The field cannot be edited.
v Picture. Gives the COBOL PICTURE clause, which is derived from the column definition. The field cannot be edited.
The Server tab is still accessible, but the Server page only contains the Data Element and Display fields.
The following table shows the relationships between native COBOL types and SQL types:
Table 1. Relationships between native COBOL types and SQL types
v BINARY. Native length 2, 4, or 8 bytes. COBOL usage: PIC S9 to S9(4) COMP, PIC S9(5) to S9(9) COMP, or PIC S9(10) to S9(18) COMP. SQL type: SmallInt, Integer, or Decimal respectively. Storage length 2, 4, or 8 bytes.
v CHARACTER. Native length n bytes. COBOL usage: PIC X(n). SQL type: Char. Precision n. Storage length n bytes.
v DECIMAL. Native length (p+s)/2+1 bytes. COBOL usage: PIC S9(p)V9(s) COMP-3. SQL type: Decimal. Precision p+s. Scale s. Storage length (p+s)/2+1 bytes.
v DISPLAY_NUMERIC. Native length p+s bytes. COBOL usage: PIC S9(p)V9(s). SQL type: Decimal. Precision p+s. Scale s.
v FLOAT (single precision). COBOL usage: PIC COMP-1. SQL type: Decimal.
v FLOAT (double precision). COBOL usage: PIC COMP-2. SQL type: Decimal.
v GRAPHIC_G. COBOL usage: PIC G(n) DISPLAY-1. SQL type: NChar.
v GRAPHIC_N. COBOL usage: PIC N(n). SQL type: NChar.
v GROUP. Native length n (the sum of all the column lengths that make up the group). SQL type: Char. Precision n. Scale n/a. Storage length n bytes.
v NATIVE BINARY. Native length 2, 4, or 8 bytes. COBOL usage: PIC S9 to S9(4) COMP-5, PIC S9(5) to S9(9) COMP-5, or PIC S9(10) to S9(18) COMP-5. SQL type: SmallInt, Integer, or Decimal respectively. Storage length 2, 4, or 8 bytes.
v VARCHAR. Native length n+2 bytes. COBOL usage: PIC S9(4) COMP followed by PIC X(n). SQL type: VarChar. Precision n+2. Scale n/a. Storage length n+2 bytes.
v VARGRAPHIC_G. Native length (n*2)+2 bytes. COBOL usage: PIC S9(4) COMP followed by PIC G(n) DISPLAY-1. SQL type: NVarChar. Precision n+2. Scale n/a. Storage length (n*2)+2 bytes.
v VARGRAPHIC_N. Native length (n*2)+2 bytes. COBOL usage: PIC S9(4) COMP followed by PIC N(n). SQL type: NVarChar. Precision n+2. Scale n/a. Storage length (n*2)+2 bytes.
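For example, reading the VARCHAR row of this table, a variable-length character column with a maximum length of 20 (a purely illustrative column) would correspond to the COBOL declaration PIC S9(4) COMP followed by PIC X(20), an SQL type of VarChar, and a storage length of 22 bytes (n+2).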
v Prefix bytes. Specifies that this column is prefixed by 1, 2, or 4 bytes containing, as a binary value, either the column's length or the tag value for a tagged column. You can use this option with variable-length fields. Variable-length fields can be either delimited by a character or preceded by a 1-, 2-, or 4-byte prefix containing the field length. WebSphere DataStage inserts the prefix before each field. This property is mutually exclusive with the Delimiter, Quote, and Final Delimiter properties, which are used by default.
v Print field. This property is intended for use when debugging jobs. Set it to have WebSphere DataStage produce a message for each of the columns it reads. The message has the format:
Importing N: D
where N is the column name and D is the imported data of the column. Non-printable characters contained in D are prefixed with an escape character and written as C string literals; if the column contains binary data, it is output in octal format.
v Quote. Specifies that variable length columns are enclosed in single quotes, double quotes, or another ASCII character or pair of ASCII characters. Choose Single or Double, or enter a character.
v Start position. Specifies the starting position of a column in the record. The starting position can be either an absolute byte offset from the first record position (0) or the starting position of another column.
v Tag case value. Explicitly specifies the tag value corresponding to a subfield in a tagged subrecord. By default the fields are numbered 0 to N-1, where N is the number of fields. (A tagged subrecord is a column whose type can vary. The subfields of the tagged subrecord are the possible types. The tag case value of the tagged subrecord selects which of those types is used to interpret the column's value for the record.)
String Type
This has the following properties:
v Character Set. Choose from ASCII or EBCDIC (not available for ustring type (Unicode)).
v Default. The default value for a column. This is used for data written by a Generate stage. It also supplies the value to substitute for a column that causes an error (whether written or read).
v Export EBCDIC as ASCII. Select this to specify that EBCDIC characters are written as ASCII characters (not available for ustring type (Unicode)).
v Is link field. Selected to indicate that a column holds the length of another, variable-length column of the record or of the tag value of a tagged record field.
v Import ASCII as EBCDIC. Select this to specify that ASCII characters are read as EBCDIC characters (not available for ustring type (Unicode)).
v Field max width. The maximum number of bytes in a column represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width character set, you can calculate the length exactly. If you are using a variable-length character set, calculate an adequate maximum width for your fields. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrec, or tagged if they contain at least one field of this type.
v Field width. The number of bytes in a column represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width charset, you can calculate the number of bytes exactly. If it's a variable-length encoding, base your calculation on the width and frequency of your variable-width characters. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrec, or tagged if they contain at least one field of this type.
v Pad char. Specifies the pad character used when strings or numeric values are written to an external string representation. Enter a character (single-byte for strings, can be multi-byte for ustrings) or choose null or space. The pad character is used when the external string representation is larger than required to hold the written field. In this case, the external string is filled with the pad character to its full length. Space is the default. Applies to string, ustring, and numeric data types and record, subrec, or tagged types if they contain at least one field of this type.
Date Type
This has the following properties:
v Byte order. Specifies how multiple byte data types are ordered. Choose from:
little-endian. The high byte is on the right.
big-endian. The high byte is on the left.
native-endian. As defined by the native format of the machine.
v Character Set. Choose from ASCII or EBCDIC.
v Days since. Dates are written as a signed integer containing the number of days since the specified date. Enter a date in the form %yyyy-%mm-%dd or in the default date format if you have defined a new one on an NLS system.
v Data Format. Specifies the data representation format of a column. Choose from:
binary
text
For dates, binary is equivalent to specifying the julian property for the date field; text specifies that the data to be written contains a text-based date in the form %yyyy-%mm-%dd or in the default date format if you have defined a new one on an NLS system.
v Default. The default value for a column. This is used for data written by a Generate stage. It also supplies the value to substitute for a column that causes an error (whether written or read).
v Format string. The string format of a date. By default this is %yyyy-%mm-%dd. The format string can contain one or a combination of the following elements:
%dd: A two-digit day.
%mm: A two-digit month.
%year_cutoffyy: A two-digit year derived from yy and the specified four-digit year cutoff, for example %1970yy.
%yy: A two-digit year derived from a year cutoff of 1900.
%yyyy: A four-digit year.
%ddd: Day of year in three-digit form (range of 1 - 366).
%mmm: Three-character month abbreviation.
The format string is subject to the following restrictions:
It cannot have more than one element of the same type, for example it cannot contain two %dd elements.
It cannot have both %dd and %ddd.
It cannot have both %yy and %yyyy.
It cannot have both %mm and %ddd.
It cannot have both %mmm and %ddd.
It cannot have both %mm and %mmm.
If it has %dd, it must have %mm or %mmm.
It must have exactly one of %yy or %yyyy.
When you specify a date format string, prefix each component with the percent symbol (%). Separate the string's components with any character except the percent sign (%).
If this format string does not include a day, it is set to the first of the month in the destination field. If the format string does not include the month and day, they default to January 1. Note that the format string must contain a month if it also contains a day; that is, you cannot omit only the month.
The year_cutoff is the year defining the beginning of the century in which all two-digit years fall. By default, the year cutoff is 1900; therefore, a two-digit year of 97 represents 1997. You can specify any four-digit year as the year cutoff. All two-digit years then specify the next possible year ending in the specified two digits that is the same or greater than the cutoff. For example, if you set the year cutoff to 1930, the two-digit year 30 corresponds to 1930, and the two-digit year 29 corresponds to 2029.
v Is Julian. Select this to specify that dates are written as a numeric value containing the Julian day. A Julian day specifies the date as the number of days from 4713 BCE January 1, 12:00 hours (noon) GMT.
Time Type
This has the following properties:
v Byte order. Specifies how multiple byte data types are ordered. Choose from:
little-endian. The high byte is on the right.
big-endian. The high byte is on the left.
native-endian. As defined by the native format of the machine.
v Character Set. Choose from ASCII or EBCDIC.
v Default. The default value for a column. This is used for data written by a Generate stage. It also supplies the value to substitute for a column that causes an error (whether written or read).
v Data Format. Specifies the data representation format of a column. Choose from:
binary
text
For time, binary is equivalent to midnight_seconds; text specifies that the field represents time in the text-based form %hh:%nn:%ss or in the default date format if you have defined a new one on an NLS system.
v Format string. Specifies the format of columns representing time as a string. By default this is %hh:%nn:%ss. The possible components of the time format string are:
%hh: A two-digit hours component.
%nn: A two-digit minute component (nn represents minutes because mm is used for the month of a date).
%ss: A two-digit seconds component.
%ss.n: A two-digit seconds plus fractional part, where n is the number of fractional digits with a maximum value of 6. If n is 0, no decimal point is printed as part of the seconds component. Trailing zeros are not suppressed.
You must prefix each component of the format string with the percent symbol. Separate the string's components with any character except the percent sign (%).
v Is midnight seconds. Select this to specify that times are written as a binary 32-bit integer containing the number of seconds elapsed from the previous midnight.
Timestamp Type
This has the following properties:
v Byte order. Specifies how multiple byte data types are ordered. Choose from:
little-endian. The high byte is on the right.
big-endian. The high byte is on the left.
native-endian. As defined by the native format of the machine.
v Character Set. Choose from ASCII or EBCDIC.
v Data Format. Specifies the data representation format of a column. Choose from:
binary
text
For timestamp, binary specifies that the first integer contains a Julian day count for the date portion of the timestamp and the second integer specifies the time portion of the timestamp as the number of seconds from midnight. A binary timestamp specifies that two 32-bit integers are written. Text specifies a text-based timestamp in the form %yyyy-%mm-%dd %hh:%nn:%ss or in the default date format if you have defined a new one on an NLS system.
v Default. The default value for a column. This is used for data written by a Generate stage. It also supplies the value to substitute for a column that causes an error (whether written or read).
v Format string. Specifies the format of a column representing a timestamp as a string. Defaults to %yyyy-%mm-%dd %hh:%nn:%ss. Specify the format as follows:
For the date:
%dd: A two-digit day.
%mm: A two-digit month.
%year_cutoffyy: A two-digit year derived from yy and the specified four-digit year cutoff.
%yy: A two-digit year derived from a year cutoff of 1900.
%yyyy: A four-digit year.
%ddd: Day of year in three-digit form (range of 1 - 366).
For the time:
%hh: A two-digit hours component.
%nn: A two-digit minute component (nn represents minutes because mm is used for the month of a date).
%ss: A two-digit seconds component.
%ss.n: A two-digit seconds plus fractional part, where n is the number of fractional digits with a maximum value of 6. If n is 0, no decimal point is printed as part of the seconds component. Trailing zeros are not suppressed.
You must prefix each component of the format string with the percent symbol (%). Separate the string's components with any character except the percent sign (%).
Integer Type
This has the following properties:
v Byte order. Specifies how multiple byte data types are ordered. Choose from:
little-endian. The high byte is on the right.
big-endian. The high byte is on the left.
native-endian. As defined by the native format of the machine.
v Character Set. Choose from ASCII or EBCDIC.
v C_format. Perform non-default conversion of data from a string to integer data. This property specifies a C-language format string used for reading/writing integer strings. This is passed to sscanf() or sprintf().
v Default. The default value for a column. This is used for data written by a Generate stage. It also supplies the value to substitute for a column that causes an error (whether written or read).
v Data Format. Specifies the data representation format of a column. Choose from:
binary
text
v Field max width. The maximum number of bytes in a column represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width character set, you can calculate the length exactly. If you are using a variable-length character set, calculate an adequate maximum width for your fields. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrec, or tagged if they contain at least one field of this type.
v Field width. The number of bytes in a column represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width charset, you can calculate the number of bytes exactly. If it's a variable-length encoding, base your calculation on the width and frequency of your variable-width characters. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrec, or tagged if they contain at least one field of this type.
v In_format. Format string used for conversion of data from string to integer. This is passed to sscanf(). By default, WebSphere DataStage invokes the C sscanf() function to convert a numeric field formatted as a string to either integer or floating point data. If this function does not output data in a satisfactory format, you can specify the in_format property to pass formatting arguments to sscanf().
v Is link field. Selected to indicate that a column holds the length of another, variable-length column of the record or of the tag value of a tagged record field.
v Out_format. Format string used for conversion of data from integer to a string. This is passed to sprintf(). By default, WebSphere DataStage invokes the C sprintf() function to convert a numeric field formatted as integer data to a string. If this function does not output data in a satisfactory format, you can specify the out_format property to pass formatting arguments to sprintf().
v Pad char. Specifies the pad character used when the integer is written to an external string representation. Enter a character (single-byte for strings, can be multi-byte for ustrings) or choose null or space. The pad character is used when
the external string representation is larger than required to hold the written field. In this case, the external string is filled with the pad character to its full length. Space is the default.
Decimal Type
This has the following properties:
v Allow all zeros. Specifies whether to treat a packed decimal column containing all zeros (which is normally illegal) as a valid representation of zero. Select Yes or No.
v Character Set. Choose from ASCII or EBCDIC.
v Decimal separator. Specify the character that acts as the decimal separator (period by default).
v Default. The default value for a column. This is used for data written by a Generate stage. It also supplies the value to substitute for a column that causes an error (whether written or read).
v Data Format. Specifies the data representation format of a column. Choose from:
binary
text
For decimals, binary means packed. Text represents a decimal in a string format with a leading space or - followed by decimal digits with an embedded decimal point if the scale is not zero. The destination string format is:
[+ | -]ddd.[ddd]
and any precision and scale arguments are ignored.
v Field max width. The maximum number of bytes in a column represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width character set, you can calculate the length exactly. If you are using a variable-length character set, calculate an adequate maximum width for your fields. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrec, or tagged if they contain at least one field of this type.
v Field width. The number of bytes in a column represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width charset, you can calculate the number of bytes exactly. If it's a variable-length encoding, base your calculation on the width and frequency of your variable-width characters. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrec, or tagged if they contain at least one field of this type.
v Packed. Select an option to specify what the decimal columns contain. Choose from:
Yes to specify that the decimal columns contain data in packed decimal format (the default). This has the following sub-properties:
Check. Select Yes to verify that data is packed, or No to not verify.
Signed. Select Yes to use the existing sign when writing decimal columns. Select No to write a positive sign (0xf) regardless of the column's actual sign value.
No (separate) to specify that they contain unpacked decimal with a separate sign byte. This has the following sub-property:
Sign Position. Choose leading or trailing as appropriate.
No (zoned) to specify that they contain an unpacked decimal in either ASCII or EBCDIC text. This has the following sub-property:
Sign Position. Choose leading or trailing as appropriate.
No (overpunch) to specify that the field has a leading or end byte that contains a character which specifies both the numeric value of that byte and whether the number as a whole is negatively or positively signed. This has the following sub-property:
Sign Position. Choose leading or trailing as appropriate.
v Precision. Specifies the precision where a decimal column is represented in text format. Enter a number. When a decimal is written to a string representation, WebSphere DataStage uses the precision and scale defined for the source decimal field to determine the length of the destination string. The precision and scale properties override this default. When they are defined, WebSphere DataStage truncates or pads the source decimal to fit the size of the destination string. If you have also specified the field width property, WebSphere DataStage truncates or pads the source decimal to fit the size specified by field width.
v Rounding. Specifies how to round the source field to fit into the destination decimal when reading a source field to a decimal. Choose from:
up (ceiling). Truncate source column towards positive infinity. This mode corresponds to the IEEE 754 Round Up mode. For example, 1.4 becomes 2, -1.6 becomes -1.
down (floor). Truncate source column towards negative infinity. This mode corresponds to the IEEE 754 Round Down mode. For example, 1.6 becomes 1, -1.4 becomes -2.
nearest value. Round the source column towards the nearest representable value. This mode corresponds to the COBOL ROUNDED mode. For example, 1.4 becomes 1, 1.5 becomes 2, -1.4 becomes -1, -1.5 becomes -2.
truncate towards zero. This is the default. Discard fractional digits to the right of the right-most fractional digit supported by the destination, regardless of sign. For example, if the destination is an integer, all fractional digits are truncated. If the destination is another decimal with a smaller scale, truncate to the scale size of the destination decimal. This mode corresponds to the COBOL INTEGER-PART function. Using this method 1.6 becomes 1, -1.6 becomes -1.
v Scale. Specifies how to round a source decimal when its precision and scale are greater than those of the destination. By default, when WebSphere DataStage writes a source decimal to a string representation, it uses the precision and scale defined for the source decimal field to determine the length of the destination string. You can override the default by means of the precision and scale properties. When you do, WebSphere DataStage truncates or pads the source decimal to fit the size of the destination string. If you have also specified the field width property, WebSphere DataStage truncates or pads the source decimal to fit the size specified by field width.
Float Type
This has the following properties:
v C_format. Perform non-default conversion of data from a string to floating-point data. This property specifies a C-language format string used for reading floating point strings. This is passed to sscanf().
v Character Set. Choose from ASCII or EBCDIC.
v Default. The default value for a column. This is used for data written by a Generate stage. It also supplies the value to substitute for a column that causes an error (whether written or read).
v Data Format. Specifies the data representation format of a column. Choose from:
binary
text
v Field max width. The maximum number of bytes in a column represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width character set, you can calculate the length exactly. If you are using a variable-length character set, calculate an adequate maximum width for your fields. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrec, or tagged if they contain at least one field of this type.
v Field width. The number of bytes in a column represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width charset, you can calculate the number of bytes exactly. If it's a variable-length encoding, base your calculation on the width and frequency of your variable-width characters. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrec, or tagged if they contain at least one field of this type.
v In_format. Format string used for conversion of data from string to floating point. This is passed to sscanf(). By default, WebSphere DataStage invokes the C sscanf() function to convert a numeric field formatted as a string to floating point data. If this function does not output data in a satisfactory format, you can specify the in_format property to pass formatting arguments to sscanf().
v Is link field. Selected to indicate that a column holds the length of another, variable-length column of the record or of the tag value of a tagged record field.
v Out_format. Format string used for conversion of data from floating point to a string. This is passed to sprintf(). By default, WebSphere DataStage invokes the C sprintf() function to convert a numeric field formatted as floating point data to a string. If this function does not output data in a satisfactory format, you can specify the out_format property to pass formatting arguments to sprintf().
v Pad char. Specifies the pad character used when the floating point number is written to an external string representation. Enter a character (single-byte for strings, can be multi-byte for ustrings) or choose null or space. The pad character is used when the external string representation is larger than required to hold the written field. In this case, the external string is filled with the pad character to its full length. Space is the default.
Nullable This appears for nullable fields. v Actual field length. Specifies the number of bytes to fill with the Fill character when a field is identified as null. When WebSphere DataStage identifies a null field, it will write a field of this length full of Fill characters. This is mutually exclusive with Null field value. v Null field length. The length in bytes of a variable-length field that contains a null. When a variable-length field is read, a length of null field length in the source field indicates that it contains a null. When a variable-length field is written, WebSphere DataStage writes a length value of null field length if the field contains a null. This property is mutually exclusive with null field value. v Null field value. Specifies the value given to a null field if the source is set to null. Can be a number, string, or C-type literal escape character. For example, you can represent a byte value by \ooo, where each o is an octal digit 0 - 7 and the first o is < 4, or by \xhh, where each h is a hexadecimal digit 0 - F. You must use this form to encode non-printable byte values.
This property is mutually exclusive with Null field length and Actual length. For a fixed width data representation, you can use Pad char (from the general section of Type defaults) to specify a repeated trailing character if the value you specify is shorter than the fixed width of the field. On reading, specifies the value given to a field containing a null. On writing, specifies the value given to a field if the source is set to null. Can be a number, string, or C-type literal escape character.
Generator
If the column is being used in a Row Generator or Column Generator stage, this allows you to specify extra details about the mock data being generated. The exact fields that appear depend on the data type of the column being generated. They allow you to specify features of the data being generated, for example, for integers they allow you to specify if values are random or whether they cycle. If they cycle you can specify an initial value, an increment, and a limit. If they are random, you can specify a seed value for the random number generator, whether to include negative numbers, and a limit.
Vectors
If the row you are editing represents a column which is a variable length vector, tick the Variable check box. The Vector properties appear; these give the size of the vector in one of two ways:
v Link Field Reference. The name of a column containing the number of elements in the variable length vector. This should have an integer or float type, and have its Is Link field property set.
v Vector prefix. Specifies a 1-, 2-, or 4-byte prefix containing the number of elements in the vector.
If the row you are editing represents a column which is a vector of known length, enter the number of elements in the Vector Occurs box.
Subrecords
If the row you are editing represents a column which is part of a subrecord, the Level Number column indicates the level of the column within the subrecord structure. If you specify Level numbers for columns, the column immediately preceding will be identified as a subrecord. Subrecords can be nested, so can contain further subrecords with higher level numbers (that is, level 06 is nested within level 05). Subrecord fields have a Tagged check box to indicate that this is a tagged subrecord.
Extended
For certain data types the Extended check box appears to allow you to modify the data type as follows:
v Char, VarChar, LongVarChar. Select to specify that the underlying data type is a ustring.
v Time. Select to indicate that the time field includes microseconds.
v Timestamp. Select to indicate that the timestamp field includes microseconds.
v TinyInt, SmallInt, Integer, BigInt types. Select to indicate that the underlying data type is the equivalent uint field.
Use the buttons at the bottom of the Edit Column Metadata dialog box to continue adding or editing columns, or to save and close. The buttons are:
v Previous and Next. View the metadata in the previous or next row. These buttons are enabled only where there is a previous or next row enabled. If there are outstanding changes to the current row, you are asked whether you want to save them before moving on.
v Close. Close the Edit Column Metadata dialog box. If there are outstanding changes to the current row, you are asked whether you want to save them before closing.
v Apply. Save changes to the current row.
v Reset. Remove all changes made to the row since the last time you applied changes.
Click OK to save the column definitions and close the Edit Column Metadata dialog box.
Remember, you can also edit a column's definition grid using the general grid editing controls.
results in a smaller set of columns, saving space and processing time and making the column set easier to understand. If you are importing column definitions that have been derived from a CFD file into server or parallel job stages, you are warned if any of the selected columns redefine other selected columns. You can choose to carry on with the load or go back and select columns again. 3. Save the table definition by clicking OK. You can edit the table definition to remove unwanted column definitions, assign data elements, or change branch names.
Propagating values
You can propagate the values for the properties set in a column to several other columns. Select the column whose values you want to propagate, then hold down shift and select the columns you want to propagate to. Choose Propagate values... from the shortcut menu to open the dialog box. In the Property column, click the check box for the property or properties whose values you want to propagate. The Usage field tells you if a particular property is applicable to certain types of job only (for example server, mainframe, or parallel) or certain types of table definition (for example COBOL). The Value field shows the value that will be propagated for a particular property.
1. Choose Import > Table Definitions > Stored Procedure Definitions... from the main menu. A dialog box appears enabling you to connect to the data source containing the stored procedures.
2. Fill in the required connection details and click OK. Once a connection to the data source has been made successfully, the updated dialog box gives details of the stored procedures available for import.
3. Select the required stored procedures and click OK. The stored procedures are imported into the WebSphere DataStage Repository.
Specific information about importing stored procedures is in WebSphere DataStage Developer's Help.
Data element. The type of data in the column. Description. A text description of the column. v Format. Contains file format parameters for sequential files. This page is not used for a stored procedure definition. v NLS. Contains the name of the character set map to use with the table definitions. v Error codes. The Error Codes page allows you to specify which raiserror calls within the stored procedure produce a fatal error and which produce a warning. This page has the following fields: Fatal errors. Enter the raiserror values that you want to be regarded as a fatal error. The values should be separated by a space. Warnings. Enter the raiserror values that you want to be regarded as a warning. The values should be separated by a space.
Note: You do not need a result set if the stored procedure is used for input (writing to a database). However, in this case, you must have input parameters.
To view a stored procedure definition, select it in the repository tree and do one of the following: v Choose Properties... from the shortcut menu. v Double-click the stored procedure definition in the display area. The Table Definition dialog box appears. You can edit or delete any of the column or parameter definitions.
v Parameter name. The name of the parameter.
v Prompt. Text used as the field name in the run-time dialog box.
v Type. The type of the parameter (to enable validation).
v Default Value. The default setting for the parameter.
v Help text. The text that appears if a user clicks Property Help in the Job Run Options dialog box when running the job.
Specify the type of the parameter by choosing one of the following from the drop-down list in the Type column:
v String. The default type.
v Encrypted. Used to specify a password. The default value is set by double-clicking the Default Value cell to open the Setup Password dialog box. Type the password in the Encrypted String field and retype it in the Confirm Encrypted String field. It is displayed as asterisks.
v Integer. Long int (-2147483648 to +2147483647).
v Float. Double (-1.79769313486232E308 to -4.94065645841247E-324 and 4.94065645841247E-324 to 1.79769313486232E308).
v Pathname. Enter a default pathname or file name by typing it into Default Value or double-click the Default Value cell to open the Browse dialog box.
v List. A list of valid string variables. To set up a list, double-click the Default Value cell to open the Setup List and Default dialog box. Build a list by typing each item into the Value field, then clicking Add. The item then appears in the List box. To remove an item, select it in the List box and click Remove. Select one of the items from the Set Default drop-down list box to be the default.
v Date. Date in the ISO format yyyy-mm-dd.
v Time. Time in the format hh:mm:ss.
WebSphere DataStage uses the parameter type to validate any values that are subsequently supplied for that parameter, be it in the Director or the Designer.
You can supply default values for parameters, which are used unless another value is specified when the job is run. For most parameter types, you simply type an appropriate default value into the Default Value cell. When specifying a password or a list variable, double-click the Default Value cell to open further dialog boxes which allow you to supply defaults.
Environment variables
You can define an environment variable as a job parameter. When you run the job, you specify a value for the environment variable. To set a runtime value for an environment variable:
1. Click Add Environment Variable... at the bottom of the Parameters page. The Choose environment variable list appears. This shows a list of the available environment variables.
2. Click on the environment variable you want to override at runtime. It appears in the parameter grid, distinguished from job parameters by being preceded by a $. You can also click New... at the top of the list to define a new environment variable. A dialog box appears allowing you to specify name and prompt. The new variable is added to the Choose environment variable list and you can click on it to add it to the parameters grid.
3. Set the required value in the Default Value column. This is the only field you can edit for an environment variable. Depending on the type of variable, a further dialog box might appear to help you enter a value.
When you run the job and specify a value for the environment variable, you can specify one of the following special values:
v $ENV. Instructs WebSphere DataStage to use the current setting for the environment variable.
v $PROJDEF. The current setting for the environment variable is retrieved and set in the job's environment (so that value is used wherever in the job the environment variable is used). If the value of that environment variable is subsequently changed in the Administrator client, the job will pick up the new value without the need for recompiling.
v $UNSET. Instructs WebSphere DataStage to explicitly unset the environment variable.
Environment variables are set up using the Administrator client.
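As an illustrative sketch only, suppose the environment variable $APT_CONFIG_FILE has been added to the Parameters page of a job (the variable chosen, the file path, and the project and job names here are examples, not requirements). You could then supply a value for it when starting the job from the command line, or pass $PROJDEF to pick up the current project-level setting:

dsjob -run -param $APT_CONFIG_FILE=/opt/configs/4node.apt dstage Build_Mart_OU
dsjob -run -param $APT_CONFIG_FILE=$PROJDEF dstage Build_Mart_OU

On UNIX systems you might need to quote the parameter name to stop the shell expanding the $ character.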
v Help text. The text that appears if a user clicks Property Help in the Job Run Options dialog box when running the job.
Specify the type of the parameter by choosing one of the following from the drop-down list in the Type column:
v String. The default type.
v Encrypted. Used to specify a password. The default value is set by double-clicking the Default Value cell to open the Setup Password dialog box. Type the password in the Encrypted String field and retype it in the Confirm Encrypted String field. It is displayed as asterisks.
v Integer. Long int (-2147483648 to +2147483647).
v Float. Double (-1.79769313486232E308 to -4.94065645841247E-324 and 4.94065645841247E-324 to 1.79769313486232E308).
v Pathname. Enter a default pathname or file name by typing it into Default Value or double-click the Default Value cell to open the Browse dialog box.
v List. A list of valid string variables. To set up a list, double-click the Default Value cell to open the Setup List and Default dialog box. Build a list by typing each item into the Value field, then clicking Add. The item then appears in the List box. To remove an item, select it in the List box and click Remove. Select one of the items from the Set Default drop-down list box to be the default.
v Date. Date in the ISO format yyyy-mm-dd.
v Time. Time in the format hh:mm:ss.
WebSphere DataStage uses the parameter type to validate any values that are subsequently supplied for that parameter, be it in the Director or the Designer.
You can supply default values for parameters, which are used unless another value is specified when the job is run. For most parameter types, you simply type an appropriate default value into the Default Value cell. When specifying a password or a list variable, double-click the Default Value cell to open further dialog boxes which allow you to supply defaults.
v In ODBC or UniVerse stages. You can use job parameters in the following fields in the stage dialog box:
Data source name field on the General tab on the Stage page
User name and Password fields on the General tab on the Stage page
Account name or Use directory path fields on the Details tab on the Stage page (UniVerse stage only)
Table name field on the General tab on the Inputs or Outputs page
WHERE clause field on the Selection tab on the Outputs page
Value cell on the Parameters tab, which appears in the Outputs page when you use a stored procedure (ODBC stage only)
Expression field on the Derivation dialog box, opened from the Derivation column in the Outputs page of a UniVerse or ODBC Stage dialog box
v In Hashed File stages. You can use job parameters in the following fields in the Hashed File Stage dialog box:
Use account name or Use directory path fields on the Stage page
File name field on the General tab on the Inputs or Outputs page
v In UniData stages. You can use job parameters in the following fields in the UniData Stage dialog box:
Server, Database, User name, and Password fields on the Stage page
File name field on the General tab on the Inputs or Outputs page
v In Folder stages. You can use job parameters in the following fields in the Folder stage dialog box:
Properties in the Properties tab of the Stage page
Properties in the Properties tab of the Outputs page
v Before and after subroutines. You can use job parameters to specify argument values for before and after subroutines.
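In these fields a job parameter is referenced by enclosing its name in # characters. As a sketch only, assuming hypothetical job parameters named StartDate and RegionCode, a WHERE clause field might contain:

WHERE ORDER_DATE >= '#StartDate#' AND REGION = '#RegionCode#'

At run time the #StartDate# and #RegionCode# tokens are replaced by the values supplied for those parameters.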
Environment variables
You can define an environment variable as a job parameter. When you run the job, you specify a value for the environment variable.
To set a runtime value for an environment variable:
1. Click Add Environment Variable... at the bottom of the Parameters page. The Choose environment variable list appears. This shows a list of the available environment variables.
2. Click on the environment variable you want to override at runtime. It appears in the parameter grid, distinguished from job parameters by being preceded by a $. You can also click New... at the top of the list to define a new environment variable. A dialog box appears allowing you to specify name and prompt. The new variable is added to the Choose environment variable list and you can click on it to add it to the parameters grid.
3. Set the required value in the Default Value column. This is the only field you can edit for an environment variable. Depending on the type of variable, a further dialog box might appear to help you enter a value.
When you run the job and specify a value for the environment variable, you can specify one of the following special values:
v $ENV. Instructs WebSphere DataStage to use the current setting for the environment variable.
v $PROJDEF. The current setting for the environment variable is retrieved and set in the job's environment (so that value is used wherever in the job the environment variable is used). If the value of that environment variable is subsequently changed in the Administrator client, the job will pick up the new value without the need for recompiling.
v $UNSET. Instructs WebSphere DataStage to explicitly unset the environment variable.
Environment variables are set up using the Administrator client.
v List. A list of valid string variables. To set up a list, double-click the Default Value cell to open the Setup List and Default dialog box. Build a list by typing in each item into the Value field, then clicking Add. The item then appears in the List box. To remove an item, select it in the List box and click Remove. Select one of the items from the Set Default drop-down list box to be the default. v Date. Date in the ISO format yyyy-mm-dd. v Time. Time in the format hh:mm:ss. 4. Optionally enter a default value for the parameter in the Default Value field. When someone runs a job using this parameter they can accept the default value for the parameter, pick up a value from a value set file (see Parameter Set Dialog Box - Values Page), or specify a new one. For most parameter types you can type directly into this field. When specifying an encrypted string or a list variable, double-click the Default Value field to open further dialog boxes which allow you to supply defaults. 5. Optionally provide some help text for the parameter in the Help Text field. When someone runs a job that uses this parameter, they can click Help when prompted for a value. The text you enter here will be displayed.
(Assuming the server install directory is C:\IBM\InformationServer\Server.) To specify a value file: 1. Enter the name of the value file in the Value File name field. The name must start with an alphabetic character and comprise alphanumeric and underscore characters. The maximum length is 255 characters. 2. The remaining fields are the parameters you specified on the Parameters page. For each file you specify, enter a value for each of the parameters. You will be offered the default values you specified on the Parameters page and can accept these if you want. 3. Repeat for each value file you want to specify.
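As a sketch only, using the Runschedule example that appears later in this chapter (the exact layout shown here is an assumption; what matters is that each parameter in the set gets a value), a value file named CarlMonday might contain entries such as:

DayOfWeek=Mon
Operator=Carl
TemporaryDirectory=c:\temp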
Any of these individual parameter values can be overridden if required. If you want to use the values in a different value file, you can choose from a drop-down list of available files.
The job would run with the default values specified with the parameter set Runschedule:
If the Default Value for the parameter set is set to CarlMonday then the job would run with the default values defined in the file CarlMonday:
DayOfWeek = Mon
Operator = Carl
Password = ******
Temporary Directory = c:\temp
You can specify a new value file or override individual parameters if required. If you wanted to use the values specified in the BethWed file, but override the setting of the DayOfWeek parameter, then you would start the job on the command line using the following command:
dsjob -run -param Runschedule=BethWed -param Runschedule.DayOfWeek=sat dstage Build_Mart_OU
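Similarly, as a sketch based on the same job, to run with all of the values from the BethWed value file and no individual overrides you could omit the second -param option:

dsjob -run -param Runschedule=BethWed dstage Build_Mart_OU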
2. Select the parameter set in the grid and click Insert Parameter Value. The parameter set value will be filled in for you.
Local containers
The main purpose of using a WebSphere DataStage local container is to simplify a complex design visually to make it easier to understand in the Diagram window. If the job has lots of stages and links, it might be easier to create additional containers to describe a particular sequence of steps. Containers are linked to other stages or containers in the job by input and output stages. You can create a local container from scratch, or place a set of existing stages and links within a container. A local container is only accessible to the job in which it is created.
warned if any link naming conflicts occur when the container is constructed. The new container is opened and focus shifts onto its tab.
You can edit the stages and links in a container in the same way you do for a job. See Using Input and Output Stages for details on how to link the container to other stages in the job.
You can do this regardless of whether you created it from a group in the first place.
To deconstruct a local container, do one of the following:
v Select the container stage in the Job Diagram window and choose Deconstruct from the shortcut menu.
v Select the container stage in the Job Diagram window and choose Edit > Deconstruct Container from the main menu.
WebSphere DataStage prompts you to confirm the action (you can disable this prompt if required). Click OK and the constituent parts of the container appear in the Job Diagram window, with existing stages and links shifted to accommodate them.
If any name conflicts arise during the deconstruction process between stages from the container and existing ones, you are prompted for new names. You can click the Use Generated Names checkbox to have WebSphere DataStage allocate new names automatically from then on. If the container has any unconnected links, these are discarded. Connected links remain connected.
Deconstructing a local container is not recursive. If the container you are deconstructing contains other containers, they move up a level but are not themselves deconstructed.
Shared containers
Shared containers help you to simplify your design but, unlike local containers, they are reusable by other jobs. You can use shared containers to make common job components available throughout the project. You can create a shared container from a stage and associated metadata and add the shared container to the palette to make this pre-configured stage available to other jobs.
You can also insert a server shared container into a parallel job as a way of making server job functionality available. For example, you could use it to give the parallel job access to the functionality of a server transform function. (Note that you can only use server shared containers on SMP systems, not MPP or cluster systems.)
Shared containers comprise groups of stages and links and are stored in the Repository like WebSphere DataStage jobs. When you insert a shared container into a job, WebSphere DataStage places an instance of that container into the design. When you compile the job containing an instance of a shared container, the code for the container is included in the compiled job. You can use the WebSphere DataStage debugger on instances of shared containers used within server jobs.
When you add an instance of a shared container to a job, you will need to map metadata for the links into and out of the container, as these can vary in each job in which you use the shared container. If you change the contents of a shared container, you will need to recompile those jobs that use the container in order for the changes to take effect.
For parallel shared containers, you can take advantage of runtime column propagation to avoid the need to map the metadata. If you enable runtime column propagation, then, when the job runs, metadata will be automatically propagated across the boundary between the shared container and the stage(s) to which it connects in the job.
Note that there is nothing inherently parallel about a parallel shared container although the stages within it have parallel capability. The stages themselves determine how the shared container code will run. Conversely, when you include a server shared container in a parallel job, the server stages have no parallel capability, but the entire container can operate in parallel because the parallel job can execute multiple instances of it. You can create a shared container from scratch, or place a set of existing stages and links within a shared container. Note: If you encounter a problem when running a job which uses a server shared container in a parallel job, you could try increasing the value of the DSIPC_OPEN_TIMEOUT environment variable in the Parallel Operator specific category of the environment variable dialog box in the WebSphere DataStage Administrator.
v Select its icon in the repository tree and select Edit from the shortcut menu.
v Drag its icon from the Designer repository tree to the diagram area.
v Select its icon in the job design and select Open from the shortcut menu.
v Choose File > Open from the main menu and select the shared container from the Open dialog box.
A Diagram window appears, showing the contents of the shared container. You can edit the stages and links in a container in the same way you do for a job. Note: The shared container is edited independently of any job in which it is used. Saving a job, for example, will not save any open shared containers used in that job.
v Help text. The text that appears in the Job Container Stage editor to help the designer add a value for the parameter in a job design (see Using a Shared Container in a Job).
v View parameter set. This button is available when you have added a parameter set to the grid and selected it. Click this button to open a window showing details of the selected parameter set.
v Add parameter set. Click this button to add a parameter set to the container.
Stage page
v Stage Name. The name of the instance of the shared container. You can edit this if required.
v Shared Container Name. The name of the shared container of which this is an instance. You cannot change this.
The General tab enables you to add an optional description of the container instance.
The Properties tab allows you to specify values for container parameters. You need to have defined some parameters in the shared container properties for this tab to appear.
v Name. The name of the expected parameter.
v Value. Enter a value for the parameter. You must enter values for all expected parameters here as the job does not prompt for these at run time. (You can leave string parameters blank; an empty string will be inferred.)
v Insert Parameter. You can use a parameter from a parent job (or container) to supply a value for a container parameter. Click Insert Parameter to be offered a list of available parameters from which to choose.
The Advanced tab appears when you are using a server shared container within a parallel job. It has the same fields and functionality as the Advanced tab on all parallel stage editors.
Inputs page
When inserted in a job, a shared container instance already has metadata defined for its various links. This metadata must exactly match, in all properties, the metadata on the link that the job uses to connect to the container. The Inputs page enables you to map metadata as required.
The only exception to this is where you are using runtime column propagation (RCP) with a parallel shared container. If RCP is enabled for the job, and specifically for the stage whose output connects to the shared container input, then metadata will be propagated at run time, so there is no need to map it at design time. In all other cases, in order to match, the metadata on the links being matched must have the same number of columns, with corresponding properties for each.
The Inputs page for a server shared container has an Input field and two tabs, General and Columns. The Inputs page for a parallel shared container, or a server shared container used in a parallel job, has an additional tab: Partitioning.
v Input. Choose the input link to the container that you want to map.
The General page has these fields:
v Map to Container Link. Choose the link within the shared container to which the incoming job link will be mapped. Changing the link triggers a validation process, and you will be warned if the metadata does not match and are offered the option of reconciling the metadata as described below.
v Validate. Click this to request validation of the metadata on the two links. You are warned if validation fails and given the option of reconciling the metadata. If you choose to reconcile, the metadata on the container link replaces that on the job link. Surplus columns on the job link are removed. Job link columns that have the same name as a container column but different properties will have the properties overwritten, but derivation information preserved. Note: You can use a Transformer stage within the job to manually map data between a job stage and the container stage in order to supply the metadata that the container requires.
v Description. Optional description of the job input link.
The Columns page shows the metadata defined for the job stage link in a standard grid. You can use the Reconcile option on the Load button to overwrite metadata on the job stage link with the container link metadata in the same way as described for the Validate option.
The Partitioning tab appears when you are using a server shared container within a parallel job. It has the same fields and functionality as the Partitioning tab on all parallel stage editors.
The Advanced tab appears for parallel shared containers and when you are using a server shared container within a parallel job. It has the same fields and functionality as the Advanced tab on all parallel stage editors.
Outputs page
The Outputs page enables you to map metadata between a container link and the job link which connects to the container on the output side. It has an Outputs field and a General tab, Columns tab, and Advanced tab, which perform the equivalent functions to those described for the Inputs page.
The Columns tab for parallel shared containers has a Runtime column propagation check box. This is visible provided RCP is enabled for the job. It shows whether RCP is switched on or off for the link onto which the container link is mapped. If RCP is switched on, there is no need to map the metadata.
Pre-configured components
You can use shared containers to make pre-configured stages available to other jobs. To do this:
1. Select a stage and relevant input/output link (you need the link too in order to retain metadata).
2. Choose Copy from the shortcut menu, or select Edit > Copy.
3. Select Edit > Paste special > Into new shared container... . The Paste Special into new Shared Container dialog box appears.
4. Choose to create an entry for this container in the palette (the dialog will do this by default).
To use the pre-configured component, select the shared container in the palette and Ctrl+drag it onto the canvas. This deconstructs the container so the stage and link appear on the canvas.
Converting containers
You can convert local containers to shared containers and vice versa. By converting a local container to a shared one you can make the functionality available to all jobs in the project. You might want to convert a shared container to a local one if you want to slightly modify its functionality within a job. You can also convert a shared container to a local container and then deconstruct it into its constituent parts as described in Deconstructing a Local Container.
To convert a container, select its stage icon in the job Diagram window and do one of the following:
v Choose Convert from the shortcut menu.
v Choose Edit > Convert Container from the main menu.
WebSphere DataStage prompts you to confirm the conversion. Containers nested within the container you are converting are not affected. When converting from shared to local, you are warned if link name conflicts occur and given a chance to resolve them.
A shared container cannot be converted to a local container if it has a parameter with the same name as a parameter in the parent job (or container) which is not derived from the parent's corresponding parameter. You are warned if this occurs and must resolve the conflict before the container can be converted. Note: Converting a shared container instance to a local container has no effect on the original shared container.
Parallel routines
Parallel jobs can execute routines before or after a processing stage executes (a processing stage being one that takes input, processes it then outputs it in a single stage). These routines are defined and stored in the repository, and then called in the Triggers page of the particular Transformer stage Properties dialog box. These routines must be supplied in a UNIX shared library or an object file, and do not return a value (any values returned are ignored). Example parallel routines are supplied on the WebSphere DataStage Installation CD in the directory Samples/TrxExternalFunctions. The readme file explains how to use the examples on each platform.
and you must ensure that the shared library is available at run time. For the Library invocation method the routine must be provided in a shared library rather than an object file. If you choose Object the function is linked into the job, and so does not need to be available at run time. The routine can be contained in a shared library or an object file. Note that, if you use the Object option, and subsequently update the function, the job will need to be recompiled to pick up the update. If you choose the Library option, you must enter the pathname of the shared library file in the Library path field. If you choose the Object option you must enter the pathname of the object file in the Library path field.
v External subroutine name. The C function that this routine is calling (must be the name of a valid routine in a shared library).
v Return Type. Choose the type of the value that the function will return. The drop-down list offers a choice of native C types. This is unavailable for External Before/After Routines, which do not return a value. Note that the actual type definitions in function implementations might vary depending on platform type. This particularly applies to the "long" and "unsigned long" C native types. These can be defined as "long" and "unsigned long" on Tru64 platforms, but for all other platforms should be defined as "long long" and "unsigned long long" in the actual code. Similarly a return type of "char" should be defined as "signed char" in the code on all platforms.
v Library path. If you have specified the Library option, type or browse on the server for the pathname of the shared library that contains the function. This is used at compile time to locate the function. The pathname should be the exact name of the library or object file, and must have the prefix lib and the appropriate suffix, for example, /disk1/userlibs/libMyFuncs.so, /disk1/userlibs/MyStaticFuncs.o. Suffixes are as follows:
Solaris - .so or .a
AIX - .so or .a
HPUX - .a or .sl
Tru64 - .so or .a
If you have specified the Object option, enter the pathname of the object file. Typically the file will be suffixed with .o. This file must exist and be a valid object file for the linker.
v Short description. Type an optional brief description of the routine.
v Long description. Type an optional detailed description of the routine.
3. Next, select the Creator page to enter creator information: The Creator page allows you to specify information about the creator and version number of the routine, as follows:
v Vendor. Type the name of the company who created the routine.
v Author. Type the name of the person who created the routine.
v Version. Type the version number of the routine. This is used when the routine is imported. The Version field contains a three-part version number, for example, 3.1.1. The first part of this number is an internal number used to check compatibility between the routine and the WebSphere DataStage system, and cannot be changed. The second part of this number represents the release number. This number should be incremented when major changes are made to the routine definition or the underlying code. The new release of the routine supersedes any previous release. Any jobs using the routine use the new release. The last part of this number marks intermediate releases when a minor change or fix has taken place.
v Copyright. Type the copyright information.
4. The last step is to define routine arguments by selecting the Arguments page.
The underlying functions for External Functions can have any number of arguments, with each argument name being unique within the function
definition. The underlying functions for External Before/After routines can have up to eight arguments, with each argument name being unique within the function definition. In both cases these names must conform to C variable name standards.
Expected arguments for a routine appear in the expression editor, or on the Triggers page of the Transformer stage Properties dialog box, delimited by % characters (for example, %arg1%). When actually using a routine, substitute the argument value for this string.
Fill in the following fields:
v Argument name. Type the name of the argument to be passed to the routine.
v I/O type. All arguments are input arguments, so the I/O type can only be set to I.
v Native type. Offers a choice of the C native types in a drop-down list. Note that the actual type definitions in function implementations might vary depending on platform type. This particularly applies to the "long" and "unsigned long" C native types. These can be defined as "long" and "unsigned long" on Tru64 platforms, but for all other platforms should be defined as "long long" and "unsigned long long" in the actual code. Similarly a return type of "char" should be defined as "signed char" in the code on all platforms.
v Description. Type an optional description of the argument.
5. When you are happy with your routine definition, click OK. The Save As dialog box appears.
6. Select the folder in the repository tree where you want to store the routine and click OK.
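To make the shape of such a routine more concrete, the following is a minimal sketch of a C++ source file that could sit behind an External Function definition with a Return Type of long and two arguments of Native type long. The function name discountedPrice, the file names, and the compile command are all invented for this illustration; the platform-specific instructions in the readme mentioned above give the exact compiler options to use on your system.

// myfuncs.cpp - hypothetical external function for a parallel routine.
// extern "C" keeps the symbol unmangled, so it matches the name entered
// in the External subroutine name field of the routine definition.
// As noted above, the DataStage native type "long" is written as
// "long long" in the code on all platforms except Tru64.
extern "C" long long discountedPrice(long long price, long long discountPercent)
{
    // Both arguments are inputs; the return value is passed back to the
    // Transformer stage expression that calls the routine.
    return price - (price * discountPercent) / 100;
}

For the Library invocation method, a file like this might be built into a shared library with a command along the lines of g++ -shared -fPIC -o libMyFuncs.so myfuncs.cpp and deployed to a path such as /disk1/userlibs/libMyFuncs.so, which is then entered in the Library path field. For the Object method it would instead be compiled to an object file (for example, myfuncs.o) and that pathname entered in the Library path field.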
The stage will be available to all jobs in the project in which the stage was defined. You can make it available to other projects using the Designer Export/Import facilities. The stage is automatically added to the job palette. To define a custom stage type: 1. Do one of: a. Choose File New from the Designer menu. The New dialog box appears. b. Open the Other folder and select the Parallel Stage Type icon. c. Click OK. The Parallel Routine dialog box appears, with the General page on top. Or: d. Select a folder in the repository tree. e. Choose New Other Parallel Stage Custom from the shortcut menu. The Stage Type dialog box appears, with the General page on top. 2. Fill in the fields on the General page as follows: v Stage type name. This is the name that the stage will be known by to WebSphere DataStage. Avoid using the same name as existing stages. v Parallel Stage type. This indicates the type of new Parallel job stage you are defining (Custom, Build, or Wrapped). You cannot change this setting. v Execution Mode. Choose the execution mode. This is the mode that will appear in the Advanced tab on the stage editor. You can override this mode for individual instances of the stage as required, unless you select Parallel only or Sequential only. v Mapping. Choose whether the stage has a Mapping tab or not. A Mapping tab enables the user of the stage to specify how output columns are derived from the data produced by the stage. Choose None to specify that output mapping is not performed, choose Default to accept the default setting that WebSphere DataStage uses. v Preserve Partitioning. Choose the default setting of the Preserve Partitioning flag. This is the setting that will appear in the Advanced tab on the stage editor. You can override this setting for individual instances of the stage as required. v Partitioning. Choose the default partitioning method for the stage. This is the method that will appear in the Inputs page Partitioning tab of the stage editor. You can override this method for individual instances of the stage as required. v Collecting. Choose the default collection method for the stage. This is the method that will appear in the Inputs page Partitioning tab of the stage editor. You can override this method for individual instances of the stage as required. v Operator. Enter the name of the Orchestrate operator that you want the stage to invoke. v Short Description. Optionally enter a short description of the stage. v Long Description. Optionally enter a long description of the stage. 3. Go to the Links page and specify information about the links allowed to and from the stage you are defining. Use this to specify the minimum and maximum number of input and output links that your custom stage can have, and to enable the ViewData feature for target data (you cannot enable target ViewData if your stage has any output links). When the stage is used in a job design, a ViewData button appears on
the Input page, which allows you to view the data on the actual data target (provided some has been written there). In order to use the target ViewData feature, you have to specify an Orchestrate operator to read the data back from the target. This will usually be different to the operator that the stage has used to write the data (that is, the operator defined in the Operator field of the General page). Specify the reading operator and associated arguments in the Operator and Options fields. If you enable target ViewData, a further field appears in the Properties grid, called ViewData.
4. Go to the Creator page and optionally specify information about the stage you are creating. We recommend that you assign a version number to the stage so you can keep track of any subsequent changes. You can specify that the actual stage will use a custom GUI by entering the ProgID for a custom GUI in the Custom GUI Prog ID field. You can also specify that the stage has its own icon. You need to supply a 16 x 16 bit bitmap and a 32 x 32 bit bitmap to be displayed in various places in the WebSphere DataStage user interface. Click the 16 x 16 Bitmap button and browse for the smaller bitmap file. Click the 32 x 32 Bitmap button and browse for the large bitmap file. Note that bitmaps with 32-bit color are not supported. Click the Reset Bitmap Info button to revert to using the default WebSphere DataStage icon for this stage.
5. Go to the Properties page. This allows you to specify the options that the Orchestrate operator requires as properties that appear in the Stage Properties tab. For custom stages the Properties tab always appears under the Stage page.
6. Fill in the fields as follows:
v Property name. The name of the property.
v Data type. The data type of the property. Choose from:
Boolean
Float
Integer
String
Pathname
List
Input Column
Output Column
If you choose Input Column or Output Column, when the stage is included in a job a drop-down list will offer a choice of the defined input or output columns. If you choose List you should open the Extended Properties dialog box from the grid shortcut menu to specify what appears in the list.
v Prompt. The name of the property that will be displayed on the Properties tab of the stage editor.
v Default Value. The value the option will take if no other is specified.
v Required. Set this to True if the property is mandatory.
v Repeats. Set this true if the property repeats (that is, you can have multiple instances of it).
v Use Quoting. Specify whether the property will have quotes added when it is passed to the Orchestrate operator.
v Conversion. Specifies the type of property as follows:
-Name. The name of the property will be passed to the operator as the option value. This will normally be a hidden property, that is, not visible in the stage editor.
-Name Value. The name of the property will be passed to the operator as the option name, and any value specified in the stage editor is passed as the value.
-Value. The value for the property specified in the stage editor is passed to the operator as the option name. Typically used to group operator options that are mutually exclusive.
Value only. The value for the property specified in the stage editor is passed as it is.
Input Schema. Specifies that the property will contain a schema string whose contents are populated from the Input page Columns tab.
Output Schema. Specifies that the property will contain a schema string whose contents are populated from the Output page Columns tab.
None. This allows the creation of properties that do not generate any osh, but can be used for conditions on other properties (for example, for use in a situation where you have mutually exclusive properties, but at least one of them must be specified).
v Schema properties require format options. Select this check box to specify that the stage being specified will have a Format tab.
If you have enabled target ViewData on the Links page, the following property is also displayed:
v ViewData. Select Yes to indicate that the value of this property should be used when viewing data. For example, if this property specifies a file to write to when the stage is used in a job design, the value of this property will be used to read the data back if ViewData is used in the stage.
If you select a conversion type of Input Schema or Output Schema, you should note the following:
v Data Type is set to String.
v Required is set to Yes.
v The property is marked as hidden and will not appear on the Properties page when the custom stage is used in a job design.
If your stage can have multiple input or output links there would be an Input Schema property or Output Schema property per link. When the stage is used in a job design, the property will contain the following OSH for each input or output link:
-property_name record {format_props} ( column_definition {format_props}; ...)
Where:
v property_name is the name of the property (usually "schema").
v format_props is the formatting information supplied on the Format page (if the stage has one).
v There is one column_definition for each column defined in the Columns tab for that link. The format_props in this case refers to per-column format information specified in the Edit Column Metadata dialog box.
Schema properties are mutually exclusive with schema file properties. If your custom stage supports both, you should use the Extended Properties
dialog box to specify a condition of schemafile= for the schema property. The schema property is then only valid provided the schema file property is blank (or does not exist). 7. If you want to specify a list property, or otherwise control how properties are handled by your stage, choose Extended Properties from the Properties grid shortcut menu to open the Extended Properties dialog box. The settings you use depend on the type of property you are specifying: v Specify a category to have the property appear under this category in the stage editor. By default all properties appear in the Options category. v Specify that the property will be hidden and not appear in the stage editor. This is primarily intended to support the case where the underlying operator needs to know the JobName. This can be passed using a mandatory String property with a default value that uses a DS Macro. However, to prevent the user from changing the value, the property needs to be hidden. v If you are specifying a List category, specify the possible values for list members in the List Value field. v If the property is to be a dependent of another property, select the parent property in the Parents field. v Specify an expression in the Template field to have the actual value of the property generated at compile time. It is usually based on values in other properties and columns. v Specify an expression in the Conditions field to indicate that the property is only valid if the conditions are met. The specification of this property is a bar | separated list of conditions that are ANDed together. For example, if the specification was a=b|c!=d, then this property would only be valid (and therefore only available in the GUI) when property a is equal to b, and property c is not equal to d. 8. If your custom stage will create columns, go to the Mapping Additions page. It contains a grid that allows for the specification of columns created by the stage. You can also specify that column details are filled in from properties supplied when the stage is used in a job design, allowing for dynamic specification of columns. The grid contains the following fields: v Column name. The name of the column created by the stage. You can specify the name of a property you specified on the Property page of the dialog box to dynamically allocate the column name. Specify this in the form #property_name#, the created column will then take the value of this property, as specified at design time, as the name of the created column. v Parallel type. The type of the column (this is the underlying data type, not the SQL data type). Again you can specify the name of a property you specified on the Property page of the dialog box to dynamically allocate the column type. Specify this in the form #property_name#, the created column will then take the value of this property, as specified at design time, as the type of the created column. (Note that you cannot use a repeatable property to dynamically allocate a column type in this way.) v Nullable. Choose Yes or No to indicate whether the created column can contain a null. v Conditions. Allows you to enter an expression specifying the conditions under which the column will be created. This could, for example, depend on the setting of one of the properties specified in the Property page.
You can propagate the values of the Conditions fields to other columns if required. Do this by selecting the columns you want to propagate to, then right-clicking in the source Conditions field and choosing Propagate from the shortcut menu. A dialog box asks you to confirm that you want to propagate the conditions to all columns.
9. Click OK when you are happy with your custom stage definition. The Save As dialog box appears.
10. Select the folder in the repository tree where you want to store the stage type and click OK.
v Pre-loop code. Executed before any records are processed.
v Per-record code. Used to process each record.
v Transfer. Directly copies records from the input buffer to the output buffer. Records can still be accessed by the code while in the buffer.
Build Stage
To define a Build stage:
1. Do one of:
a. Choose File New from the Designer menu. The New dialog box appears.
b. Open the Other folder and select the Parallel Stage Type icon.
c. Click OK. The Parallel Routine dialog box appears, with the General page on top.
Or:
d. Select a folder in the repository tree.
e. Choose New Other Parallel Stage Build from the shortcut menu. The Stage Type dialog box appears, with the General page on top.
2. Fill in the fields on the General page as follows:
v Stage type name. This is the name that the stage will be known by to WebSphere DataStage. Avoid using the same name as existing stages.
v Class Name. The name of the C++ class. By default this takes the name of the stage type.
v Parallel Stage type. This indicates the type of new parallel job stage you are defining (Custom, Build, or Wrapped). You cannot change this setting.
v Execution mode. Choose the default execution mode. This is the mode that will appear in the Advanced tab on the stage editor. You can override this mode for individual instances of the stage as required, unless you select Parallel only or Sequential only.
v Preserve Partitioning. This shows the default setting of the Preserve Partitioning flag, which you cannot change in a Build stage. This is the setting that will appear in the Advanced tab on the stage editor. You can override this setting for individual instances of the stage as required.
v Partitioning. This shows the default partitioning method, which you cannot change in a Build stage. This is the method that will appear in the Inputs Page Partitioning tab of the stage editor. You can override this method for individual instances of the stage as required.
v Collecting. This shows the default collection method, which you cannot change in a Build stage. This is the method that will appear in the Inputs Page Partitioning tab of the stage editor. You can override this method for individual instances of the stage as required.
v Operator. The name of the operator that your code is defining and which will be executed by the WebSphere DataStage stage. By default this takes the name of the stage type.
v Short Description. Optionally enter a short description of the stage.
v Long Description. Optionally enter a long description of the stage.
3. Go to the Creator page and optionally specify information about the stage you are creating. We recommend that you assign a release number to the stage so you can keep track of any subsequent changes. You can specify that the actual stage will use a custom GUI by entering the ProgID for a custom GUI in the Custom GUI Prog ID field. You can also specify that the stage has its own icon. You need to supply a 16 x 16 bit bitmap and a 32 x 32 bit bitmap to be displayed in various places in the WebSphere DataStage user interface. Click the 16 x 16 Bitmap button and browse for the smaller bitmap file. Click the 32 x 32 Bitmap button and browse for the large bitmap file. Note that bitmaps with 32-bit color are not supported. Click the Reset Bitmap Info button to revert to using the default WebSphere DataStage icon for this stage.
4. Go to the Properties page. This allows you to specify the options that the Build stage requires as properties that appear in the Stage Properties tab. For custom stages the Properties tab always appears under the Stage page. Fill in the fields as follows:
v Property name. The name of the property. This will be passed to the operator you are defining as an option, prefixed with "-" and followed by the value selected in the Properties tab of the stage editor.
v Data type. The data type of the property. Choose from:
Boolean
Float
Integer
String
Pathname
List
Input Column
Output Column
If you choose Input Column or Output Column, when the stage is included in a job a drop-down list will offer a choice of the defined input or output columns. If you choose List you should open the Extended Properties dialog box from the grid shortcut menu to specify what appears in the list.
v Prompt. The name of the property that will be displayed on the Properties tab of the stage editor.
v Default Value. The value the option will take if no other is specified.
v Required. Set this to True if the property is mandatory.
v Conversion. Specifies the type of property as follows:
-Name. The name of the property will be passed to the operator as the option value. This will normally be a hidden property, that is, not visible in the stage editor.
-Name Value. The name of the property will be passed to the operator as the option name, and any value specified in the stage editor is passed as the value. -Value. The value for the property specified in the stage editor is passed to the operator as the option name. Typically used to group operator options that are mutually exclusive. Value only. The value for the property specified in the stage editor is passed as it is. 5. If you want to specify a list property, or otherwise control how properties are handled by your stage, choose Extended Properties from the Properties grid shortcut menu to open the Extended Properties dialog box. The settings you use depend on the type of property you are specifying: v Specify a category to have the property appear under this category in the stage editor. By default all properties appear in the Options category. v If you are specifying a List category, specify the possible values for list members in the List Value field. v If the property is to be a dependent of another property, select the parent property in the Parents field. v Specify an expression in the Template field to have the actual value of the property generated at compile time. It is usually based on values in other properties and columns. v Specify an expression in the Conditions field to indicate that the property is only valid if the conditions are met. The specification of this property is a bar | separated list of conditions that are ANDed together. For example, if the specification was a=b|c!=d, then this property would only be valid (and therefore only available in the GUI) when property a is equal to b, and property c is not equal to d. Click OK when you are happy with the extended properties. 6. Click on the Build page. The tabs here allow you to define the actual operation that the stage will perform. The Interfaces tab enable you to specify details about inputs to and outputs from the stage, and about automatic transfer of records from input to output. You specify port details, a port being where a link connects to the stage. You need a port for each possible input link to the stage, and a port for each possible output link from the stage. You provide the following information on the Input sub-tab: v Port Name. Optional name for the port. The default names for the ports are in0, in1, in2 ... . You can refer to them in the code using either the default name or the name you have specified. v Alias. Where the port name contains non-ascii characters, you can give it an alias in this column (this is only available where NLS is enabled). v AutoRead. This defaults to True which means the stage will automatically read records from the port. Otherwise you explicitly control read operations in the code. v Table Name. Specify a table definition in the WebSphere DataStage Repository which describes the metadata for the port. You can browse for a table definition by choosing Select Table from the menu that appears when you click the browse button. You can also view the schema corresponding to this table definition by choosing View Schema from the same menu. You do not have to supply a Table Name. If any of the columns in your table definition have names that contain non-ascii characters, you should choose
Column Aliases from the menu. The Build Column Aliases dialog box appears. This lists the columns that require an alias and lets you specify one.
v RCP. Choose True if runtime column propagation is allowed for inputs to this port. Defaults to False. You do not need to set this if you are using the automatic transfer facility.
You provide the following information on the Output sub-tab:
v Port Name. Optional name for the port. The default names for the links are out0, out1, out2 ... . You can refer to them in the code using either the default name or the name you have specified.
v Alias. Where the port name contains non-ascii characters, you can give it an alias in this column.
v AutoWrite. This defaults to True which means the stage will automatically write records to the port. Otherwise you explicitly control write operations in the code. Once records are written, the code can no longer access them.
v Table Name. Specify a table definition in the WebSphere DataStage Repository which describes the metadata for the port. You can browse for a table definition. You do not have to supply a Table Name. A shortcut menu accessed from the browse button offers a choice of Clear Table Name, Select Table, Create Table,View Schema, and Column Aliases. The use of these is as described for the Input sub-tab. v RCP. Choose True if runtime column propagation is allowed for outputs from this port. Defaults to False. You do not need to set this if you are using the automatic transfer facility. The Transfer sub-tab allows you to connect an input buffer to an output buffer such that records will be automatically transferred from input to output. You can also disable automatic transfer, in which case you have to explicitly transfer data in the code. Transferred data sits in an output buffer and can still be accessed and altered by the code until it is actually written to the port. You provide the following information on the Transfer tab: v Input. Select the input port to connect to the buffer from the list. If you have specified an alias, this will be displayed here. v Output. Select an output port from the list. Records are transferred from the output buffer to the selected output port. If you have specified an alias for the output port, this will be displayed here. v Auto Transfer. This defaults to False, which means that you have to include code which manages the transfer. Set to True to have the transfer carried out automatically. v Separate. This is False by default, which means this transfer will be combined with other transfers to the same port. Set to True to specify that the transfer should be separate from other transfers. The Logic tab is where you specify the actual code that the stage executes. The Definitions sub-tab allows you to specify variables, include header files, and otherwise initialize the stage before processing any records. The Pre-Loop sub-tab allows you to specify code which is executed at the beginning of the stage, before any records are processed. The Per-Record sub-tab allows you to specify the code which is executed once for every record processed. The Post-Loop sub-tab allows you to specify code that is executed after all the records have been processed.
You can type straight into these pages or cut and paste from another editor. The shortcut menu on the Pre-Loop, Per-Record, and Post-Loop pages gives access to the macros that are available for use in the code.
The Advanced tab allows you to specify details about how the stage is compiled and built. Fill in the page as follows:
v Compile and Link Flags. Allows you to specify flags that are passed to the C++ compiler.
v Verbose. Select this check box to specify that the compile and build is done in verbose mode.
v Debug. Select this check box to specify that the compile and build is done in debug mode. Otherwise, it is done in optimize mode.
v Suppress Compile. Select this check box to generate files without compiling, and without deleting the generated files. This option is useful for fault finding.
v Base File Name. The base filename for generated files. All generated files will have this name followed by the appropriate suffix. This defaults to the name specified under Operator on the General page.
v Source Directory. The directory where generated .c files are placed. This defaults to the buildop folder in the current project directory. You can also set it using the DS_OPERATOR_BUILDOP_DIR environment variable in the Administrator client.
v Header Directory. The directory where generated .h files are placed. This defaults to the buildop folder in the current project directory. You can also set it using the DS_OPERATOR_BUILDOP_DIR environment variable in the Administrator client.
v Object Directory. The directory where generated .so files are placed. This defaults to the buildop folder in the current project directory. You can also set it using the DS_OPERATOR_BUILDOP_DIR environment variable in the Administrator client.
v Wrapper directory. The directory where generated .op files are placed. This defaults to the buildop folder in the current project directory. You can also set it using the DS_OPERATOR_BUILDOP_DIR environment variable in the Administrator client.
7. When you have filled in the details in all the pages, click Generate to generate the stage. A window appears showing you the result of the build.
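To give a feel for what goes into the Logic tab, the following is a minimal sketch of the kind of C++ code that might be typed into its sub-tabs for a stage that simply counts the records it processes. The variable name rowsSeen is invented for this illustration, and the sketch assumes AutoRead, AutoWrite, and Auto Transfer are left at their automatic settings, so no explicit read, write, or transfer calls are needed; any logic your stage applies to column values would also go in the Per-Record code.

// Definitions sub-tab: declarations visible to all of the loop code.
static long long rowsSeen = 0;   // running record count (hypothetical name)

// Pre-Loop sub-tab: executed once, before any records are processed.
rowsSeen = 0;

// Per-Record sub-tab: executed once for every record processed.
rowsSeen++;

// Post-Loop sub-tab: executed once, after the last record has been processed.
// Any wrap-up logic (for example, checks on rowsSeen) would go here.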
The UNIX command that you wrap can be a built-in command, such as grep, a utility, such as SyncSort, or your own UNIX application. The only limitation is that the command must be pipe-safe (to be pipe-safe, a UNIX command reads its input sequentially, from beginning to end).
You need to define metadata for the data being input to and output from the stage. You also need to define the way in which the data will be input or output. UNIX commands can take their inputs from standard in, or another stream, a file, or from the output of another command via a pipe. Similarly data is output to standard out, or another stream, to a file, or to a pipe to be input to another command. You specify what the command expects.
WebSphere DataStage handles data being input to the Wrapped stage and will present it in the specified form. If you specify a command that expects input on standard in, or another stream, WebSphere DataStage will present the input data from the job's data flow as if it was on standard in. Similarly it will intercept data output on standard out, or another stream, and integrate it into the job's data flow.
You also specify the environment in which the UNIX command will be executed when you define the wrapped stage.
To define a Wrapped stage:
1. Do one of:
a. Choose File New from the Designer menu. The New dialog box appears.
b. Open the Other folder and select the Parallel Stage Type icon.
c. Click OK. The Parallel Routine dialog box appears, with the General page on top.
Or:
d. Select a folder in the repository tree.
2. Choose New Other Parallel Stage Wrapped from the shortcut menu. The Stage Type dialog box appears, with the General page on top.
3. Fill in the fields on the General page as follows:
v Stage type name. This is the name that the stage will be known by to WebSphere DataStage. Avoid using the same name as existing stages or the name of the actual UNIX command you are wrapping.
v Category. The category that the new stage will be stored in under the stage types branch. Type in or browse for an existing category or type in the name of a new one. The category also determines what group in the palette the stage will be added to. Choose an existing category to add to an existing group, or specify a new category to create a new palette group.
v Parallel Stage type. This indicates the type of new Parallel job stage you are defining (Custom, Build, or Wrapped). You cannot change this setting.
v Wrapper Name. The name of the wrapper file WebSphere DataStage will generate to call the command. By default this will take the same name as the Stage type name.
v Execution mode. Choose the default execution mode. This is the mode that will appear in the Advanced tab on the stage editor. You can override this mode for individual instances of the stage as required, unless you select Parallel only or Sequential only.
v Preserve Partitioning. This shows the default setting of the Preserve Partitioning flag, which you cannot change in a Wrapped stage. This is the
setting that will appear in the Advanced tab on the stage editor. You can override this setting for individual instances of the stage as required.
v Partitioning. This shows the default partitioning method, which you cannot change in a Wrapped stage. This is the method that will appear in the Inputs Page Partitioning tab of the stage editor. You can override this method for individual instances of the stage as required. See the WebSphere DataStage Parallel Job Developer Guide for a description of the partitioning methods.
v Collecting. This shows the default collection method, which you cannot change in a Wrapped stage. This is the method that will appear in the Inputs Page Partitioning tab of the stage editor. You can override this method for individual instances of the stage as required.
v Command. The name of the UNIX command to be wrapped, plus any required arguments. The arguments that you enter here are ones that do not change with different invocations of the command. Arguments that need to be specified when the Wrapped stage is included in a job are defined as properties for the stage.
v Short Description. Optionally enter a short description of the stage.
v Long Description. Optionally enter a long description of the stage.
4. Go to the Creator page and optionally specify information about the stage you are creating. We recommend that you assign a release number to the stage so you can keep track of any subsequent changes. You can specify that the actual stage will use a custom GUI by entering the ProgID for a custom GUI in the Custom GUI Prog ID field. You can also specify that the stage has its own icon. You need to supply a 16 x 16 bit bitmap and a 32 x 32 bit bitmap to be displayed in various places in the WebSphere DataStage user interface. Click the 16 x 16 Bitmap button and browse for the smaller bitmap file. Click the 32 x 32 Bitmap button and browse for the large bitmap file. Note that bitmaps with 32-bit color are not supported. Click the Reset Bitmap Info button to revert to using the default WebSphere DataStage icon for this stage.
5. Go to the Properties page. This allows you to specify the arguments that the UNIX command requires as properties that appear in the stage Properties tab. For wrapped stages the Properties tab always appears under the Stage page. Fill in the fields as follows:
v Property name. The name of the property that will be displayed on the Properties tab of the stage editor.
v Data type. The data type of the property. Choose from:
Boolean
Float
Integer
String
Pathname
List
Input Column
Output Column
If you choose Input Column or Output Column, when the stage is included in a job a list will offer a choice of the defined input or output columns. If you choose List you should open the Extended Properties dialog box from the grid shortcut menu to specify what appears in the list.
v Prompt. The name of the property that will be displayed on the Properties tab of the stage editor. v Default Value. The value the option will take if no other is specified. v Required. Set this to True if the property is mandatory. v Repeats. Set this true if the property repeats (that is you can have multiple instances of it). v Conversion. Specifies the type of property as follows: -Name. The name of the property will be passed to the command as the argument value. This will normally be a hidden property, that is, not visible in the stage editor. -Name Value. The name of the property will be passed to the command as the argument name, and any value specified in the stage editor is passed as the value. -Value. The value for the property specified in the stage editor is passed to the command as the argument name. Typically used to group operator options that are mutually exclusive. Value only. The value for the property specified in the stage editor is passed as it is. 6. If you want to specify a list property, or otherwise control how properties are handled by your stage, choose Extended Properties from the Properties grid shortcut menu to open the Extended Properties dialog box. The settings you use depend on the type of property you are specifying: v Specify a category to have the property appear under this category in the stage editor. By default all properties appear in the Options category. v If you are specifying a List category, specify the possible values for list members in the List Value field. v If the property is to be a dependent of another property, select the parent property in the Parents field. v Specify an expression in the Template field to have the actual value of the property generated at compile time. It is usually based on values in other properties and columns. v Specify an expression in the Conditions field to indicate that the property is only valid if the conditions are met. The specification of this property is a bar | separated list of conditions that are ANDed together. For example, if the specification was a=b|c!=d, then this property would only be valid (and therefore only available in the GUI) when property a is equal to b, and property c is not equal to d. Click OK when you are happy with the extended properties. 7. Go to the Wrapped page. This allows you to specify information about the command to be executed by the stage and how it will be handled. The Interfaces tab is used to describe the inputs to and outputs from the stage, specifying the interfaces that the stage will need to function. Details about inputs to the stage are defined on the Inputs sub-tab: v Link. The link number, this is assigned for you and is read-only. When you actually use your stage, links will be assigned in the order in which you add them. In this example, the first link will be taken as link 0, the second as link 1 and so on. You can reassign the links using the stage editors Link Ordering tab on the General page. v Table Name. The metadata for the link. You define this by loading a table definition from the Repository. Type in the name, or browse for a table definition. Alternatively, you can specify an argument to the UNIX command
which specifies a table definition. In this case, when the wrapped stage is used in a job design, the designer will be prompted for an actual table definition to use. v Stream. Here you can specify whether the UNIX command expects its input on standard in, or another stream, or whether it expects it in a file. Click on the browse button to open the Wrapped Stream dialog box. In the case of a file, you should also specify whether the file to be read is given in a command line argument, or by an environment variable. Details about outputs from the stage are defined on the Outputs sub-tab: v Link. The link number, this is assigned for you and is read-only. When you actually use your stage, links will be assigned in the order in which you add them. In this example, the first link will be taken as link 0, the second as link 1 and so on. You can reassign the links using the stage editors Link Ordering tab on the General page. v Table Name. The metadata for the link. You define this by loading a table definition from the Repository. Type in the name, or browse for a table definition. v Stream. Here you can specify whether the UNIX command will write its output to standard out, or another stream, or whether it outputs to a file. Click on the browse button to open the Wrapped Stream dialog box. In the case of a file, you should also specify whether the file to be written is specified in a command line argument, or by an environment variable. The Environment tab gives information about the environment in which the command will execute. Set the following on the Environment tab: v All Exit Codes Successful. By default WebSphere DataStage treats an exit code of 0 as successful and all others as errors. Select this check box to specify that all exit codes should be treated as successful other than those specified in the Failure codes grid. v Exit Codes. The use of this depends on the setting of the All Exits Codes Successful check box. If All Exits Codes Successful is not selected, enter the codes in the Success Codes grid which will be taken as indicating successful completion. All others will be taken as indicating failure. If All Exits Codes Successful is selected, enter the exit codes in the Failure Code grid which will be taken as indicating failure. All others will be taken as indicating success. v Environment. Specify environment variables and settings that the UNIX command requires in order to run. 8. When you have filled in the details in all the pages, click Generate to generate the stage.
Server routines
The designer also allows you to define your own custom routines that can be used in various places in your server job designs. Server routines are stored in the repository, where you can create, view, or edit them using the Routine dialog box. The following program components are classified as routines: v Transform functions. These are functions that you can use when defining custom transforms. WebSphere DataStage has a number of built-in transform functions but you can also define your own transform functions in the Routine dialog box. v Before/After subroutines. When designing a job, you can specify a subroutine to run before or after the job, or before or after an active stage. WebSphere DataStage has a number of built-in before/after subroutines but you can also define your own before/after subroutines using the Routine dialog box. v Custom UniVerse functions. These are specialized BASIC functions that have been defined outside WebSphere DataStage. Using the Routine dialog box, you can get WebSphere DataStage to create a wrapper that enables you to call these functions from within WebSphere DataStage. These functions are stored under the Routines branch in the Repository. You specify the category when you create the routine. If NLS is enabled, you should be aware of any mapping requirements when using custom UniVerse functions. If a function uses data in a particular character set, it is your responsibility to map the data to and from Unicode. v ActiveX (OLE) functions. You can use ActiveX (OLE) functions as programming components within WebSphere DataStage. Such functions are made accessible to WebSphere DataStage by importing them. This creates a wrapper that enables you to call the functions. After import, you can view and edit the BASIC wrapper using the Routine dialog box. By default, such functions are located in the Routines Class name branch in the Repository, but you can specify your own category when importing the functions. v Web Service routines. You can use operations imported from a web service as programming components within WebSphere DataStage. Such routines are created by importing from a web service WSDL file. When using the Expression Editor in the server job, all of these components appear under the DS Routines... command on the Suggest Operand menu.
v Routine name. The name of the function or subroutine. Routine names can be any length. They must begin with an alphabetic character and can contain alphanumeric and period characters. v Type. The type of routine. There are three types of routine: Transform Function, Before/After Subroutine, or Custom UniVerse Function. v External Catalog Name. This is only visible if you have chosen Custom UniVerse Function from the Type box. Enter the cataloged name of the external routine. v Short description. An optional brief description of the routine. v Long description. An optional detailed description of the routine. 3. Next, select the Creator page to enter creator information: The Creator page allows you to specify information about the creator and version number of the routine, as follows: v Vendor. Type the name of the company who created the routine. v Author. Type the name of the person who created the routine. v Version. Type the version number of the routine. This is used when the routine is imported. The Version field contains a three-part version number, for example, 3.1.1. The first part of this number is an internal number used to check compatibility between the routine and the WebSphere DataStage system, and cannot be changed. The second part of this number represents the release number. This number should be incremented when major changes are made to the routine definition or the underlying code. The new release of the routine supersedes any previous release. Any jobs using the routine use the new release. The last part of this number marks intermediate releases when a minor change or fix has taken place. v Copyright. Type the copyright information. 4. Next, select the Arguments page to define any arguments for your routine: The default argument names and whether you can add or delete arguments depends on the type of routine you are editing: v Before/After subroutines. The argument names are InputArg and Error Code. You can edit the argument names and descriptions but you cannot delete or add arguments. v Transform Functions and Custom UniVerse Functions. By default these have one argument called Arg1. You can edit argument names and descriptions and add and delete arguments. There must be at least one argument, but no more than 255. 5. Next, select the Code page to define the code for your routine: The Code page is used to view or write the code for the routine. The toolbar contains buttons for cutting, copying, pasting, and formatting code, and for activating Find (and Replace). The main part of this page consists of a multiline text box with scroll bars. For more information on how to use this page, see Entering Code. Note: This page is not available if you selected Custom UniVerse Function on the General page. 6. When you are happy with your code, you should save, compile and test it (see Saving Code, Compiling Code, and Testing a Routine). 7. Select the Dependencies page to define the dependencies of your routine. The Dependencies page allows you to enter any locally or globally cataloged functions or routines that are used in the routine you are defining. This is to
ensure that, when you package any jobs using this routine for deployment on another system, all the dependencies will be included in the package. The information required is as follows: v Type. The type of item upon which the routine depends. Choose from the following: Local Locally cataloged BASIC functions and subroutines. Global Globally cataloged BASIC functions and subroutines. File A standard file. ActiveX An ActiveX (OLE) object (not available on UNIX-based systems). Web service A web service. v Name. The name of the function or routine. The name required varies according to the type of dependency: Local The catalog name. Global The catalog name. File The file name. ActiveX The Name entry is actually irrelevant for ActiveX objects. Enter something meaningful to you (ActiveX objects are identified by the Location field). v Location. The location of the dependency. A browse dialog box is available to help with this. This location can be an absolute path, but it is recommended you specify a relative path using the following environment variables: %SERVERENGINE% - WebSphere DataStage engine account directory (normally C:\IBM\InformationServer\Server\DSEngine on Windows and /opt/IBM/InformationServer/Server/DSEngine on UNIX). %PROJECT% - Current project directory. %SYSTEM% - System directory on Windows or /usr/lib on UNIX. Entering code: You can enter or edit code for a routine on the Code page in the Server Routine dialog box. The first field on this page displays the routine name and the argument names. If you want to change these properties, you must edit the fields on the General and Arguments pages. The main part of this page contains a multiline text entry box, in which you must enter your code. To enter code, click in the box and start typing. You can use the following standard Windows edit functions in this text box: v Delete using the Del key v Cut using Ctrl-X v Copy using Ctrl-C v Paste using Ctrl-V v Go to the end of the line using the End key v Go to the beginning of the line using the Home key v Select text by clicking and dragging or double-clicking Some of these edit functions are included in a shortcut menu which you can display by right-clicking. You can also cut, copy, and paste code using the buttons in the toolbar.
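For illustration only, here is a sketch of the kind of code you might type for a simple Transform Function (the logic is invented; Arg1 is the default argument defined on the Arguments page, and whatever you assign to the variable Ans is the value that the function returns):

* Return the argument trimmed and converted to uppercase;
* if the argument is empty, return a fixed marker value instead.
If TRIM(Arg1) = "" Then
   Ans = "UNKNOWN"
End Else
   Ans = UPCASE(TRIM(Arg1))
End

You type only the statements that make up the body of the routine; the routine name, argument names, and return statement are shown in the read-only fields described above and below.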
Your code must only contain BASIC functions and statements supported by WebSphere DataStage. If NLS is enabled, you can use non-English characters in the following circumstances: v In comments v In string data (that is, strings contained in quotation marks) The use of non-English characters elsewhere causes compilation errors. If you want to format your code, click the Format button on the toolbar. The return field on this page displays the return statement for the function or subroutine. You cannot edit this field. Saving code: When you have finished entering or editing your code, the routine must be saved. A routine cannot be compiled or tested if it has not been saved. To save a routine, click Save in the Server Routine dialog box. The routine properties (its name, description, number of arguments, and creator information) and the associated code are saved in the Repository. Compiling code: When you have saved your routine, you must compile it. To compile a routine, click Compile... in the Server Routine dialog box. The status of the compilation is displayed in the lower window of the Server Routine dialog box. If the compilation is successful, the routine is marked as built in the Repository and is available for use. If the routine is a Transform Function, it is displayed in the list of available functions when you edit a transform. If the routine is a Before/After Subroutine, it is displayed in the drop-down list box of available subroutines when you edit an Aggregator, Transformer, or plug-in stage, or define job properties. To troubleshoot any errors, double-click the error in the compilation output window. WebSphere DataStage attempts to find the corresponding line of code that caused the error and highlights it in the code window. You must edit the code to remove any incorrect statements or to correct any syntax errors. If NLS is enabled, watch for multiple question marks in the Compilation Output window. This generally indicates that a character set mapping error has occurred. When you have modified your code, click Save then Compile... . If necessary, continue to troubleshoot any errors, until the routine compiles successfully. Once the routine is compiled, you can use it in other areas of WebSphere DataStage or test it. Testing a routine: Before using a compiled routine, you can test it using the Test... button in the Server Routine dialog box.
The Test... button is activated when the routine has been successfully compiled. Note: The Test... button is not available for a Before/After Subroutine. Routines of this type cannot be tested in isolation and must be executed as part of a running job. When you click Test..., the Test Routine dialog box appears: This dialog box contains a grid and buttons. The grid has a column for each argument and one for the test result. You can add and edit rows in the grid to specify the values for different test cases. To run a test with a chosen set of values, click anywhere in the row you want to use and click Run. If you want to run tests using all the test values, click Run All. The Result... column is populated as each test is completed. To see more details for a particular test, double-click the Result... cell for the test you are interested in. The Test Output window appears, displaying the full test results: Click Close to close this window. If you want to delete a set of test values, click anywhere in the row you want to remove and press the Delete key or choose Delete row from the shortcut menu. When you have finished testing the routine, click Close to close the Test Routine dialog box. Any test values you entered are saved when you close the dialog box.
Copying a routine
You can copy an existing routine using the Designer. To copy a routine: 1. Select it in the repository tree. 2. Choose Create copy from the shortcut menu.
The routine is copied and a new routine is created in the same folder in the project tree. By default, the copy is named CopyOfXXX, where XXX is the name of the chosen routine. An edit box appears allowing you to rename the copy immediately. The new routine must be compiled before it can be used.
Custom transforms
You can create, view or edit custom transforms for server jobs using the Transform dialog box. Transforms specify the type of data transformed, the type it is transformed into, and the expression that performs the transformation. The WebSphere DataStage Expression Editor helps you to enter correct expressions when you define custom transforms in the WebSphere DataStage Designer. The Expression Editor can: v Facilitate the entry of expression elements v Complete the names of frequently used variables v Validate variable names and the complete expression When you are entering expressions, the Expression Editor offers choices of operands and operators from context-sensitive shortcut menus. WebSphere DataStage is supplied with a number of built-in transforms (which you cannot edit). You can also define your own custom transforms, which are stored in the repository and can be used by other WebSphere DataStage server jobs (or by parallel jobs using server shared containers or BASIC transformer stages). When using the Expression Editor, the transforms appear under the DS Transform... command on the Suggest Operand menu. Transforms are used in the Transformer stage to convert your data to a format you want to use in the final data mart. Each transform specifies the BASIC function used to convert the data from one type to another. There are a number of built-in transforms supplied with WebSphere DataStage, but if these are not suitable or you want a specific transform to act on a specific data element, you can create custom transforms in the Designer. The advantage of creating a custom transform over just entering the required expression in the Transformer Editor is that, once defined, the transform is available for use from anywhere within the project. It can also be easily exported to other WebSphere DataStage projects. To provide even greater flexibility, you can also define your own custom routines and functions from which to build custom transforms. There are three ways of doing this: v Entering the code within WebSphere DataStage (using BASIC functions). v Creating a reference to an externally cataloged routine. v Importing external ActiveX (OLE) functions or web services routines. (See Server Routines.)
To create a custom transform: 1. Do one of: a. Choose File > New from the Designer menu. The New dialog box appears. b. Open the Other folder and select the Transform icon. c. Click OK. The Transform dialog box appears, with the General page on top. Or: d. Select a folder in the repository tree. e. Choose New > Other > Transform from the shortcut menu. The Transform dialog box appears, with the General page on top. This dialog box has two pages: v General. Displayed by default. Contains general information about the transform. v Details. Allows you to specify source and target data elements, the function, and arguments to use. 2. Enter the name of the transform in the Transform name field. The name entered here must be unique, as no two transforms can have the same name. Also note that the transform should not have the same name as an existing BASIC function; if it does, the function will be called instead of the transform when you run the job. 3. Optionally enter a brief description of the transform in the Short description field. 4. Optionally enter a detailed description of the transform in the Long description field. Once this page is complete, you can specify how the data is converted. 5. Click the Details tab. The Details page appears at the front of the Transform dialog box. 6. Optionally choose the data element you want as the target data element from the Target data element list box. (Using a target and a source data element allows you to apply a stricter data typing to your transform. See Data Elements for a description of data elements.)
7. Specify the source arguments for the transform in the Source Arguments grid. Enter the name of the argument and optionally choose the corresponding data element from the drop-down list. 8. Use the Expression Editor in the Definition field to enter an expression which defines how the transform behaves. The Suggest Operand menu is slightly different when you use the Expression Editor to define custom transforms and offers commands that are useful when defining transforms. 9. Click OK to save the transform and close the Transform dialog box. You can then use the new transform from within the Transformer Editor. Note: If NLS is enabled, avoid using the built-in Iconv and Oconv functions to map data unless you fully understand the consequences of your actions.
Data elements
Each column within a table definition can have a data element assigned to it. A data element specifies the type of data a column contains, which in turn determines the transforms that can be applied in a Transformer stage. The use of data elements is optional. You do not have to assign a data element to a column, but it enables you to apply stricter data typing in the design of server jobs. The extra effort of defining and applying data elements can pay dividends in effort saved later on when you are debugging your design. You can choose to use any of the data elements supplied with WebSphere DataStage, or you can create and use data elements specific to your application. For a list of the built-in data elements, see Built-In Data Elements. Application-specific data elements allow you to describe the data in a particular column in more detail. The more information you supply to WebSphere DataStage about your data, the more WebSphere DataStage can help to define the processing needed in each Transformer stage. For example, if you have a column containing a numeric product code, you might assign it the built-in data element Number. There is a range of built-in transforms associated with this data element. However, all of these would be unsuitable, as it is unlikely that you would want to perform a calculation on a product code. In this case, you could create a new data element called PCode. Each data element has its own specific set of transforms which relate it to other data elements. When the data elements associated with the columns of a target table are not the same as the data elements of the source data, you must ensure that you have the transforms needed to convert the data as required. For each target column, you should have either a source column with the same data element, or a source column that you can convert to the required data element. For example, suppose that the target table requires a product code using the data element PCode, but the source table holds product data using an older product numbering scheme. In this case, you could create a separate data element for old-format product codes called Old_PCode, and you then create a custom transform to link the two data elements; that is, its source data element is Old_PCode, while its target data element is PCode. This transform, which you could call Convert_PCode, would convert an old product code to a new product code.
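To make the example concrete, a Convert_PCode transform of this kind might be given a single source argument, Arg1, with the data element Old_PCode, a target data element of PCode, and a Definition expression along the following lines (the new-code format shown here is only an assumption for illustration):

If Arg1 = "" Then "" Else "PC-" : TRIM(Arg1)

The colon (:) is the BASIC string concatenation operator, so the expression trims the old code and prefixes it to form the new-style code, returning an empty string for an empty input.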
A data element can also be used to stamp a column with SQL properties when you manually create a table definition or define a column definition for a link in a job.
6. Click OK to save the data element and to close the Data Element dialog box. You must edit your table definition to assign this new data element. Naming Data Elements: The rules for naming data elements are as follows: v Data element names can be any length. v They must begin with an alphabetic character. v They can contain alphanumeric, period, and underscore characters. Data element category names can be any length and consist of any characters, including spaces.
Mainframe routines
There are three types of mainframe routine: v External Routine. Calls a COBOL library function. v External Source Routine. Calls a user-supplied program that allows a WebSphere DataStage job to access an external data source as the job runs on the mainframe. v External Target Routine. Calls a user-supplied program that allows a WebSphere DataStage job to write to an external data source as the job runs on the mainframe. The External Routine stage in a WebSphere DataStage mainframe job enables you to call a COBOL subroutine that exists in a library external to WebSphere DataStage in your job. You must first define the routine, details of the library, and its input and output arguments. The routine definition is stored in the WebSphere DataStage Repository and can be referenced from any number of External Routine stages in any number of mainframe jobs. The External Source stage in a WebSphere DataStage mainframe job allows you to read data from file types that are not supported in Enterprise MVS Edition. After you write an external source program, you create an external source routine in the WebSphere DataStage Repository. The external source routine specifies the attributes of the external source program. The External Target stage in a mainframe job allows you to write data to file types that are not supported in Enterprise MVS Edition. After you write an external target program, you create an external target routine in the WebSphere DataStage Repository. The external target routine specifies the attributes of the external target program.
When you create, view, or edit a mainframe routine, the Mainframe Routine dialog box appears. This dialog box has up to four pages: General, Creator, and Arguments, plus a JCL page if you are editing an External Source or External Target routine. There are three buttons in the Mainframe Routine dialog box: v Close. Closes the Routine dialog box. If you have any unsaved changes, you are prompted to save them. v Save. Saves the routine. v Help. Starts the Help system. Naming routines: Routine names can be one to eight characters in length. They must begin with an alphabetic character.
Creating a routine
To create a new routine: 1. Do one of: a. Choose File > New from the Designer menu. The New dialog box appears. b. Open the Routines folder and select the Mainframe Routine icon. Click OK. The Mainframe Routine dialog box appears, with the General page on top. Or: c. Select a folder in the repository tree. d. Choose New > Mainframe Routine from the shortcut menu. The Mainframe Routine dialog box appears, with the General page on top. 2. Enter general information about the routine, as follows: v Routine name. Type the name (up to 8 characters) of the function or subroutine. In mainframe terms, the routine name is the name of an entry point in a member of a load or object library. The library member might also contain other entry points with other names. The routine name must match the external subroutine name if dynamic invocation (the default) is selected, and automatically appears in the External subroutine name field. v Type. Choose External Routine, External Source Routine or External Target Routine from the drop-down list. v Platform. Select the operating system that the COBOL subroutine will run on. (OS/390 is the only platform currently supported.) v External subroutine name. Type the name of the load or object library member that contains the subroutine or function entry point. If dynamic invocation is selected, then the external subroutine name must match the routine name and is entered automatically. If the invocation method is static, then the two names need not match. v Library path. Type the pathname of the library that contains the routine member. v Invocation method. Select the invocation method for the routine. Dynamic invocation calls the routine at runtime. Static invocation embeds the routine within a program. Dynamic is the default. v Short description. Type an optional brief description of the routine. The text entered in this field appears in the External Routine stage editor. v Long description. Type an optional detailed description of the routine. 3. Select the Creator page:
Enter information as follows: v Vendor. Type the name of the company who created the routine. v Author. Type the name of the person who created the routine. v Version. Type the version number of the routine. This is used when the routine is imported. The Version field contains a three-part version number, for example, 3.1.1. The first part of this number is an internal number used to check compatibility between the routine and the WebSphere DataStage system, and cannot be changed. The second part of this number represents the release number. This number should be incremented when major changes are made to the routine definition or the underlying code. The new release of the routine supersedes any previous release. Any jobs using the routine use the new release. The last part of this number marks intermediate releases when a minor change or fix has taken place. v Copyright. Type the copyright information. 4. Next define any required routine arguments on the Arguments page: Arguments are optional for mainframe routines. To load arguments from an existing table definition, click Load. To create a new argument, type directly in the Arguments page grid or, if you need to specify COBOL attributes, do one of the following: v Right-click in the column area and select Edit row... from the shortcut menu. v Press Ctrl-E. The Edit Routine Argument Metadata dialog box appears. The top pane contains the same fields that appear on the Arguments page grid. Enter the information for each argument you want to define as follows: v Argument name. Type the name of the argument to be passed to the routine. v I/O type. Only appears for External routines. Select the direction to pass the data. There are three options: Input. A value is passed from the data source to the external routine. The value is mapped to an input row element. Output. A value is returned from the external routine to the stage. The value is mapped to an output column. Both. A value is passed from the data source to the external routine and returned from the external routine to the stage. The value is mapped to an input row element, and later mapped to an output column. v Native type. Select the native data type of the argument value from the drop-down list. v Length. Type a number representing the length or precision of the argument. v Scale. If the argument is numeric, type a number to define the number of decimal places. v Nullable. Only appears for External Source and Target routines. Select Yes, No, or Unknown from the drop-down list to specify whether the argument can contain null values. The default is No on the Edit Routine Argument Meta Data dialog box. v Date Format. Only appears for External Source and Target routines. Choose the date format from the drop-down list of available formats. v Description. Type an optional description of the argument. The bottom pane of the Edit Routine Argument Metadata dialog box displays the COBOL page by default. Use this page to enter any required COBOL information for the mainframe argument:
v Level Number. Only appears for External Source routines. Type in a number giving the COBOL level number in the range 02 - 49. The default value is 05. v Occurs. Only appears for External Source routines. Type in a number giving the COBOL occurs clause. If the argument defines a group, gives the number of elements in the group. v Usage. Select the COBOL usage clause from the drop-down list. v Sign indicator. Select Signed or blank from the drop-down list to specify whether the argument can be signed or not. The default is blank. v Sign option. If the argument is signed, select the location of the sign in the data from the drop-down list. v Sync indicator. Select SYNC or blank from the drop-down list to indicate whether this is a COBOL-synchronized clause or not. The default is blank. v Redefined Field. Only appears for External Source routines. Optionally specify a COBOL REDEFINES clause. This allows you to describe data in the same storage area using a different data description. v Depending on. Only appears for External Source routines. Optionally choose a COBOL OCCURS-DEPENDING ON clause from the drop-down list. v Storage length. Gives the storage length in bytes of the argument as defined. This field is derived and cannot be edited. v Picture. Gives the COBOL PICTURE clause, which is derived from the argument definition and cannot be edited. 5. If you are editing an External Source or Target routine, click the JCL tab to go to the JCL page. This allows you to supply any additional JCL that your routine might require. Type in the JCL or click Load JCL to load it from a file. 6. Click Save when you are finished to save the routine definition.
Copying a routine
You can copy an existing routine using the Designer. To copy a routine: 1. Select it in the repository tree. 2. Choose Create copy from the shortcut menu. The routine is copied and a new routine is created in the same folder in the project tree. By default, the copy is named CopyOfXXX, where XXX is the name of the chosen routine. An edit box appears allowing you to rename the copy immediately.
Renaming a routine
You can rename any user-written routines using the Designer. To rename an item, select it in the repository tree and do one of the following: v Click the routine again. An edit box appears and you can enter a different name or edit the existing one. Save the new name by pressing Enter or by clicking outside the edit box. v Choose Rename from the shortcut menu. An edit box appears and you can enter a different name or edit the existing one. Save the new name by pressing Enter or by clicking outside the edit box. v Double-click the routine. The Mainframe Routine dialog box appears and you can edit the Routine name field. Click Save, then Close.
Machine profiles
Mainframe machine profiles are used when WebSphere DataStage uploads generated code to a mainframe. They are also used by the mainframe FTP stage. They provide a reusable way of defining the mainframe to which WebSphere DataStage uploads code or FTPs files. You can create mainframe machine profiles and store them in the WebSphere DataStage repository. You can create, copy, rename, move, and delete them in the same way as other repository objects. To create a machine profile: 1. Do one of: a. Choose File > New from the Designer menu. The New dialog box appears. b. Open the Other folder and select the Machine Profile icon. c. Click OK. The Machine Profile dialog box appears, with the General page on top. Or: d. Select a folder in the repository tree. e. Choose New > Other > Machine Profile from the shortcut menu. The Machine Profile dialog box appears, with the General page on top. 2. Supply general details as follows: a. Enter the name of the machine profile in the Machine profile name field. The name entered here must be unique as no two machine profiles can have the same name. b. Choose the type of platform for which you are defining a profile from the Platform type drop-down list. c. Optionally enter a brief description of the profile in the Short description field. d. Optionally enter a detailed description of the profile in the Long description field. This description is displayed only when you view the properties of a machine profile. 3. Click the Connection tab to go to the Connection page. Fill in the fields as follows: v Specify the IP Host name/address for the machine. v Specify the Port to connect to. The default port number is 21. v Choose an Ftp transfer type of ASCII or Binary.
v Specify a user name and password for connecting to the machine. The password is stored in encrypted form. v Click Active or Passive as appropriate for the FTP service. v If you are generating process metadata from mainframe jobs, specify the target directory and dataset name for the XML file which will record the operational metadata. 4. Click the Libraries tab to go to the Libraries page. Fill in the fields as follows: v In Source library specify the destination for the generated code. v In Compile JCL library specify the destination for the compile JCL file. v In Run JCL library specify the destination for the run JCL file. v In Object library specify the location of the object library. This is where compiler output is transferred. v In DBRM library specify the location of the DBRM library. This is where information about a DB2 program is transferred. v In Load library specify the location of the Load library. This is where executable programs are transferred. v In Jobcard accounting information specify the location of identification information for the jobcard. 5. Click OK to save the machine profile and to close the Machine Profile dialog box.
Depending on the type of IMS item you selected, either the IMS Database dialog box appears or the IMS Viewset dialog box appears. Remember that, if you edit the definitions, this will not affect the actual database it describes. IMS database editor: The IMS Database editor allows you to view, edit, or create IMS database objects. This dialog box is divided into two panes. The left pane displays the IMS database, segments, and datasets in a tree, and the right pane displays the properties of selected items. Depending on the type of item selected, the right pane has up to two pages: v Database. There are two pages for database properties: General. Displays the general properties of the database including the name, version number, access type, organization, and short and long descriptions. All of these fields are read-only except for the short and long descriptions. Hierarchy. Displays the segment hierarchy of the database. You can right-click to view the hierarchy in detailed mode. This diagram is read-only. v Segment. There are two pages for segment properties: General. Displays the segment name, the parent segment, its minimum and maximum size in bytes, and a description. All of these fields are read-only except for the description. Fields. Displays the fields of the selected segment. The field descriptions are read-only. v Dataset. Properties are displayed on one page and include the DD names that are used in the JCL to read the file. These names are read-only. You can optionally enter a description of the dataset. IMS viewset editor: The IMS Viewset editor allows you to view, edit, or create IMS viewset objects. This dialog box is divided into two panes. The left pane contains a tree structure displaying the IMS viewset (PSB), its views (PCBs), and the sensitive segments. The right pane displays the properties of selected items. It has up to three pages depending on the type of item selected: v Viewset. Properties are displayed on one page and include the PSB name. This field is read-only. You can optionally enter short and long descriptions. v View. There are two pages for view properties: General. Displays the PCB name, DBD name, type, and an optional description. If you did not create associated tables during import or you want to change which tables are associated with PCB segments, click the Segment/Table Mapping... button. The Segment/Associated Table Mapping dialog box appears. To create a table association for a segment, select a table in the left pane and drag it to the segment in the right pane. The left pane displays available tables in the Repository which are of type QSAM_SEQ_COMPLEX. The right pane displays the segment names and the tables currently associated with them; you can right-click to clear one or all of the current table mappings. Click OK when you are done with the mappings, or click Cancel to discard any changes you have made and revert back to the original table associations. Hierarchy. Displays the PCB segment hierarchy in a read-only diagram. You can right-click to view the hierarchy in detailed mode.
v Sensitive Segment. There are three pages for sensitive segment properties: General. Displays the segment name and its associated table. If you want to change the associated table, click the browse button next to the Associate table field to select another table. Sen Fields. Displays the sensitive fields associated with the sensitive segment. These fields are read-only. Columns. Displays the columns of the associated table. The column descriptions are read-only.
If you installed or imported a job, the Before-job subroutine field might reference a routine which does not exist on your system. In this case, a warning message appears when you close the Job Properties dialog box. You must install or import the missing routine or choose an alternative one to use. A return code of 0 from the routine indicates success. Any other code indicates failure and causes a fatal error when the job is run. v After-job subroutine and Input value. Optionally contains the name (and input parameter value) of a subroutine that is executed after the job has finished. For example, you can specify a routine that sends an electronic message when the job finishes. Choose a routine from the drop-down list box. This list box contains all the built routines defined as a Before/After Subroutine under the Routines branch in the Repository. Enter an appropriate value for the routine's input argument in the Input value field. If you use a routine that is defined in the Repository, but which was edited but not compiled, a warning message reminds you to compile the routine when you close the Job Properties dialog box. A return code of 0 from the routine indicates success. Any other code indicates failure and causes a fatal error when the job is run. v Only run after-job subroutine on successful job completion. This option is enabled if you have selected an After-job subroutine. If you select the option, then the After-job subroutine will only be run if the job has successfully completed running all its stages. v Enable Runtime Column Propagation for new links. This check box appears if you have selected Enable Runtime Column propagation for Parallel jobs for this project in the Administrator client. Check it to enable runtime column propagation by default for all new links on this job. v Allow Multiple Instances. Select this to enable the Director client to run multiple instances of this job. v Enable hashed file cache sharing. Select this to enable multiple processes to access the same hash file in cache (the system checks if this is appropriate). This can save memory resources and speed up execution where you are, for example, running multiple instances of the same job. This only applies to parallel jobs that use server functionality in server shared container stages. v Enabled for Information Services. Select this to make the job available for deployment as a service. v Short job description. An optional brief description of the job. v Full job description. An optional detailed description of the job.
Directory. Specifies the directory in which the report will be written. XSL stylesheet. Optionally specifies an XSL style sheet to format an XML report. If the job had an alias ID then the report is written to JobName_alias.txt or JobName_alias.xml, depending on report type. If the job does not have an alias, the report is written to JobName_YYYYMMDD_HHMMSS.txt or JobName_YYYYMMDD_HHMMSS.xml, depending on report type. v ExecDOS. This routine executes a command via an MS-DOS shell. The command executed is specified in the routine's input argument. v ExecDOSSilent. As ExecDOS, but does not write the command line to the job log. v ExecTCL. This routine executes a command via a WebSphere DataStage Engine shell. The command executed is specified in the routine's input argument. v ExecSH. This routine executes a command via a UNIX Korn shell. v ExecSHSilent. As ExecSH, but does not write the command line to the job log.
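In addition to these built-in routines, you can select any routine that you have defined and compiled with the type Before/After Subroutine (see Server Routines). As a rough sketch, and assuming an illustrative routine named MyNotifySub, the body of such a subroutine might look like this (InputArg and ErrorCode are its two fixed arguments, and DSLogInfo writes an informational message to the job log):

* Sketch of a user-written before/after subroutine.
* Log the value supplied in the Input value field, then report success.
Call DSLogInfo("Called with input: " : InputArg, "MyNotifySub")
ErrorCode = 0   ;* 0 indicates success; any other value causes a fatal error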
NLS page
This page only appears if you have NLS installed with WebSphere DataStage. It allows you to ensure that WebSphere DataStage uses the correct character set map and collate formatting rules for your parallel job. The character set map defines the character set WebSphere DataStage uses for this job. You can select a specific character set map from the list or accept the default setting for the whole project. The locale determines the order for sorted data in the job. Select the project default or choose one from the list.
The Execution page has the following options: v Compile in trace mode. Select this so that you can use the tracing facilities after you have compiled this job. v Force Sequential Mode. Select this to force the job to run sequentially on the conductor node. v Limits per partition. These options enable you to limit data in each partition to make problems easier to diagnose: Number of Records per Link. This limits the number of records that will be included in each partition. v Log Options Per Partition. These options enable you to specify how log data is handled for partitions. This can cut down the data in the log to make problems easier to diagnose. Skip count. Set this to N to skip the first N records in each partition. Period. Set this to N to print every Nth record per partition, starting with the first record. N must be >= 1. v Advanced Runtime Options. This field is for experienced Orchestrate users to enter parameters to be added to the OSH command line. Under normal circumstances this field should be left blank.
The release number n.N.n. This field is obsolete. The bug fix number n.n.N. This number reflects minor changes to the job design or properties. To change this number, select it and enter a new value directly or use the arrow buttons to increase the number. v Before-job subroutine and Input value. Optionally contain the name (and input parameter value) of a subroutine that is executed before the job runs. For example, you can specify a routine that prepares the data before processing starts. Choose a routine from the list box. This list box contains all the built routines defined as a Before/After Subroutine under the Routines branch in the Repository. Enter an appropriate value for the routine's input argument in the Input value field. If you use a routine that is defined in the Repository, but which was edited and not compiled, a warning message reminds you to compile the routine when you close the Job Properties dialog box. If you installed or imported a job, the Before-job subroutine field might reference a routine which does not exist on your system. In this case, a warning message appears when you close the Job Properties dialog box. You must install or import the missing routine or choose an alternative one to use. A return code of 0 from the routine indicates success. Any other code indicates failure and causes a fatal error when the job is run. v After-job subroutine and Input value. Optionally contains the name (and input parameter value) of a subroutine that is executed after the job has finished. For example, you can specify a routine that sends an electronic message when the job finishes. Choose a routine from the list box. This list box contains all the built routines defined as a Before/After Subroutine under the Routines branch in the Repository. Enter an appropriate value for the routine's input argument in the Input value field. If you use a routine that is defined in the Repository, but which was edited but not compiled, a warning message reminds you to compile the routine when you close the Job Properties dialog box. A return code of 0 from the routine indicates success. Any other code indicates failure and causes a fatal error when the job is run. v Only run after-job subroutine on successful job completion. This option is enabled if you have selected an After-job subroutine. If you select the option, then the After-job subroutine will only be run if the job has successfully completed running all its stages. v Allow Multiple Instances. Select this to enable the Director client to run multiple instances of this job. v Enable hashed file cache sharing. Check this to enable multiple processes to access the same hash file in cache (the system checks if this is appropriate). This can save memory resources and speed up execution where you are, for example, running multiple instances of the same job. v Enabled for Information Services. Select this to make the job available for deployment as a service. v Short job description. An optional brief description of the job. v Full job description. An optional detailed description of the job.
A default locale is set for each project during installation. You can override the default for a particular job by selecting the locale you require for each category on the NLS page of the Job Properties window: v Time/Date specifies the locale to use for formatting times and dates. v Numeric specifies the locale to use for formatting numbers, for example, the thousands separator and radix character. v Currency specifies the locale to use for monetary amounts, for example, the currency symbol and where it is placed. v CType specifies the locale to use for determining character types, for example, which letters are uppercase and which lowercase. v Collate specifies the locale to use for determining the order for sorted data. In most cases you should use the same locale for every category to ensure that the data is formatted consistently.
save memory resources and speed up execution where you are, for example, running multiple instances of the same job.
v Generate operational metadata. Click this to have the job generate operational metadata. v NULL indicator location. Select Before column or After column to specify the position of NULL indicators in mainframe column definitions. v NULL indicator value. Specify the character used to indicate nullability of mainframe column definitions. NULL indicators must be single-byte, printable characters. Specify one of the following: A single character value (1 is the default) An ASCII code in the form of a three-digit decimal number from 000 to 255 An ASCII code in hexadecimal form of %Hnn or %hnn where nn is a hexadecimal digit (0-9, a-f, A-F) v Non-NULL Indicator Value. Specify the character used to indicate non-NULL column definitions in mainframe flat files. NULL indicators must be single-byte, printable characters. Specify one of the following: A single character value (0 is the default) An ASCII code in the form of a three-digit decimal number from 000 to 255 An ASCII code in hexadecimal form of %Hnn or %hnn where nn is a hexadecimal digit (0-9, a-f, A-F) v Short job description. An optional brief description of the job. v Full job description. An optional detailed description of the job. Click OK to record your changes in the job design. Changes are not saved to the Repository until you save the job design.
v Decimal. A COBOL signed zoned-decimal number; the precision is indicated by Length and the scale by Scale. The COBOL program defines this parameter with PIC S9(length-scale)V9(scale). v Integer. A COBOL signed zoned-decimal number, where the Length attribute is used to define its length. The COBOL program defines this parameter with PIC S9(length). v Length. The length of a char or a decimal parameter. v Scale. The precision of a decimal parameter. v Description. Optional description of the parameter. v Save As... . Allows you to save the set of job parameters as a table definition in the WebSphere DataStage Repository. v Load... . Allows you to load the job parameters from a table definition in the WebSphere DataStage Repository.
The Extension page contains a grid with these columns: v Name. The name of the extension variable. The name must begin with an alphabetic character and can contain only alphabetic or numeric characters. It can be upper or lowercase or mixed. v Value. The value that the extension variable will take in this job. No validation is done on the value.
You can compare two objects that are in the same project or compare two objects that are in two different projects. For example, you can compare the table definition named CA_address_data in the project named MK_team with the table named Cal_addresses in the project named WB_team. The Designer client displays descriptions of all the differences between the two objects in a hierarchical tree structure. You can expand branches in the tree to view the details. Click the underlined link in the description to view the changed object. The details that are displayed depend on the types of object that you are comparing. For example, the following picture shows the result of comparing two table definitions.
If you compare objects that contain differences in multi-line text (for example, source code in routines), the change tree displays a View button. Click View to view the source code. By default the source is displayed in Notepad, but you can
choose a different application in the Designer client options. Select Tools > Options and select Comparison Tool to view comparison tool options.
You can use the command line tool to build compare operations into scripts. These can be combined with other command line tools that are available for manipulating objects and projects. The compare command line tool has the following syntax:
diffapicmdline.exe /lhscd "left_side_connection_details" /rhscd "right_side_connection_details" /t difftype /ot output_type /ol output_location
The command takes the following arguments: left_side_connection_details The connection details for the left side of the comparison, that is, the first object that you want to compare. Enclose the connection details in double quotation marks. Specify the following details: v /d=domainname /host=hostname /u=username /p=password project_name object_name The name of a table definition object must be specified as its data locator name, not the name that is displayed in the repository tree. The data locator name is displayed in the table definition properties window. right_side_connection_details The connection details for the right side of the comparison, that is, the second object that you want to compare. You must specify full connection details only if you compare two objects in different projects. Otherwise, you can specify only the object name. The syntax for the connection details is the same as for the left_side_connection_details option. difftype The type of objects to compare. This argument can be one of the following values: v Job v SharedContainer v Routine v TableDef output_type The format of the output of the comparison. The format is always HTML. output_location The full path for the output file. On completion the tool returns one of the following codes: v 0 indicates successful comparison v 1 indicates an error in the command line v 2 indicates that the client cannot compare the objects
diffapicmdline.exe /lhscd "/d=localhost:9080 /h=R101 /u=billg /p=paddock tutorial exercise1" /rhscd "new_exercise1" /t job /ot html /ol c:\compare_output.html
Find facilities
Use the find facilities to search for objects in the repository and to find where objects are used by other objects. The repository tree has extensive search capabilities. Quick find is available wherever you can see the repository tree: in browse windows as well as in the Designer client window. Advanced find is available from the Designer client window.
Quick find
Use the quick find feature to search for a text string in the name of an object, or in its name and its description. You can restrict your search to certain types of object, for example you can search for certain jobs. You can keep the quick find window open at the top of your repository tree in the Designer client window and you can access it from any window in which the repository tree is displayed. Quick find supports the use of wildcards. Use an asterisk to represent zero or more characters. To search for an object using quick find: 1. Open the quick find window by clicking Open quick find in the top right corner of the repository tree. 2. Enter the name to search for in the Name to find field. If you are repeating an earlier search, click the down arrow in the name box and select your previous search from the drop-down list. 3. Select the Include descriptions check box if you want the search to take in the object descriptions as well as the names (this will include long and short description if the object type has both). 4. If you want to restrict the search to certain types of object, select the object types from the drop-down list in Types to find. 5. After specifying the search details, click Find.
Example
Here is an example of a quick find where you searched for any objects called copy, and WebSphere DataStage found one job that meets this criterion. The job it found is highlighted in the tree. WebSphere DataStage reports that there are two matches because it also found the copy stage object. If you wanted to search only jobs then you could select jobs in the Types to find list. If you want to find any jobs with copy in the title you could enter *copy* in the Name to find field and the search would find the jobs copy and copy2.
If quick find locates several objects that match your search criteria, it highlights each one in the tree, with the folders expanded as necessary. To view the next object, click Next. You can click the n matches link to open the Advanced find window and display all the found objects in the window.
Advanced find
You can use advanced find to carry out sophisticated searches. Advanced find displays all the search results together in a window, independent of the repository tree structure. To access the advanced find facilities, do one of the following actions: v Open the quick find window in the main repository tree and click Adv.... v Perform a search using quick find, and then click the n matches hyperlink. v Choose Tools > Advanced Find from the main menu. v Select a folder in the repository tree, and then do one of the following actions:
Select Repository > Find in this folder > Open Advanced Find from the main menu. Select Find in this folder > Open advanced find from the pop-up menu. Choose Repository > Find in this folder > Objects that I created, Objects that I created today, or Objects that I created up to a week ago from the main menu. Select Find in this folder > Objects that I created, Objects that I created today, or Objects that I created up to a week ago from the pop-up menu. Select Repository > Find in this folder > Objects that I last modified, Objects that I last modified today, or Objects that I last modified up to a week ago from the main menu. Select Find in this folder > Objects that I last modified, Objects that I last modified today, or Objects that I last modified up to a week ago from the pop-up menu.
The names and details of the objects in the repository that match the search criteria are listed in the Results - Details tab. You can select items in the list and perform various operations on them from the shortcut menu. The available operations depend on the object type, but include the following: v Find dependencies and Find where used. v Export. Opens the Repository Export window which allows you to export the selected object in a .dsx or .xml file. v Multiple job compile. Available for jobs and job sequences. Opens the multiple job compiler tool. v Edit. Available for jobs and job sequences. The selected job opens in the job design canvas. v Add to palette. The object is added to the palette currently open so that it is readily available for future operations. v Create copy. Creates a copy of the selected object named CopyOfObjectName. v Rename. Use this operation to rename the selected object. v Delete. Use this operation to delete the selected object. v Properties. Opens the Properties window for the selected object. All objects have associated properties giving detailed information about the object. You can save the results of an advanced search operation as an XML file or as a report. You can view a report in the reporting console. The following example shows a search for object names that start with the word exercise and include descriptions that contain the word tutorial. The results show that four objects match these criteria:
You can search for column definitions within the table definitions that are stored in the WebSphere DataStage repository, or you can search for column definitions in jobs and shared containers. You can narrow the search by specifying other search criteria, such as folders to search in, when the column definition was created or modified, or who created or modified the column definition. To search for a column: 1. Type the name of the column that you want to find in the Name to find field. You can use wildcard characters if you want to search for a number of columns with similar names. 2. Select Columns in Table Definitions or Columns in Jobs or Shared Containers in the Type list. Select both to search for the column in table definitions and in jobs and shared containers. 3. Specify other search criteria as required. 4. Click Find to search for the column.
Impact analysis
Use the impact analysis features to discover where objects are used, and what other objects they depend on. The impact analysis features help you to assess the impact of changes you might make to an object on other objects, or on job designs. For example, before you edit a table definition, you could find the jobs that derive their column definitions from the table definition and consider whether those jobs will need changing too. There are four types of impact analysis queries: where used Finds where the selected object is used by other objects. For example, if the selected object is a routine, a where used query lists all the jobs that use that routine. where used (deep) Finds where the selected object is used by other objects, and also where those objects in turn are used. For example if the selected object is a routine, a where used (deep) query lists all the jobs that use that routine, and also lists any job sequences that use those jobs. dependencies of Finds the objects that the selected object depends upon. For example, a job might depend upon a particular routine. dependencies of (deep) Finds the objects that the selected object depends upon, and also what objects they in turn depend upon. For example, a job sequence might depend on a job which in turn depends on a routine.
Table 2. Impact analysis and objects Object type Data element Where used query Parallel job, server job, sequence job, parallel shared container, server shared container, table definition, transform Mainframe job Mainframe job None Dependencies of query None
Table definition Table definition IMS database, IMS viewset, machine profile, mainframe routine, mainframe stage type, table definition Data element, parallel routine, server routine, parallel shared container, server shared container, rule specification, parallel stage type, server stage type, table definition transform, parameter set, data connection Data element, server routine, server shared container, server stage type, table definition, transform, parameter set, data connection Data element, parallel job, server job, sequence job, parallel routine, server routine, parallel shared container, server shared container, rule specification, parallel stage type, server stage type, table definition, transform, parameter set, data connection None Rule specification
Parallel job
Sequence job
Server job
Sequence job
Sequence job
Sequence job
Mainframe job Parallel job, sequence job, parallel shared container, rule specification mainframe job Parallel job, sequence job, parallel shared container Parallel job, server job, sequence job, server routine, parallel shared container, server shared container
Table 2. Impact analysis and objects (continued) Object type Parallel shared container Where used query Parallel job, sequence job, parallel shared container Dependencies of query Data element, parallel routine, server routine, parallel shared container, server shared container, rule specification, parallel stage type, server stage type, table definition, transform, parameter set, data connection Data element, server routine, server shared container, server stage type, table definition, transform, parameter set, data connection None None None
Parallel job, server job, sequence job, parallel shared container, server shared container
Mainframe job Parallel job, sequence job, parallel shared container Server job, parallel job, sequence job, server shared container, parallel shared container Sequence job Mainframe job, server job, parallel job, sequence job, server shared container, parallel shared container, table definition, IMS database, IMS viewset Server job, parallel job, sequence job, server shared container, parallel shared container Server job, parallel job, sequence job, server shared container, parallel shared container Server job, parallel job, sequence job, server shared container, parallel shared container Server job, parallel job, sequence job, server shared container, parallel shared container, data connection
Column definition
None
Transform
Data element
Data connection
Parameter set
Parameter set
None
2. Either right-click to open the pop-up menu or open the Repository menu from the main menu bar. 3. Select Find where used > All types or Find where used (deep) > All types to search for any type of object that uses your selected object. 4. Select Find where used > object type or Find where used (deep) > object type to restrict the search to certain types of object. The search displays the results in the Repository Advanced Find window. The results of a deep search show all the objects related to the ones that use your search object. To run a where used query from the Repository Advanced Find window: 1. Click the where used item in the left pane to open the where used tool. 2. Click Add. 3. In the Select items window, Browse for the object or objects that you want to perform the where used query on and click OK. 4. You can continue adding objects to the where used window to create a list of objects that you want to perform a where used query on. 5. Click Find when you are ready to perform the search. The results are displayed in the Results - Details and the Results - Graphical tabs of the Repository Advanced Find window. 6. View the results in the Repository Advanced Find window: click the Results - Details tab to view the results as a text list, or click the Results - Graphical tab to view a graphical representation of the results. If you view the results as a text list, note that the results only list the first three dependency paths found for each object in the Sample dependency path field. To view all the dependency paths for an object, right-click on the object and select Show dependency path to object.
2. Right-click and select Find where column used from the pop-up menu.
3. In the table column window, select one or more columns.
4. Click OK.

To run a query from the Repository Advanced Find window:
1. Open the Repository Advanced Find window.
2. Click the where used item in the left pane.
3. Click Add.
4. In the Select Items window, select the column or columns that you want to query.
5. Click OK.
6. Click Find.
3. Select one of the following items from the pop-up menu:
   v Show where data flows to
   v Show where data originates from
   v Show where data flows to and originates from
4. In the browser window, select the column or columns that you want to display data lineage for. Alternatively, click Select All to select all the columns.
You can select any of the objects from the list and perform operations on them from the pop-up menu as described for advanced find.
Impact analysis queries also display results in graphical form, showing the relationships of the objects found by the query. To view the results in graphical form, click the Results - Graphics tab:
Figure 16. Example of a where used query, showing the results in graphical view
From this view you can invoke a view of the dependency path to a particular object. Select that object in the graphical results viewer tab and choose Show dependency path from ObjectName from the pop-up menu. This action opens another tab, which shows the relationships between the objects in the job. Here is the dependency path from the job to the table definition:
If an object has a plus sign (+) next to it, click the plus sign to expand the view:
Importing objects
The Designer allows you to import various objects into the repository:
v WebSphere DataStage components previously exported from other WebSphere DataStage projects (in proprietary format or in XML).
v External function definitions
v WebService function definitions
v Metadata via bridges
v Table definitions
v IMS definitions
Note: You can import components that support mainframe functionality only into a WebSphere DataStage system that has Enterprise MVS Edition installed. You should also ensure that the system to which you are importing supports the required platform type.

Note: When importing a large project into an HP system, you might run into the HP limit of 32767 links to a file. This problem will result in the import failing with the error: unable to create operating system file.

Note: There is a limit of 255 characters on object names. It is possible that exports from earlier systems might exceed this limit.

To import components:
1. Choose Import > DataStage Components... to import components from a text file or Import > DataStage Components (XML)... to import components from an XML file. The DataStage Repository Import dialog box appears (the dialog box is slightly different if you are importing from an XML file, but has all the same controls).
2. Type in the path or browse for the file to import from.
3. To import objects from the file into the repository, select Import all and click OK. During import, you will be warned if objects of the same name already exist in the repository and asked if you want to overwrite them. If you select the Overwrite without query check box before importing, you will not be warned, and any existing objects will automatically be overwritten. If you import job components, they are imported into the current project in the Designer.
4. To import selected components from the file into the repository, select Import selected and click OK. The Import Selected dialog box appears. Select the required items and click OK. The selected job components are imported into the current project in the Designer.
5. To turn impact analysis off for this particular import, deselect the Perform Impact Analysis check box. By default, all imports are checked to see if they are about to overwrite a currently used component; disabling this feature might speed up large imports. You can disable it for all imports by changing the impact analysis options.
dsimport command
The dsimport command is as follows:
dsimport.exe /D domain /H hostname /U username /P password /NUA project|/ALL| /ASK dsx_pathname1 dsx_pathname2 ...
The arguments are as follows:
v domain or domain:port_number. The application server name. This can also optionally have a port number.
v hostname. The WebSphere DataStage Server to which the file will be imported.
v username. The user name to use for connecting to the application server.
v password. The user's password.
v NUA. Include this flag to disable usage analysis. This is recommended if you are importing a large project.
v project, /ALL, or /ASK. Specify a project to import the components to, or specify /ALL to import to all projects or /ASK to be prompted for the project to which to import.
v dsx_pathname. The file to import from. You can specify multiple files if required.

For example, the following command imports the components in the file jobs.dsx into the project dstage1 on the R101 server:
dsimport.exe /D domain:9080 /H R101 /U wombat /P w1ll1am dstage1 C:/scratch/jobs.dsx
dscmdimport command
The dscmdimport command is as follows:
dscmdimport /D domain /H hostname /U username /P password /NUA project|/ALL|/ASK pathname1 pathname2 ... /V
The arguments are as follows:
v domain or domain:port_number. The application server name. This can also optionally have a port number.
v hostname. The WebSphere DataStage Server to which the file will be imported.
v username. The user name to use for connecting to the application server.
v password. The user's password.
v NUA. Include this flag to disable usage analysis. This is recommended if you are importing a large project.
v project, /ALL, or /ASK. Specify a project to import the components to, or specify /ALL to import to all projects or /ASK to be prompted for the project to which to import.
v pathname. The file to import from. You can specify multiple files if required. The files can be DSX files or XML files, or a mixture of both.
v V. Use this flag to switch the verbose option on.

For example, the following command imports the components in the file jobs.dsx into the project dstage1 on the R101 server:
dscmdimport /D domain:9080 /U wombat /P w1ll1am dstage1 /H R101 C:/scratch/jobs.dsx
Messages from the import are sent to the console by default, but can be redirected to a file using >, for example:
dscmdimport /D domain:9080 /U wombat /P w1ll1am /H R101 /NUA dstage99 c:/scratch/project99.dsx /V > c:/scratch/importlog
You can simply type dscmdimport at the command prompt to get help on the command options.
XML2DSX command
The XML2DSX command is as follows:

XML2DSX.exe /D domain /H hostname /U username /P password /N[OPROMPT] /I[NTERACTIVE] /V[ERBOSE] projectname filename /T templatename

The arguments are as follows:
v domain or domain:port_number. The application server name. This can also optionally have a port number.
v hostname. The WebSphere DataStage Server to which the file will be imported.
v username. The user name to use when connecting.
v password. The user's password.
v NOPROMPT. If you include this flag, the import will run silently, overwriting any existing components in the WebSphere DataStage Repository with the same name.
v INTERACTIVE. If you include this flag, the Designer opens the standard Import DataStage Components dialog box. This allows you to choose which components should be imported.
v VERBOSE. Displays details about the XML2DSX tool such as version number when it starts up.
v projectname. The name of the WebSphere DataStage project to attach to on hostname.
v filename. The name of the XML file containing the components to import.
v templatename. Optionally specifies an XSLT template to be used to transform the XML file. If this is omitted, a default template is used. The default template is the one used when WebSphere DataStage exports components into XML.

For example, the following command imports the components in the file jobs.xml into the project dstage1 on the R101 server:
XML2DSX.exe /D domain:9080 /U wombat /P w1ll1am /N dstage1 /H R101 C:/scratch/jobs.xml
DS_IMPORTDSX command
This command is run from within the DS engine using dssh. It can import any job executables found within specified DSX files. Run it as follows:
1. CD to the project directory on the server:
cd C:\IBM\InformationServer\Server\projects\project
2. Start the dssh shell.
3. At the dssh prompt, enter the DS_IMPORTDSX command (see below for syntax).

The DS_IMPORTDSX command is as follows:
DS_IMPORTDSX filename [[-OVERWRITE] -JOB[S] * | jobname ...] | [-LIST]
v filename. The name of the DSX file containing the components to import.
v OVERWRITE. Specify this to overwrite any existing executables of the same name.
v JOB[S]. Specify one or more job executables to import.
v LIST. Specify this to list the executables in a .DSX file rather than import them.

If you don't specify a JOB or a LIST argument, the default is to import all job executables in the specified file.
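For example (a sketch only; the file name jobs.dsx and the job names are illustrative), you might first list the job executables in a DSX file and then import two of them, overwriting any existing copies:

DS_IMPORTDSX jobs.dsx -LIST
DS_IMPORTDSX jobs.dsx -OVERWRITE -JOBS myjob1 myjob2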
2. Choose Import > Web Service Function Definitions... . The Web Service Browser appears. The upper right panel shows the web services whose metadata you have loaded. Select a web service to view the operations available in the web service in the upper left pane.
3. Select the operation you want to import as a routine. Information about the selected web service is shown in the lower pane.
4. Either click Select this item in the lower pane, or double-click the operation in the upper right pane.

The operation is imported and appears as a routine in the Web Services category under a category named after the web service. Once the routine is imported into the Repository, you can open the Server Routine dialog box to view it and edit it. See Viewing and Editing a Routine.
The arrows indicate relationships and you can follow these to see what classes are related and how (this might affect what you choose to import).
6. Specify what is to be imported:
   v Select boxes to specify which instances of which classes are to be imported.
   v Alternatively, you can click Select All to import all instances of all classes.
   v If you change your mind, click Clear All to remove everything from the Selection Status pane.
7. When you are satisfied that you have selected all the metadata you want to import, click OK. The Parameters Selection dialog box appears.
8. Specify the required parameters for the import into WebSphere DataStage. These allow you to specify whether you should be prompted to confirm overwrites to the WebSphere DataStage Repository, and whether verbose output is enabled. Click OK. The Status dialog box appears.
9. The bridge copies the selected data into the repository. The status of the import is displayed in a text window. Click Finish when it has completed the import.

You might import more items than you have explicitly selected. This is because the bridge ensures that data integrity is maintained. For example, if you import a single column, the table definition for the table containing that column is also imported.
v FINISH
v END

The clauses captured from PSB files include:
v PCB
v SENSEG
v SENFLD
v PSBGEN
v END

You can import IMS definitions from IMS version 5 and above. IMS field types are converted to COBOL native data types during capture, as described in the table below.
Table 3. Conversion of IMS field types to COBOL native data types

IMS Field Type    COBOL native data type
X                 CHARACTER (PIC X(n))
P                 DISPLAY_NUMERIC (PIC S9(n)V9(0) COMP-3)
C                 CHARACTER (PIC X(n))
F                 BINARY (PIC S9(9) COMP)
H                 BINARY (PIC S9(4) COMP)
Choose Import > IMS Definitions > Data Base Description (DBD)... to import a DBD, or Import > IMS Definitions > Program Specification Block (PSB)... to import a PSB. The Import Metadata dialog box appears.

This dialog box has the following fields:
v Seen from. Your computer's network identification name. The name is automatically filled in and is read-only.
v IMS file description pathname. The pathname where the IMS file is located. You can type the pathname or browse for it by clicking the ... (browse) button. The IMS file must either reside on the WebSphere DataStage client workstation or on a network that is visible from the client workstation. The default capture file extension is *.dbd for DBD files or *.psb for PSB files.
v Platform type. The operating system for the mainframe platform. (OS/390 is the only platform currently supported.)
v Database names or Viewset names. The databases or viewsets defined in the selected DBD or PSB file. This list will appear after you enter the IMS pathname. Click Refresh to refresh the list if necessary. Select a single item by clicking the database or viewset name, or select multiple items by holding down the Ctrl key and clicking the names. To select all items, click Select all. To see a summary description of an item, select it and click Details. The Details of dialog box appears, displaying the type, description, modification history, and column names.

When importing from a PSB there is an additional field:
v Create associated tables. Select this check box to have WebSphere DataStage create a table in the Repository that corresponds to each sensitive segment in the PSB file, and columns in the table that correspond to each sensitive field. Only those fields that are defined in the PSB become columns; fillers are created where necessary to maintain proper field displacement and segment size. The associated tables are stored in the Table Definitions branch of the project tree.
  If you have a CFD with a definition of the complete IMS segment, you can import it to create the completely defined table, including any columns that were captured as fillers. You can then change the associated table for each segment in the IMS Viewset Editor; see Editing IMS Definitions for details.

Click OK after selecting the items to import. The data is extracted and parsed. If any syntactical or semantic errors are found, the Import Error dialog box appears, allowing you to view and fix the errors, skip the import of the incorrect item, or stop the import process altogether.

Viewing and Editing IMS Definitions

After you import IMS databases and viewsets, you can view and edit their definitions in the Designer. Editing of IMS definitions is limited to entering descriptions and creating mappings between viewset segments and their associated tables. If you want to edit columns, you must open the associated table definition; see Viewing or Modifying a Table Definition.
Exporting objects
The Designer allows you to export various objects from the repository. You can export objects in text format or in XML.
   v Choose Repository > Export from the main menu.
   The Repository Export dialog box appears, populated with the selected items.
3. Use the Add, Remove, and Select all hyperlinks to change the selection if necessary. Selecting Add opens a browse dialog box showing the repository tree.
4. From the drop-down list, choose one of the following options to control how any jobs you are exporting are handled:
   v Export job designs with executables (where applicable)
   v Export job designs without executables (this is the only option available for XML export)
   v Export job executables without designs
5. Select the Exclude read-only objects check box to exclude such objects from the export.
6. Select the Include dependent items check box to automatically include items that your selected items depend upon.
7. Click the Options button to open the Export Options dialog box. This allows you to change the exporter's default settings on the following:
   Under the Default > General branch:
   v Whether source code is included with exported routines (yes by default)
   v Whether source code is included with job executables (no by default)
   v Whether source content is included for data quality items.
   Under the Default > Viewer branch:
   v Whether the default viewer or specified viewer should be used (the default viewer is the one Windows opens this type of file with; this is normally Internet Explorer for XML documents, but you need to explicitly specify one such as Notepad for .dsx files). Using the default viewer is the default option.
   Under the XML > General branch:
   v Whether a DTD is to be included (no by default)
   v Whether property values are output as internal values (which are numeric) or as externalized strings (internal values are the default). Note that, if you choose the externalized string option, you will not be able to import the file that is produced.
   Under the XML > Stylesheet branch:
   v Whether an external stylesheet should be used (no by default) and, if it is, the type and the file name and location of the stylesheet.
8. Specify the type of export you want to perform. Choose one of the following from the drop-down list:
   v dsx
   v dsx 7-bit encoded
   v legacy XML
9. Specify or select the file that you want to export to. You can click the View button to look at this file if it already exists (this will open the default viewer for this file type specified in Windows or any viewer you have specified in the Export Options dialog box).
10. Select Append to existing file if you want the exported objects to be appended to, rather than replace, objects in an existing file. (This is not available for export to XML.)
11. Examine the list of objects to be exported to assure yourself that all the ones you want to export have Yes in the Included column.
12. Click Export to export the chosen objects to the specified file.
8. Specify or select the file that you want to export to. You can click the View button to look at this file if it already exists (this will open the default viewer for this file type specified in Windows or any viewer you have specified in the Export Options dialog box).
9. Select Append to existing file if you want the exported objects to be appended to, rather than replace, objects in an existing file. (This is not available for export to XML.)
10. Examine the list of objects to be exported to assure yourself that all the ones you want to export have Yes in the Included column. Click Export to export the chosen objects to the specified file.
dscmdexport command
The dscmdexport command is as follows:
dscmdexport /D domain /H hostname /U username /P password project pathname /V
The arguments are as follows:
v domain or domain:port_number. The application server name. This can also optionally have a port number.
v hostname. Specifies the DataStage Server from which the file will be exported.
v username. The user name to use for connecting to the Application Server.
v password. The user's password.
v project. Specify the project to export the components from.
v pathname. The file to which to export.
v V. Use this flag to switch the verbose option on.

For example, the following command exports the project dstage2 from the R101 server to the file dstage2.dsx:
dscmdexport /D domain:9080 /H R101 /U billg /P paddock dstage2 C:/scratch/dstage2.dsx
Messages from the export are sent to the console by default, but can be redirected to a file using >, for example:
dscmdexport /H R101 /U=billg /P paddock dstage99 c:/scratch/project99.dsx /V > c:/scratch/exportlog
You can simply type dscmdexport at the command prompt to get help on the command options.
dsexport command
The dsexport command is as follows:
dsexport.exe /D domain /H hostname /U username /P password /JOB jobname /XML /EXT /EXEC /APPEND project pathname1
The arguments are as follows:
v domain or domain:port_number. The application server name. This can also optionally have a port number.
v hostname. Specifies the DataStage Server from which the file will be exported.
v username. The user name to use for connecting to the Application Server.
v password. The user's password.
v jobname. Specifies a particular job to export.
v project. Specify the project to export the components from.
v pathname. The file to which to export.

The command takes the following options:
v /XML. Export in XML format; only available with the /JOB=jobname option.
v /EXT. Export external values; only available with the /XML option.
v /EXEC. Export the job executable only; only available with the /JOB=jobname option and when /XML is not specified.
v /APPEND. Append to an existing dsx file; only available with the /EXEC option.

For example, the following command exports the project dstage2 from the R101 server to the file dstage2.dsx:
dsexport.exe /D domain:9080 /H R101 /U billg /P paddock dstage2 C:/scratch/dstage2.dsx
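The job-level options can be combined in a similar way. As an illustrative sketch (the job name mybigjob and the output file name are assumed, not taken from the text above), the following command would export a single job in XML format with externalized values:

dsexport.exe /D domain:9080 /H R101 /U billg /P paddock /JOB mybigjob /XML /EXT dstage2 C:/scratch/mybigjob.xml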
The report is not dynamic: if you change the job design, you will need to regenerate the report.

Note: Job reports work best using Microsoft Internet Explorer 6; they might not perform optimally with other browsers.
2. Specify a name for the report. The Reporting Console folder field shows you the location where the report is saved to.
3. Type a description of the report.
4. Choose Use Default Stylesheet to use the default XSLT stylesheet supplied with WebSphere DataStage (the report is generated in XML; the stylesheet converts it to HTML, suitable for viewing in a browser). You can also define a custom stylesheet for this purpose, in which case choose Use custom stylesheet and type in, or browse for, the pathname of your stylesheet. Any stylesheets that you supply must use UTF-8 encoding. The default stylesheet is DSJobReport.xsl or, for mainframe jobs, DSMainframeJobReport.xsl; both are located in the WebSphere DataStage client directory, for example, C:\IBM\InformationServer\Clients\Classic.
5. Select Retain intermediate XML file to have WebSphere DataStage retain the XML file that it initially generates when producing a report. Specify the folder in which to save the xml file.
6. Click OK to generate the report.
The arguments are as follows:
v domain or domain:port_number. The application server name. This can also optionally have a port number.
v hostname. The WebSphere DataStage Server that will generate the report.
v username. The user name to use for connecting to the Application Server.
v password. The user's password.
v Job_name | shared_container_name. The name of the job or shared container for which you want to generate a report.
v /R. Indicates that you want to generate a report.
v report_pathname. The directory where the report subdirectory will appear. The report is always written to a subdirectory named after the job. If no directory is specified, the report is written to a subdirectory in the client directory (for example, C:\IBM\InformationServer\Clients\Classic\myjob).
v stylesheet_pathname. Specify an alternative XSLT stylesheet to use. If you do not specify a stylesheet, the default one will be used.
v /RX. Specify this option to retain the intermediate XML file.

For example, the command:
dsdesign /D domain:9080 /H R101 /U william /P wombat dstage ServerJob1 /R /RP c:\JobReports
Would result in the report ServerJob1.htm being written to the directory c:\JobReports\ServerJob1 (another file, jobnumber.bmp, is also written, containing the graphic image of the job design). The command:
dsdesign /D domain:9080 /H R101 /U william /P wombat dstage ServerJob1 /R /RP c:\JobReports /RX
Would result in the files ServerJob1.htm, jobnumber.bmp, and ServerJob1.xml being written to the directory c:\JobReports\ServerJob1.

Note: If a job report is requested from the command line, then that report will not be available for viewing through the reporting console.
Successful compilation
If the Compile Job dialog box displays the message Job successfully compiled with no errors, you can:
v Validate the job
v Run or schedule the job
v Release the job
v Package the job for deployment on other WebSphere DataStage systems

Jobs are validated and run using the Director client.
The dscc command takes the following arguments:
v /H hostname. Specify the WebSphere DataStage server where the job or jobs reside.
v /U username. The user name to use when attaching to the project.
v /P password. The password to use when attaching to the project.
v project_name. The project which the job or jobs belong to.
v /J jobname | * | category_name\*. Specifies the jobs to be compiled. Use jobname to specify a single job, * to compile all jobs in the project, and category_name\* to compile all jobs in that category (this will not include categories within that category). You can specify job sequences as well as parallel or server jobs.
v /R routinename | * | category_name\*. Specifies routines to be compiled. Use routinename to specify a single routine, * to compile all routines in the project, and category_name\* to compile all routines in that category (this will not include categories within that category).
v /F. Force compile (for parallel jobs).
v /OUC. Only compile uncompiled jobs.
v /RD reportname. Specify a name and destination for a compilation report. Specify DESKTOP\filename to write it to your desktop or .\filename to write it to the current working directory.

The options are not case sensitive. For example:
dscc /h r101 /u fellp /p plaintextpassword dstageprj /J mybigjob
This command connects to the machine r101 with a username and password of fellp and plaintextpassword, attaches to the project dstageprj, and compiles the job mybigjob.
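As a further sketch (the category name nightly and the report file name are illustrative only), you could compile all uncompiled jobs in one category and write a compilation report to your desktop:

dscc /h r101 /u fellp /p plaintextpassword dstageprj /J nightly\* /OUC /RD DESKTOP\nightly_compile.txt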
Job validation
Validation of a mainframe job design involves:
v Checking that all stages in the job are connected in one continuous flow, and that each stage has the required number of input and output links.
v Checking the expressions used in each stage for syntax and semantic correctness.

A status message is displayed as each stage is validated. If a stage fails, the validation will stop.
Code generation
Code generation first validates the job design. If the validation fails, code generation stops. Status messages about validation are in the Validation and code generation status window. They give the names and locations of the generated files, and indicate the database name and user name used by each relational stage.

Three files are produced during code generation:
v COBOL program file, which contains the actual COBOL code that has been generated.
v Compile JCL file, which contains the JCL that controls the compilation of the COBOL code on the target mainframe machine.
v Run JCL file, which contains the JCL that controls the running of the job on the mainframe once it has been compiled.
Job upload
After you have successfully generated the mainframe code, you can upload the files to the target mainframe, where the job is compiled and run. To upload a job, choose File > Upload Job. The Remote System dialog box appears, allowing you to specify information about connecting to the target mainframe system. Once you have successfully connected to the target machine, the Job Upload dialog box appears, allowing you to actually upload the job.
JCL templates
WebSphere DataStage uses JCL templates to build the required JCL files when you generate a mainframe job. WebSphere DataStage comes with a set of building-block JCL templates suitable for various tasks. The supplied templates are in a directory called JCL Templates under the WebSphere DataStage server install directory. There are also copies of the templates held in the WebSphere DataStage Repository for each WebSphere DataStage project.

You can edit the templates to meet the requirements of your particular project. This is done using the JCL Templates dialog box from the Designer. Open the JCL Templates dialog box by choosing Tools > JCL Templates. It contains the following fields and buttons:
v Platform type. Displays the installed platform types in a drop-down list.
v Template name. Displays the available JCL templates for the chosen platform in a drop-down list.
v Short description. Briefly describes the selected template.
v Template. The code that the selected template contains.
v Save. This button is enabled if you edit the code, or subsequently reset a modified template to the default code. Click Save to save your changes.
v Reset. Resets the template code back to that of the default template.

If there are system wide changes that will apply to every project, then it is possible to edit the template defaults. Changes made here will be picked up by every WebSphere DataStage project on that WebSphere DataStage server. The JCL Templates directory contains two sets of template files: a default set that you can edit, and a master set which is read-only. You can always revert to the master templates if required, by copying the read-only masters over the default templates. Use a standard editing tool, such as Microsoft Notepad, to edit the default templates.
Code customization
When you check the Generate COPY statement for customization box in the Code generation dialog box, WebSphere DataStage provides four places in the generated COBOL program that you can customize. You can add code to be executed at program initialization or termination, or both. However, you cannot add code that would affect the row-by-row processing of the generated program.
When you check Generate COPY statement for customization, four additional COPY statements are added to the generated COBOL program:
v COPY ARDTUDAT. This statement is generated just before the PROCEDURE DIVISION statement. You can use this to add WORKING-STORAGE variables or a LINKAGE SECTION to the program.
v COPY ARDTUBGN. This statement is generated just after the PROCEDURE DIVISION statement. You can use this to add your own program initialization code. If you included a LINKAGE SECTION in ARDTUDAT, you can use this to add the USING clause to the PROCEDURE DIVISION statement.
v COPY ARDTUEND. This statement is generated just before each STOP RUN statement. You can use this to add your own program termination code.
v COPY ARDTUCOD. This statement is generated as the last statement in the COBOL program. You use this to add your own paragraphs to the code. These paragraphs are those which are PERFORMed from the code in ARDTUBGN and ARDTUEND.

WebSphere DataStage provides default versions of these four COPYLIB members. As provided, ARDTUDAT, ARDTUEND, and ARDTUCOD contain only comments, and ARDTUBGN contains comments and a period. You can either preserve these members and create your own COPYLIB, or you can create your own members in the WebSphere DataStage runtime COPYLIB. If you preserve the members, then you must modify the WebSphere DataStage compile and link JCL templates to include the name of your COPYLIB before the WebSphere DataStage runtime COPYLIB. If you replace the members in the WebSphere DataStage COPYLIB, you do not need to change the JCL templates.
the Remove buttons. Clicking Add while you have a folder selected selects all the items in that folder and moves them to the right pane. All the jobs in the right pane will be compiled.
3. Click Next>. If you are compiling parallel or mainframe jobs, the Compiler Options screen appears, allowing you to specify the following:
   v Force compile (for parallel jobs).
   v An upload profile for mainframe jobs you are generating code for.
4. Click Next>. The Compile Process screen appears, displaying the names of the selected items and their current compile status.
5. Click Start Compile to start the compilation. As the compilation proceeds, the status changes from Queued to Compiling to Compiled OK or Failed, and details about each job are displayed in the compilation output window as it compiles. Click the Cancel button to stop the compilation, although you can only cancel between compilations, so the Designer client might take some time to respond.
6. Click Finish. If the Show compile report checkbox was selected, the job compilation report screen appears, displaying the report generated by the compilation.
Job sequences are optionally restartable. If you run a restartable sequence and one of the jobs fails to finish correctly, you can fix the problem, then re-run the sequence from the point at which it left off. The sequence records checkpoint information to enable it to do this. Checkpoint information enables WebSphere DataStage to restart the sequence in a sensible way. You can enable or disable checkpointing at a project-wide level, or for individual job sequences. If you have checkpointing enabled for a job, you can specify that individual components within a sequence are not checkpointed, forcing them to be re-executed whenever the sequence is restarted regardless of whether they were executed successfully before. You can also specify that the sequence automatically handles any errors it encounters when executing jobs. This means you do not have to specify an error handling trigger for every job in the sequence. This can also be enabled on a project-wide basis, or for individual job sequences.
The Diagram window appears in the right pane of the Designer, along with the Tool palette for job sequences. You can now save the job sequence and give it a name. This is exactly the same as saving a job (see Saving a Job on page 3-2).

You create a job sequence by:
v Placing the stages representing the activities in your sequence on the canvas.
v Linking the stages together.
v Specifying properties details for each activity, defining what it does.
v Specifying trigger information for each activity, specifying what action is taken on success or failure of the activity.

You can open an existing job sequence in the same way you would open an existing job (see Opening an Existing Job on page 3-2).
Activity stages
The job sequence supports the following types of activity:
v Job. Specifies a WebSphere DataStage server or parallel job.
v Routine. Specifies a routine. This can be any routine in the WebSphere DataStage Repository (but not transforms).
v ExecCommand. Specifies an operating system command to execute.
v Email Notification. Specifies that an email notification should be sent at this point of the sequence (uses SMTP).
v Wait-for-file. Waits for a specified file to appear or disappear.
v Exception Handler. There can only be one of these in a job sequence. It is executed if a job in the sequence fails to run (other exceptions are handled by triggers) or if a job aborts and the Automatically handle activities that fail option is set for the sequence.
v Nested Conditions. Allows you to further branch the execution of a sequence depending on a condition.
v Sequencer. Allows you to synchronize the control flow of multiple activities in a job sequence.
v Terminator. Allows you to specify that, if certain situations occur, the jobs a sequence is running shut down cleanly.
v Start Loop and End Loop. Together these two stages allow you to implement a For...Next or For...Each loop within your sequence.
v User Variable. Allows you to define variables within a sequence. These variables can then be used later on in the sequence, for example to set job parameters.

The activity stages are controlled by the setting of their properties (see Activity Properties). To add an activity to your job sequence, drag the corresponding icon from the tool palette and drop it on the Diagram window. You can also add particular jobs or routines to the design as activities by dragging the icon representing that job or routine from the Designer's Repository window and dropping it in the Diagram window. The job or routine appears as an activity in the Diagram window.
Activities can be named, moved, and deleted in the same way as stages in an ordinary server or parallel job. (see Developing a Job.)
Triggers
The control flow in the sequence is dictated by how you interconnect activity icons with triggers. To add a trigger, select the trigger icon in the tool palette, click the source activity in the Diagram window, then click the target activity. Triggers can be named, moved, and deleted in the same way as links in an ordinary server or parallel job (see Developing a Job). Other trigger features are specified by editing the properties of their source activity.

Activities can only have one input trigger, but can have multiple output triggers. Trigger names must be unique for each activity. For example, you could have several triggers called success in a job sequence, but each activity can only have one trigger called success.

There are three types of trigger:
v Conditional. A conditional trigger fires the target activity if the source activity fulfills the specified condition. The condition is defined by an expression, and can be one of the following types:
   - OK. Activity succeeds.
   - Failed. Activity fails.
   - Warnings. Activity produced warnings.
   - ReturnValue. A routine or command has returned a value.
   - Custom. Allows you to define a custom expression.
   - User status. Allows you to define a custom status message to write to the log.
v Unconditional. An unconditional trigger fires the target activity once the source activity completes, regardless of what other triggers are fired from the same activity.
v Otherwise. An otherwise trigger is used as a default where a source activity has multiple output triggers, but none of the conditional ones have fired.

Different activities can output different types of trigger:
Stage Type: Wait-for-file, ExecCommand
   Trigger Types: Unconditional; Otherwise; Conditional - OK; Conditional - Failed; Conditional - Custom; Conditional - ReturnValue

Stage Type: Routine
   Trigger Types: Unconditional; Otherwise; Conditional - OK; Conditional - Failed; Conditional - Custom; Conditional - ReturnValue

Stage Type: Job
   Trigger Types: Unconditional; Otherwise; Conditional - OK; Conditional - Failed; Conditional - Warnings; Conditional - Custom; Conditional - UserStatus

Stage Type: Nested condition
   Trigger Types: Unconditional; Otherwise; Conditional - Custom

Stage Type: Exception Handler, Sequencer, Notification, Start Loop, End Loop
   Trigger Types: Unconditional

Note: If a job fails to run, for example because it was in the aborted state when due to run, this will not fire a trigger. Job activities can only fire triggers if they run. Non-running jobs can be handled by exception activities, or by choosing an execution action of reset then run rather than just run for jobs (see Job Activity Properties).
Expressions
You can enter expressions at various places in a job sequence to set values. Where you can enter expressions, the Expression Editor is available to help you and to validate your expression. The expression syntax is a subset of that available in a server job Transformer stage or parallel job BASIC Transformer stage, and comprises:
v Literal strings enclosed in double-quotes or single-quotes.
v Numeric constants (integer and floating point).
v The sequence's own job parameters.
v Prior activity variables (for example, job exit status).
v All built-in BASIC functions as available in a server job.
v Certain macros and constants as available in a server or parallel job: DSHostName, DSJobController, DSJobInvocationId, DSJobName, DSJobStartDate, DSJobStartTime, DSJobStartTimestamp, DSJobWaveNo, DSProjectName.
v DS constants as available in server jobs.
v Arithmetic operators: + - * / ** ^
v Relational operators: > < = # <> >= =< and so on.
v Logical operators (AND OR NOT) plus usual bracketing conventions.
v The ternary IF ... THEN ... ELSE operator.
Note: When you enter valid variable names (for example a job parameter name or job exit status) in an expression, you should not delimit them with the hash symbol (#) as you do in other fields in sequence activity properties.
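For example, a Custom conditional trigger might combine a prior activity variable with one of the sequence's own job parameters (the activity name LoadDaily and the parameter dayofweek are illustrative only):

LoadDaily.$UserStatus = "READY" OR dayofweek = "Saturday"

Because this is an expression, neither the activity variable nor the job parameter is delimited with hash symbols.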
General page
The General page contains:
v Version number. The version number of the job sequence. A version number has several components:
   - The version number N.n.n. This number checks the compatibility of the job with the version of WebSphere DataStage installed. This number is automatically set when WebSphere DataStage is installed and cannot be edited.
   - The bug fix number n.n.N. This number reflects minor changes to the job sequence design or properties. To change this number, select it and enter a new value directly or use the arrow buttons to increase the number.
v Allow Multiple Instance. Select this to enable the WebSphere DataStage Director to run multiple instances of this job sequence.

The compilation options specify details about restarting the sequence if one of the jobs fails for some reason.
v Add checkpoints so sequence is restartable on failure. Select this to enable restarting for this job sequence. If you have enabled this feature on a project-wide basis in the WebSphere DataStage Administrator, this check box is selected by default when the sequence is first created.
v Automatically handle job runs that fail. Select this to have WebSphere DataStage automatically handle failing jobs within a sequence (this means that you do not have to have a specific trigger for job failure). When you select this option, the following happens during job sequence compilation:
   - For each job activity that does not have a specific trigger for error handling, code is inserted that branches to an error handling point. (If an activity has either a specific failure trigger, or if it has an OK trigger and an otherwise trigger, it is judged to be handling its own aborts, so no code is inserted.)
   - If the compiler has inserted error-handling code, the following happens if a job within the sequence fails: a warning is logged in the sequence log about the job not finishing OK; if the job sequence has an exception handler defined, the code will go to it; if there is no exception handler, the sequence aborts with a suitable message.
   If you have enabled this feature on a project-wide basis in the WebSphere DataStage Administrator, this check box is selected by default when the sequence is first created.
   Note that, when using this feature, you should avoid using any routines within the job sequence that return any value other than 0 to indicate success, as non-zero values will always be taken as indicating failure (all routines supplied with WebSphere DataStage return 0 to indicate success).
v Log warnings after activities that finish with status other than OK. Select this to have the sequence log a message in the sequence log if it runs a job that finished with a non-zero completion code (for example, warnings or fatal errors). Messages are also logged for routine or command activities that fail (that is, return a non-zero completion code).
v Log report messages after each job run. Select this to have the sequence log a status report for a job immediately the job run finishes. The following is an example of the information that will be logged:
**************************************************
STATUS REPORT FOR JOB: jobname
Generated: 2003-10-31 16:13:09
Job start time=2003-10-31 16:13:07
Job end time=2003-10-31 16:13:07
Job elapsed time=00:00:00
Job status=1 (Finished OK)
Stage: stagename1, 10000 rows input
Stage start time=2003-10-31 16:17:27, end time=2003-10-31 16:17:27, elapsed=00:00:00
Link: linkname1, 10000 rows
Stage: stagename2, 10000 rows input
Stage start time=2003-10-31 16:17:28, end time=2003-10-31 16:17:28, elapsed=00:00:00
Link: linkname2, 10000 rows
Link: linkname3, 10000 rows
v Short Description. An optional brief description of the job sequence.
v Full Description. An optional detailed description of the job sequence.
Parameters page
The Parameters page allows you to specify parameters for the job sequence. Values for the parameters are collected when the job sequence is run in the Director. The parameters you define here are available to all the activities in the job sequence, so where you are sequencing jobs that have parameters, you need to make these parameters visible here.

For example, if you were scheduling three jobs, each of which expected a file name to be provided at run time, you would specify three parameters here, calling them, for example, filename1, filename2, and filename3. You would then edit the Job page of each of these job activities in your sequence to map the job's filename parameter onto filename1, filename2, or filename3 as appropriate (see Job Activity Properties). When you run the job sequence, the Job Run Options dialog box appears, prompting you to enter values for filename1, filename2, and filename3. The appropriate filename is then passed to each job as it runs.

You can also set environment variables at run time. The settings only take effect at run time; they do not affect the permanent settings of environment variables. To set a runtime value for an environment variable:
1. Click Add Environment Variable... at the bottom of the Parameters page. The Choose environment variable list appears. This shows a list of the available environment variables (the example shows parallel job environment variables).
2. Click on the environment variable you want to override at runtime. It appears in the parameter grid, distinguished from job parameters by being preceded by a $. You can also click New... at the top of the list to define a new environment variable. A dialog box appears allowing you to specify name and prompt. The new variable is added to the Choose environment variable list and you can click on it to add it to the parameters grid.
3. Set the required value in the Default Value column. This is the only field you can edit for an environment variable. Depending on the type of variable, a further dialog box might appear to help you enter a value.

The Parameters grid has the following columns:
v Parameter name. The name of the parameter.
v Prompt. Text used as the field name in the run-time dialog box.
v Type. The type of the parameter (to enable validation).
v Default Value. The default setting for the parameter.
v Help text. The text that appears if a user clicks Property Help in the Job Run Options dialog box when running the job sequence.
You can refer to the parameters in the job sequence by name. When you are entering an expression, you just enter the name directly. Where you are entering a parameter name in an ordinary single-line text box, you need to delimit the name with hash symbols, for example: #dayofweek#.
Dependencies page
The Dependencies page of the Properties dialog box shows you the dependencies the job sequence has. These might be functions, routines, or jobs that the job sequence runs. Listing the dependencies of the job sequence here ensures that, if the job sequence is packaged for use on another system, all the required components will be included in the package.

The details are as follows:
v Type. The type of item upon which the job sequence depends:
   - Job. Released or unreleased job. If you have added a job to the sequence, this will automatically be included in the dependencies. If you subsequently delete the job from the sequence, you must remove it from the dependencies list manually.
   - Local. Locally cataloged BASIC functions and subroutines (that is, Transforms and Before/After routines).
   - Global. Globally cataloged BASIC functions and subroutines (that is, Custom UniVerse functions).
   - File. A standard file.
   - ActiveX. An ActiveX (OLE) object (not available on UNIX-based systems).
v Name. The name of the function or routine. The name required varies according to the Type of the dependency:
   - Job. The name of a released, or unreleased, job.
   - Local. The catalog name.
   - Global. The catalog name.
   - File. The file name.
   - ActiveX. The Name entry is actually irrelevant for ActiveX objects. Enter something meaningful to you (ActiveX objects are identified by the Location field).
v Location. The location of the dependency. A browse dialog box is available to help with this. This location can be an absolute path, but it is recommended you specify a relative path using the following environment variables:
   - %SERVERENGINE% - WebSphere DataStage engine account directory (normally C:\IBM\InformationServer\Server\DSEngine).
   - %PROJECT% - Current project directory.
   - %SYSTEM% - System directory on Windows or /usr/lib on UNIX.
Activity properties
When you have outlined your basic design by adding activities and triggers to the diagram window, you fill in the details by editing the properties of the activities. To edit an activity, do one of the following:
v Double-click the activity in the Diagram window.
v Select the activity and choose Properties... from the shortcut menu.
v Select the activity and choose Edit > Properties.

The format of the Properties dialog box depends on the type of activity. All have a General page, however, and any activities with output triggers have a Triggers page.

The General page contains:
v Name. The name of the activity. You can edit the name here if required.
v Description. An optional description of the activity.
v Logging text. The text that will be written to the Director log when this activity is about to run.

The Triggers page contains:
v Name. The name of the output trigger.
v Expression Type. The type of expression attached to the trigger. Choose a type from the drop-down list (see Triggers for an explanation of the different types).
v Expression. The expression associated with the trigger. For most predefined conditions, this is fixed and you cannot edit it. For Custom conditions you can enter an expression (see Expressions), and for UserStatus conditions you can enter a text message.

You can use variables when defining trigger expressions for Custom and ReturnValue conditional triggers. The rules are given in the following table:
Table 4. Variables used in defining trigger expressions

Activity Type: Job
   stage_label.$ExitStatus - Value of job completion status.
   stage_label.$UserStatus - Value of job's user status.
   stage_label.$JobName - Name of job actually run, including invocation id, if present.

Activity Type: ExecCommand
   stage_label.$ReturnValue - Command status.
   stage_label.$CommandName - Name of command executed (including path name if one was specified).
   stage_label.$CommandOutput - Output captured by executing the command.

Activity Type: Routine
   stage_label.$ReturnValue - Value of routine's return code.
   stage_label.$RoutineName - Name of routine called.

Activity Type: Wait-for-File
   stage_label.$ReturnValue - Value returned by DSWaitForFile before/after subroutine.

Activity Type: Exception Handler (these are available for use in the sequence of activities the stage initiates, not the stage itself)
   stage_label.$ErrSource - The stage label of the activity stage that raised the exception (for example, the job activity stage calling a job that failed to run).
   stage_label.$ErrNumber - Indicates the reason the Exception Handler activity was invoked, and is one of: 1 (activity ran a job but it aborted, and there was no specific handler set up); -1 (job failed to run for some reason).
   stage_label.$ErrMessage - The text of the message that will be logged as a warning when the exception is raised.
stage_label is the name of the activity stage as given in the Diagram window. You can also use the job parameters from the job sequence itself.

Custom conditional triggers in Nested condition and Sequencer activities can use any of the variables in the above table used by the activities connected to them.

The specific pages for particular activities are described in the following sections.
parameter name needs to be delimited by hashes (#). You can also click the browse button to be presented with a list of available job parameters you could use. You cannot leave this field blank.
v Execution Action. Allows you to specify what the activity will do with the job. Choose one of the following from the drop-down list:
   - Run (the default)
   - Reset if required then run
   - Validate only
   - Reset only
v Do not checkpoint run. Set this to specify that WebSphere DataStage does not record checkpoint information for this particular job. This means that, if another job later in the sequence fails, and the sequence is restarted, this job will be rerun regardless of the fact that it finished successfully before. This option is only visible if the sequence as a whole is checkpointed.
v Parameters. Allows you to provide values for any parameters that the job requires. The grid displays all the parameters expected by the job. You can:
   - Type in an expression giving a value for the parameter in the Value Expression column. Literal values must be enclosed in inverted commas. (For details about expressions, see Expressions.)
   - Select a parameter and click Insert Parameter Value to use another parameter or argument in the sequence to provide the value. A dialog box appears displaying a tree of all the available parameters and arguments occurring in the sequence before the current activity. This includes parameters that you have defined for the job sequence itself in the Job Sequence Properties dialog box (see Job Sequence Properties). Choose the required parameter or argument and click OK. You can use this feature to determine control flow through the sequence.
   - Click Clear to clear the value expression from the selected parameter. Click Clear All to clear the expression values from all parameters.
   - Select a parameter and click Set to Default to enter the default for that parameter as defined in the job itself. Click All to Default to set all the parameters to their default values.

When you select the icon representing a job activity, you can choose Open Job from the shortcut menu to open the job in the Designer ready for editing.
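For example (the names are illustrative only), to pass the sequence parameter filename1 through to a job's own filename parameter you would enter the parameter name itself as the value expression, whereas a fixed value must be entered as a quoted literal:

   filename1              passes the value of the sequence parameter filename1
   "C:\data\day1.csv"     passes a literal file name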
v Arguments. Allows you to provide values for any arguments that the routine requires. The grid displays all the arguments expected by the routine. You can:
   - Type in an expression giving the value for the argument in the Value Expression column. Literal values must be enclosed in inverted commas. (For details about expressions, see Expressions.)
   - Click Clear to clear the value expression from the selected parameter. Click Clear All to clear the expression values from all parameters.
   - Select an argument and click Insert Parameter Value to use another parameter or argument in the sequence to provide the value. A dialog box appears displaying a tree of all the available parameters and arguments occurring in the sequence before the current activity. Choose the required parameter or argument and click OK. You can use this feature to determine control flow through the sequence.

You can access routine arguments in the activity triggers in the form routinename.argname. Such arguments can also be accessed by other activities that occur subsequently in the job sequence. This is most useful for accessing an output argument of a routine, but note that BASIC makes no distinction between input and output arguments; it is up to you to establish which is which.

When you select the icon representing a routine activity, you can choose Open Routine from the shortcut menu to open the Routine dialog box for that routine ready to edit.
you all parameters available at this point in the job sequence. Parameters entered in these fields need to be delimited with hashes (#). Parameters selected from the External Parameter Helper will automatically be enclosed in hash symbols.
v Attachments. Files to be sent with the e-mail. Specify a path name, or a comma-separated list of pathnames (in the latter case this should be contained in single-quotes or double-quotes). You can also specify an expression that resolves to a pathname or comma-separated pathnames. The Arrow button offers you the choice of browsing for a file or inserting a job parameter whose value will be supplied at run time. Again, parameters entered in these fields need to be delimited with hashes (#). Parameters selected from the External Parameter Helper will automatically be enclosed in hash symbols.
v Email body. The actual message to be sent. (Do not enclose the message in inverted commas unless you want them to be part of the message.)
v Include job status in email. Select this to include available job status information in the message.
v Do not checkpoint run. Set this to specify that WebSphere DataStage does not record checkpoint information for this particular notification operation. This means that, if a job later in the sequence fails, and the sequence is restarted, this notification operation will be re-executed regardless of the fact that it was executed successfully before. This option is only visible if the sequence as a whole is checkpointed.
For example, you could use a nested condition to implement the following control sequence:
Load/init jobA
Run jobA
If ExitStatus of jobA = OK then /*tested by trigger*/
   If Today = "Wednesday" then /*tested by nested condition*/
      run jobW
   If Today = "Saturday" then
      run jobS
Else
   run JobB
Each nested condition can have one input trigger and will normally have multiple output triggers. The conditions governing the output triggers are specified in the Triggers page as described in Triggers . The triggers for the WhichJob Nested Condition activity stage would be:
(In this case DayCommand.$CommandOutput refers to the value returned by a command that returns today's day and is executed by an ExecCommand activity called DayCommand.)
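A sketch of how those triggers might be expressed (the expressions follow the DayCommand example above and are illustrative only):

   To jobW (Custom): DayCommand.$CommandOutput = "Wednesday"
   To jobS (Custom): DayCommand.$CommandOutput = "Saturday"
   To JobB: Otherwise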
The following is a section of a similar job sequence, but this time the sequencer mode is set to Any. When any one of the Wait_For_File or job activity stages completes successfully, Job_Activity_12 will be started.
The Start Loop stage, networkloop, has Numeric loop selected and the properties are set as follows:
This defines that the loop will be run through up to 1440 times. The action differs according to whether the job succeeds or fails:
v If it fails, the routine waitabit is called. This implements a 60 second wait time. If the job continually fails, the loop repeats 1440 times, giving a total of 24 hours of retries. After the 1440th attempt, the End Loop stage passes control onto the next activity in the sequence.
v If it succeeds, control is passed to the notification stage, emailsuccess, and the loop is effectively exited.
The following is a section of a job sequence that makes use of the loop stages to run a job repeatedly to process results for different days of the week:
The Start Loop stage, dayloop, has List loop selected and properties are set as follows:
The job processresults will run five times; on each iteration of the loop the job is passed a day of the week as a parameter, in the order monday, tuesday, wednesday, thursday, friday. The loop is then exited and control is passed to the next activity in the sequence.
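A minimal sketch of how such a list loop is typically wired up (the list of day names follows from the description above; the parameter name day and the $Counter reference are assumptions for illustration):
   Start Loop activity dayloop - List loop, delimited values: monday,tuesday,wednesday,thursday,friday
   Job activity processresults - parameter day, value expression: dayloop.$Counter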
The values of the user variables are set by expressions in the stage's properties. (For details, see Expressions.) You would most likely start a sequence with this stage, setting up the variables so they can be accessed by subsequent sequence activities. The exit trigger would initiate the sequence proper. You can also use a User Variable activity further into a sequence to change the value of a variable previously defined by an earlier User Variable activity. The variables are defined in the Properties page for the stage. To add a variable:
v Choose Add Row from the shortcut menu.
v Enter the name for your variable.
v Supply an expression for resolving the value of the variable.
In this example, the expression editor has picked up a fault with the definition. When fixed, this variable will be available to other activity stages downstream as MyVars.VAR1.
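For illustration, a variable row in a User Variables activity named MyVars might be filled in as follows. Only the MyVars.VAR1 name comes from the example above; the expression shown is an assumed one:
   Name: VAR1
   Expression: "Monday"
Downstream activity stages could then reference MyVars.VAR1 in their triggers or value expressions.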
In this case you can take one of the following actions:
v Run Job. The sequence is re-executed, using the checkpoint information to ensure that only the required components are re-executed.
v Reset Job. All the checkpoint information is cleared, ensuring that the whole job sequence will be run when you next specify run job.
Note: If, during sequence execution, the flow diverts to an error handling stage, WebSphere DataStage does not checkpoint anything more. This is to ensure that stages in the error handling path will not be skipped if the job is restarted and another error is encountered.
* exit)
J2stat = DSGetJobInfo(Hjob2, DSJ.JOBSTATUS)
If J2stat = DSJS.RUNFAILED Then
   Call DSLogFatal("Job DailyJob2 failed","JobControl")
End
* Now get a handle for the third job
Hjob3 = DSAttachJob("DailyJob3",DSJ.ERRFATAL)
* and run it
Dummy = DSRunJob(Hjob3,DSJ.RUNNORMAL)
* then wait for it to finish
Dummy = DSWaitForJob(Hjob3)
* Finally, get the finishing status for the third job and test it
J3stat = DSGetJobInfo(Hjob3, DSJ.JOBSTATUS)
If J3stat = DSJS.RUNFAILED Then
   Call DSLogFatal("Job DailyJob3 failed","JobControl")
End
Possible status conditions returned for a job are as follows. A job that is in progress is identified by:
v DSJS.RUNNING - Job running; this is the only status that means the job is actually running.
Jobs that are not running might have the following statuses:
v DSJS.RUNOK - Job finished a normal run with no warnings.
v DSJS.RUNWARN - Job finished a normal run with warnings.
v DSJS.RUNFAILED - Job finished a normal run with a fatal error.
v DSJS.VALOK - Job finished a validation run with no warnings.
v DSJS.VALWARN - Job finished a validation run with warnings.
v DSJS.VALFAILED - Job failed a validation run.
v DSJS.RESET - Job finished a reset run.
v DSJS.STOPPED - Job was stopped by operator intervention (cannot tell run type).
If a job has an active select list, but then calls another job, the second job will effectively wipe out the select list.
Intelligent Assistants
WebSphere DataStage provides intelligent assistants which guide you through basic WebSphere DataStage tasks. Specifically they allow you to:
v Create a template from a server, parallel, or mainframe job. You can subsequently use this template to create new jobs. New jobs will be copies of the original job.
v Create a new job from a previously created template.
v Create a simple parallel data migration job. This extracts data from a source and writes it to a target.
Administrating templates
To delete a template, start the Job-From-Template Assistant and select the template. Click the Delete button. Use the same procedure to select and delete empty categories. The Assistant stores all the templates you create in the directory you specified during your installation of WebSphere DataStage. You browse this directory when you create a new job from a template. Typically, all the developers using the Designer save their templates in this single directory.
After installation, no dialog is available for changing the template directory. You can, however, change the registry entry for the template directory. The default registry value is:
[HKLM\SOFTWARE\Ascential Software\DataStage Client\currentVersion\Intelligent Assistant\Templates]
screen also allows you to specify the table name or file name for your source data (as appropriate for the type of data source).
5. Click Next to go to the next screen. This allows you to specify details about the target where your job will write the extracted data.
6. Select one of these stages to receive your data: Data Set, DB2, InformixXPS, Oracle, Sequential File, or Teradata. Enter additional information when prompted by the dialog.
7. Click Next. The screen that appears shows the table definition that will be used to write the data (this is the same as the one used to extract the data). This screen also allows you to specify the table name or file name for your data target (as appropriate for the type of data target).
8. Click Next. The next screen invites you to supply details about the job that will be created. You must specify a job name and optionally specify a job folder. The job name should follow WebSphere DataStage naming restrictions (that is, begin with a letter and consist of alphanumeric characters).
9. Select Create Job to trigger job generation. A screen displays the progress of the job generation. Using the information you entered, the WebSphere DataStage generation process gathers metadata, creates a new job, and adds the created job to the current project.
10. When the job generation is complete, click Finish to exit the dialog.
All jobs consist of one source stage, one Transformer stage, and one target stage. In order to support password maintenance, all passwords in your generated jobs are parameterized and are prompted for at run time.
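Because the passwords are job parameters, they can be supplied when the generated job is started, for example from the Director client or with the dsjob command-line tool. A sketch of the latter, assuming a generated job named MigrateCustomers in project dstage1 with a password parameter named TargetPassword (all three names are hypothetical, and connection options are omitted):
   dsjob -run -param TargetPassword=secret -wait dstage1 MigrateCustomers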
The descriptor file for a data set contains the following information:
v Data set header information.
v Creation time and date of the data set.
v The schema of the data set.
v A copy of the configuration file used when the data set was created.
For each segment, the descriptor file contains:
v The time and date the segment was added to the data set.
v A flag marking the segment as valid or invalid.
v Statistical information such as number of records in the segment and number of bytes.
v Path names of all data files, on all processing nodes.
This information can be accessed through the Data Set Manager.
Partitions
The partition grid shows the partitions the data set contains and describes their properties.
v #. The partition number.
v Node. The processing node that the partition is currently assigned to.
v Records. The number of records the partition contains.
v Blocks. The number of blocks the partition contains.
v Bytes. The number of bytes the partition contains.
Segments
Click on an individual partition to display the associated segment details. Segment details contain the following information:
v #. The segment number.
v Created. Date and time of creation.
v Bytes. The number of bytes in the segment.
v Pathname. The name and path of the file containing the segment in the selected partition.
Click the Refresh button to reread and refresh all the displayed information. Click the Output button to view a text version of the information displayed in the Data Set Viewer. You can open a different data set from the viewer by clicking the Open icon on the tool bar. The browse dialog box opens again and lets you browse for a data set.
Note: You cannot use the UNIX cp command to copy a data set because WebSphere DataStage represents a single data set with multiple files.
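If you do need to copy or delete a data set outside the Designer, the parallel engine's orchadmin utility handles all of the component files together. The following is a sketch only; check the exact subcommand names and options against the documentation for your installation:
   orchadmin copy source.ds target.ds
   orchadmin delete old.ds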
handler from the drop-down list. The settings for whichever handler you have chosen to edit appear in the grid. (Note that the option to edit the project level handler is only available if such a handler has been set up in the Administrator client.)
2. Edit the grid as required. You can:
v Choose a new Action for a particular message. Select a new Action from the drop-down list. Possible Actions are:
– Suppress from log. The message is not written to the job's log as it runs.
– Promote to Warning. Promote an informational message to a warning message.
– Demote to Informational. Demote a warning message to become an informational one.
v Delete a message. Select the message in the grid, right-click and choose Delete Row from the shortcut menu.
v Add a new message. Right-click in the grid and choose Insert Row from the shortcut menu. You can then type in details of the message you want to add to the handler.
When you are done with your edits, click Save and choose Save Message Handler to save the handler with its current name, or Save Message Handler As to save it as a new handler.
To delete a handler:
1. Choose an option to specify whether you want to delete the local runtime handler for the currently selected job, delete the project-level message handler, or delete a specific message handler. If you want to delete a specific message handler, select the handler from the drop-down list. The settings for whichever handler you have chosen appear in the grid.
2. Click the Delete button.
For example, a message handler might contain rules covering messages such as:
   The open file limit is 100; raising to 1024...
   APT configuration file...
   Attempt to Cleanup after ABORT raised in stage...
Each line in the file represents a message rule, and comprises four tab-separated fields:
v Message ID. Case-specific string uniquely identifying the message.
v Type. 1 for Info, 2 for Warn.
v Action. 1 = Suppress, 2 = Promote, 3 = Demote.
v Message. Example text of the message.
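Putting the four fields together, a single rule line might look like the following, where <tab> stands for a tab character and the message ID shown is purely hypothetical:
   XXXX 000123<tab>2<tab>3<tab>Attempt to Cleanup after ABORT raised in stage...
This rule would demote that warning message (Type 2) to an informational one (Action 3).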
JCL templates
WebSphere DataStage uses JCL templates to build the required JCL files when you generate a mainframe job. WebSphere DataStage comes with a set of building-block JCL templates suitable for a variety of tasks. The supplied templates are in a directory called JCL Templates under the WebSphere DataStage server install directory. There are also copies of the templates held in the WebSphere DataStage Repository for each WebSphere DataStage project.
You can edit the templates to meet the requirements of your particular project. This is done using the JCL Templates manager from the Designer. Open the JCL Templates dialog box by choosing Tools > JCL Templates. The JCL Templates dialog box contains the following fields and buttons:
v Platform type. Displays the installed platform types in a drop-down list.
v Template name. Displays the available JCL templates for the chosen platform in a drop-down list.
v Short description. Briefly describes the selected template.
v Template. The code that the selected template contains.
v Save. This button is enabled if you edit the code, or subsequently reset a modified template to the default code. Click Save to save your changes.
v Reset. Resets the template code back to that of the default template.
If there are system-wide changes that will apply to every project, then it is possible to edit the template defaults. Changes made here will be picked up by every WebSphere DataStage project on that WebSphere DataStage server. The JCL Templates directory contains two sets of template files: a default set that you can edit, and a master set which is read-only. You can always revert to the master templates if required, by copying the read-only masters over the default templates. Use a standard editing tool, such as Microsoft Notepad, to edit the default templates.
Repository tree
You can use the Repository tree to browse and manage objects in the repository. The Designer enables you to store the following types of object in the repository (listed in alphabetical order):
v Data connections
v Data elements (see Data Elements)
v IMS Database (DBD) (see IMS Databases and IMS Viewsets)
v IMS Viewset (PSB/PCB) (see IMS Databases and IMS Viewsets)
v Jobs (see Developing a Job)
v Machine profiles (see Machine Profiles)
v Match specifications (see IBM WebSphere QualityStage User Guide)
v Parameter sets (see Parameter Sets)
v Routines (see Parallel Routines, Server Routines, and Mainframe Routines)
v Shared containers (see Shared Containers)
v Stage types (see the reference guide for each job type and Custom Stages for Parallel Jobs)
v Table definitions (see Table Definitions)
v Transforms
When you first open WebSphere DataStage you will see the repository tree organized with preconfigured folders for these object types, but you can arrange your folders as you like and rename the preconfigured ones if required. You can store any type of object within any folder in the repository tree. You can also add your own folders to the tree. This allows you, for example, to keep all objects relating to a particular job in one folder. You can also nest folders one within another.
Note: Certain built-in objects cannot be moved in the repository tree.
Click the blue bar at the side of the repository tree to open the detailed view of the repository. The detailed view gives more details about the objects in the repository tree and you can configure it in the same way that you can configure a Windows Explorer view. Click the blue bar again to close the detailed view.
You can view a sketch of job designs in the repository tree by hovering the mouse over the job icon in the tree. A tooltip opens displaying a thumbnail of the job design. (You can disable this option in the Designer Options.)
When you open an object in the repository tree, you have a lock on that object and no other users can change it while you have that lock (although they can look at read-only copies).
5. If the object requires more information from you, another dialog box will appear to collect this information. The type of dialog box depends on the type of object (see individual object descriptions for details).
6. When you have supplied the required details, click OK. The Designer asks you where you want to store the object in the repository tree.
Product documentation
Documentation is provided in a variety of locations and formats, including in help that is opened directly from the product interface, in a suite-wide information center, and in PDF file books.
The information center is installed as a common service with IBM Information Server. The information center contains help for most of the product interfaces, as well as complete documentation for all product modules in the suite. A subset of the product documentation is also available online from the product documentation library at publib.boulder.ibm.com/infocenter/iisinfsv/v8r1/index.jsp.
PDF file books are available through the IBM Information Server software installer and the distribution media. A subset of the information center is also available online and periodically refreshed at www.ibm.com/support/docview.wss?rs=14&uid=swg27008803.
You can also order IBM publications in hardcopy format online or through your local IBM representative. To order publications online, go to the IBM Publications Center at www.ibm.com/shop/publications/order.
You can send your comments about documentation in the following ways:
v Online reader comment form: www.ibm.com/software/data/rcf/
v E-mail: comments@us.ibm.com
Contacting IBM
You can contact IBM for customer support, software services, product information, and general information. You can also provide feedback on products and documentation.
Customer support
For customer support for IBM products and for product download information, go to the support and downloads site at www.ibm.com/support/us/. You can open a support request by going to the software support service request site at www.ibm.com/software/support/probsub.html.
My IBM
You can manage links to IBM Web sites and information that meet your specific technical support needs by creating an account on the My IBM site at www.ibm.com/account/us/.
Software services
For information about software, IT, and business consulting services, go to the solutions site at www.ibm.com/businesssolutions/us/en.
General information
To find general information about IBM, go to www.ibm.com.
Product feedback
You can provide general product feedback through the Consumability Survey at www.ibm.com/software/data/info/consumability-survey.
Documentation feedback
You can click the feedback link in any topic in the information center to comment on the information center.
You can also send your comments about PDF file books, the information center, or any other documentation in the following ways:
v Online reader comment form: www.ibm.com/software/data/rcf/
v E-mail: comments@us.ibm.com
If an optional item appears above the main path, that item has no effect on the execution of the syntax element and is used only for readability.
optional_item required_item
v If you can choose from two or more items, they appear vertically, in a stack. If you must choose one of the items, one item of the stack appears on the main path.
required_item required_choice1 required_choice2
If choosing one of the items is optional, the entire stack appears below the main path.
required_item optional_choice1 optional_choice2
If one of the items is the default, it appears above the main path, and the remaining choices are shown below.
default_choice required_item optional_choice1 optional_choice2
v An arrow returning to the left, above the main line, indicates an item that can be repeated.
required_item
repeatable_item
If the repeat arrow contains a comma, you must separate repeated items with a comma.
, required_item repeatable_item
A repeat arrow above a stack indicates that you can repeat the items in the stack.
v Sometimes a diagram must be split into fragments. The syntax fragment is shown separately from the main syntax diagram, but the contents of the fragment should be read as if they are on the main path of the diagram.
required_item fragment-name
Fragment-name:
required_item optional_item
v Keywords, and their minimum abbreviations if applicable, appear in uppercase. They must be spelled exactly as shown.
v Variables appear in all lowercase italic letters (for example, column-name). They represent user-supplied names or values.
v Separate keywords and parameters by at least one space if no intervening punctuation is shown in the diagram.
v Enter punctuation marks, parentheses, arithmetic operators, and other symbols exactly as shown in the diagram.
v Footnotes are shown by a number in parentheses, for example (1).
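As a worked illustration of these conventions, the repeat-with-comma diagram shown earlier (required_item followed by repeatable_item under a comma repeat arrow) would accept input such as either of the following, with the item names standing in for real keywords:
   required_item repeatable_item
   required_item repeatable_item,repeatable_item,repeatable_item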
Product accessibility
You can get information about the accessibility status of IBM products.
The IBM Information Server product modules and user interfaces are not fully accessible. The installation program installs the following product modules and components:
v IBM Information Server Business Glossary Anywhere
v IBM Information Server FastTrack
v IBM Metadata Workbench
v IBM WebSphere Business Glossary
v IBM WebSphere DataStage and QualityStage
v IBM WebSphere Information Analyzer
v IBM WebSphere Information Services Director
For more information about a product's accessibility status, go to http://www.ibm.com/able/product_accessibility/index.html.
Accessible documentation
Accessible documentation for IBM Information Server products is provided in an information center. The information center presents the documentation in XHTML 1.0 format, which is viewable in most Web browsers. XHTML allows you to set display preferences in your browser. It also allows you to use screen readers and other assistive technologies to access the documentation.
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:
IBM World Trade Asia Corporation
Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106-0032, Japan
The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:
IBM Corporation
J46A/G4
555 Bailey Avenue
San Jose, CA 95141-1003
U.S.A.
Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.
The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us.
Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only. This information is for planning purposes only. The information herein is subject to change before the products described become available.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Each copy or any portion of these sample programs or any derivative work, must include a copyright notice as follows: (your company name) (year). Portions of this code are derived from IBM Corp. Sample Programs. Copyright IBM Corp. _enter the year or years_. All rights reserved. If you are viewing this information softcopy, the photographs and color illustrations may not appear.
Trademarks
IBM trademarks and certain non-IBM trademarks are marked on their first occurrence in this information with the appropriate symbol.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.
The following terms are trademarks or registered trademarks of other companies:
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
ITIL is a registered trademark and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
The United States Postal Service owns the following trademarks: CASS, CASS Certified, DPV, LACSLink, ZIP, ZIP + 4, ZIP Code, Post Office, Postal Service, USPS and United States Postal Service. IBM Corporation is a non-exclusive DPV and LACSLink licensee of the United States Postal Service.
Other company, product, or service names may be trademarks or service marks of others.
Index
A
accessibility 239 ActiveX (OLE) functions 137 importing 179 programming functions 122, 124 adding Sequential File stage 8 stages 29 Transformer stage 8 advanced find 159, 160, 161, 173 after-job subroutines 142, 145 assigning data elements 131 assistants 40 connectors 56, 57, 58 containers 23 editing 96 viewing 96 copying BASIC routine definitions 126 copying data elements 132 copying mainframe routine definitions 136 copying Transform definitions 129 creating data elements 130 stored procedure definitions 79 table definitions 61 currency formats 147 customer support 239 customizing COBOL code 196
E
editing BASIC routine definitions 126 column definitions 32, 76, 81 using the Edit button 32 containers 96 data elements 131 stages 32 stored procedure definitions 81 table definitions 76 email notification activity 201 end loop activity 201 entering code in BASIC routines 124 entering column definitions 62 example advanced find 161 quick find 159 ExecCommand ativity 201 exercise setting up 4 Expression Editor 86 external ActiveX (OLE) functions 137
B
BASIC routines copying 126 editing 126 entering code 124 name 123 renaming 137 saving code 125 testing 126 type 123 viewing 126 before-job subroutines 141, 145 browsing server directories 36 built-in data elements 132
D
data browser 11 Data Browser 55 using 37 data elements assigning 131 built-in 132 copying 132 creating 130 defining 130 editing 131 viewing 131 data files adding 61 data lineage 169 highlighting 168 data lineage highlighting disabling 169 Data Migration Assistant 226 databases adding 60 date formats 147 defining data elements 130 locales 143, 146 maps 143, 146 deleting column definitions 33, 76, 81 links 31 stages 30 dependencies of query 164, 169, 171 Designer client 159 starting 4 developing jobs 21, 29, 235 diff operation 153, 155, 156 diffapicmdline command report 156 Director client 16 documentation accessible 239
F
find 159 Find dialog box 56 force compile 193
C
character set maps 146 character set maps, specifying 143, 146 code customization 196 column definitions 168, 169 column name 50 data element 50 defining for a stage 32 deleting 33, 76, 81 editing 32, 76, 81 using the Edit button 32 entering 62 inserting 32 key fields 50 length 50 loading 34, 75 name alias 50 running a where used query 167 scale factor 50 searching for 164 Columns grid 32, 50 common repository 235 comparing objects 153, 155 in different projects 155 using the command line 156 compiling code in BASIC routines 125 compiling job 16 configuration file editor 230
G
generating code customizing code 196 generating job reports 189
H
highlighting data lineage host systems adding 60 168
I
IBM support 239 impact analysis 164, 166, 167, 169 examples 170, 171 impact analysis results 170 import selection 181 importing external ActiveX (OLE) functions metadata using bridges 180 stored procedure definitions 77 table definitions 54 input parameters, specifying 80 inserting column definitions 32 intelligent assistants 225
179
J
JCL templates 196 job compiling 16 configuring 11 creating 7 parallel 1 running 16 saving 7 job activity 201 job control routines 223 job log 16 job parameters 86 job properties saving 149 job reports 189 generating 189 stylesheets 191 jobs defining locales 143, 146 defining maps 143, 146 dependencies, specifying 186 developing 21, 29, 235 opening 22 version number 141, 144, 148
N
naming 76 column definitions 76 data elements 131 job sequences 201 links 29 mainframe routines 134 parallel job routines 105 parallel stage types 107 server job routines 123 shared container categories 98 shared containers 98 nested condition activity 201 New Job from Template 226 New Template from Job 225 NLS 146 NLS page of the Table Definition dialog box normalization 38 null values 50, 78 number formats 147
renaming (continued) stage 8 stages 30 report 173 Reporting Console 173 repository 4 Repository Advanced Find window 170 routine activity 201 Routine dialog box 107 Dependencies page 123 routine name 123 routines parallel jobs 105 Run-activity-on-exception activity 201
S
sample data 1, 4 saving code in BASIC routines 125 saving job properties 149 schemas adding 60 screen readers 239 search advanced find 159, 160, 161 for column definitions 164 for table definitions 163 quick find 159 search criteria 161 search results 170 selecting data to be imported 181 sequencer activity 201 Sequential File stage 11 server directories, browsing 36 setting up exercise 4 shared metadata 56, 163 Shared Metadata Management tool 59 shared repository 56, 57, 58, 163 software services 239 sort order 143, 147 specifying input parameters for stored procedures 80 job dependencies 186 SQL data precision 50, 78 data scale factor 50, 78 data type 50, 78 display characters 50, 78 stage adding 8 stage editor 11 stages 23 adding 29 column definitions for 32 deleting 30 editing 32 moving 30 renaming 30 specifying 29 start loop activity 201 stored procedure definitions 38, 56 creating 79 editing 81 importing 77 manually defining 79
52
K
key field 50, 78
O
53 objects creating new 236 opening a job 22 operational metadata 40
L
Layout page of the Table Definition dialog box legal notices 245 link adding 8 linking stages 30 links deleting 31 moving 31 multiple 31 loading column definitions 34, 75 locales and jobs 146 specifying 143, 146
P
palette 4 parallel engine configuration file 230 parallel job routines 105 parameter definitions data element 79 key fields 78 length 78 scale factor 78 parameter sets 83 Parameters grid 78 parameters sets value file 90 performance monitor 38 pre-configured stages 37, 97, 102 product accessibility accessibility 243
M
mainframe routines copying 136 manually entering stored procedure definitions table definitions 61 message handlers 40, 144 metadata 56 importing 6, 57 managing 59, 60, 61 sharing 56 synchronizing 59 tables 59 wizard 58 monetary formats 147 moving links 31 79
Q
quick find 159
R
reference links 25, 27 renaming BASIC routines 137 link 8
stored procedure definitions (continued) result set 79 viewing 81 stored procedures 77 stream link 25, 27 stylesheets for job reports 191 support, customer 239
T
table definition 6, 57, 58 Table Definition dialog box 49 for stored procedures 78 Format page 51 Layout page 53 NLS page 52 Parameters page 78 table definitions 33, 58, 59 creating 61 editing 76 importing 54 manually entering 61 viewing 76 tables 57 terminator activity 201 testing BASIC routines 126 time formats 147 trademarks 247 Transformer stage 11 Transforms copying 129 editing transforms 128
U
user variable activity 201 using Data Browser 37 job parameters 86
V
value file 90 version number for a job 141, 144, 148 viewing BASIC routine definitions 126 containers 96 data elements 131 stored procedure definitions 81 table definitions 76
W
wait-for-file activity 201 where used query 164, 166, 167, 170
X
XML documents 175 XSLT stylesheets for job reports 191
Printed in USA
LC18-9893-01