
Ab Initio - Intro


Introduction to

Ab Initio

Prepared By : Ashok Chanda

Accenture Ab Initio Training 1


Ab Initio Session 8

• Dataset Components Overview

• Filter by Expression
• Sort
• Dedup Sorted
• Sort within Groups


Ab Initio Product Architecture

Layers, from top to bottom:
• User Applications
• Development Environments: GDE, Shell
• Component Library, User-defined Components, 3rd Party Components, Ab Initio EME
• The Ab Initio Co>Operating® System
• Native Operating System (Unix, Windows, OS/390)


Graphical Development Environment (GDE)


The Graph Model



The Graph Model: Naming the Pieces
• Components
• Datasets
• Flows

Components
• Components may run on any computer running the Co>Operating System.
• Different components do different jobs.
• The particular work a component accomplishes depends upon its parameter settings.
• Some parameters are data transformations, that is, business rules applied to one or more inputs to produce a required output.


Categories of Components
• Compress Components
• Continuous Components
• Database Components
• Dataset Components
• Departition Components
• FTP Components
• Miscellaneous Components
• Partition Components
• Sort Components
• Transform Components
• Translate Components
• Validate Components


COMPONENTS

• Ab Initio components represent datasets and programs that operate on data records in specified ways. In the Graphical Development Environment (GDE), components look as follows:

[Figure: components as displayed in the GDE]


Datasets
• A dataset is a source or destination of data. It can be a simple file, a database table, a SAS dataset, ...
• Datasets may reside on any machine running the Co>Operating System.
• Datasets may reside on other machines if connected by FTP or database middleware.
• Data is always described by record format metadata (termed “dml”).


DATASET COMPONENTS
Dataset components represent data records or act upon data records as follows:
• Input File
• Input Table
• Intermediate File
• Lookup File
• Output File
• Output Table
• Read Multiple Files
• Write Multiple Files

Locating Files with URLs
• Ab Initio software uses Uniform Resource Locators (URLs) to locate files. You enter URLs for datasets, record formats, input and output files, and so on in a component's properties dialog. Enter files and multifiles on the Description tab, transforms on the Parameters tab, and DML record formats on the Ports tab. The Ab Initio URL format is:
[file|mfile]://hostname/directory1/directory2…/filename


More on URLs
Arguments and descriptions:
• file: Specifies a serial file.
• mfile: Specifies a multifile.
• hostname: Specifies the name of the computer containing the file you want.
• directory1…: Specifies the directory path of the file.
• filename: Specifies the filename.


Examples on URLs
• This example specifies a file named input.dat, located in the tmp directory on the computer named revkalt.abinito.com:
file://revkalt.abinito.com/tmp/input.dat
• This example specifies a multifile named customer.dat, located in the tmp/mfs subdirectory on the computer named mycomputer.abinito.com:
mfile://mycomputer.abinito.com/tmp/mfs/customer.dat
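
The URL pieces described above can be pulled apart mechanically. The following is a minimal sketch in Python, not an Ab Initio tool: the function name parse_abinitio_url and the dictionary keys are hypothetical, and only the [file|mfile]://hostname/path shape is taken from the slides.

```python
import re

# Pattern for the [file|mfile]://hostname/directory.../filename shape.
_URL_RE = re.compile(r"^(file|mfile)://([^/]+)(/.*)$")


def parse_abinitio_url(url):
    """Split a file:// or mfile:// URL into its parts (hypothetical helper)."""
    match = _URL_RE.match(url)
    if match is None:
        raise ValueError(f"not a file:// or mfile:// URL: {url!r}")
    scheme, hostname, path = match.groups()
    # Everything before the last slash is the directory path.
    directory, _, filename = path.rpartition("/")
    return {
        "kind": "multifile" if scheme == "mfile" else "serial file",
        "hostname": hostname,
        "directory": directory or "/",
        "filename": filename,
    }


serial = parse_abinitio_url("file://revkalt.abinito.com/tmp/input.dat")
multi = parse_abinitio_url("mfile://mycomputer.abinito.com/tmp/mfs/customer.dat")
print(serial)
print(multi)
```

The scheme alone tells a reader whether the target is a serial file or a multifile, so it is kept as a separate field in this sketch.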



What is a Record Format

record
decimal (6) cust_id;
string (18) last_name;
string (16) first_name;
string (26) street_addr;
string (2) state;
decimal (5) zip;
string (1) gender;
decimal (7) income;
string (1) newline;
end

Each field declaration gives the data type, the length, and the name of the field.

A record format specifies the format in which data is read from a source dataset or written to a target dataset.
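
To make the idea concrete, here is a minimal Python sketch of reading one fixed-width record laid out like the customer format above. It is an illustration only, not how the Co>Operating System reads data. It assumes plain text data, takes the final one-byte field to be a string field holding the newline, and uses invented sample values.

```python
# Field layout taken from the record format above: (name, type, length).
CUSTOMER_FORMAT = [
    ("cust_id", "decimal", 6),
    ("last_name", "string", 18),
    ("first_name", "string", 16),
    ("street_addr", "string", 26),
    ("state", "string", 2),
    ("zip", "decimal", 5),
    ("gender", "string", 1),
    ("income", "decimal", 7),
    ("newline", "string", 1),  # assumed: one-byte record terminator
]

RECORD_LENGTH = sum(length for _, _, length in CUSTOMER_FORMAT)


def parse_record(line):
    """Slice one fixed-width record into a dict of field values."""
    if len(line) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} characters, got {len(line)}")
    record, offset = {}, 0
    for name, kind, length in CUSTOMER_FORMAT:
        raw = line[offset:offset + length]
        offset += length
        if name == "newline":
            continue  # terminator, not data
        # Decimal fields become integers; string fields lose their padding.
        record[name] = int(raw) if kind == "decimal" else raw.rstrip()
    return record


# Invented sample record, padded to the declared widths.
sample = (
    "000042" + "Smith".ljust(18) + "Jane".ljust(16)
    + "12 Main Street".ljust(26) + "CA" + "90210" + "F" + "0055000" + "\n"
)
customer = parse_record(sample)
print(customer)
```

Because every field has a declared length, the reader never has to search for delimiters: each value sits at a known offset.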



About Record Formats:
• A record format is a description of data.
• For example, you might have a database of employees where each record contains four fields: six characters for the employee's first name, followed by ten characters for the employee's last name, followed by three characters for the employee's age, and six characters for the employee's date of hire.
• One employee's record might look like this (where each square represents one character, or byte, in the record):

[Figure: one employee record drawn as a row of squares]

You can enter or edit a record format using the Record Format Editor.
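
The four-field employee layout described above adds up to 25 bytes per record. The following Python sketch, using the standard struct module, builds and splits such a record. The helper names are hypothetical, the sample values are invented, and the YYMMDD layout used for the hire date is an assumption, since the slide gives only the field width.

```python
import struct

# Four fixed-width character fields: first name (6), last name (10),
# age (3), date of hire (6).
EMPLOYEE = struct.Struct("6s10s3s6s")


def pack_employee(first_name, last_name, age, hire_date):
    """Build one 25-byte record, space-padding the text fields."""
    return EMPLOYEE.pack(
        first_name.ljust(6).encode("ascii"),
        last_name.ljust(10).encode("ascii"),
        str(age).rjust(3).encode("ascii"),
        hire_date.encode("ascii"),
    )


def unpack_employee(record):
    """Split a 25-byte record back into its four fields."""
    first, last, age, hired = EMPLOYEE.unpack(record)
    return (
        first.decode("ascii").rstrip(),
        last.decode("ascii").rstrip(),
        int(age.decode("ascii")),
        hired.decode("ascii"),
    )


# Invented sample values; the hire date is written as YYMMDD (assumption).
record = pack_employee("Mary", "Jones", 34, "970315")
print(record)
print(unpack_employee(record))
```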



Text Record Format Representation:

record
decimal(4) id;
string(6) first_name;
string(6) last_name;
date("YYYY-DD-MM") newfield;
end;


Specifying the Record Format of a Port
Record Format Editor


Specifying the Record Format of a Port
• You can assign a record format to a dataset component or program component by viewing the component's properties dialog and specifying the record format on the Ports tab.


Specifying the Record Format of a Port
• On the Ports tab, you specify the record format of a component port using one of the following:
  • A record type specifier.
  • A reference to a file containing a collection of type specifiers.
  • A type specifier other than record. Although this is not commonly done, it is perfectly legal. For example, the following type specifier indicates that the record format is simply a five-character string: string (5)
• Record formats usually consist of multiple fields (called columns in a database table). You define a field by using a keyword that represents a DML base or compound type, followed by any additional information that the DML type needs (such as the size of the field), and by any optional information.


Introduction to DML

• DML is an acronym for Data Manipulation Language. It is the Ab Initio programming language you can use to define record formats, expressions, transform functions, and key specifiers. Components in the Ab Initio Co>Operating System use DML to describe, interpret, and manipulate data.


About Records
• In general, a record is one complete entry in a file or in a database table. A record about a customer might contain individual fields for account number, account type, name, address, and telephone number.
• In Ab Initio products, a record is a DML object that contains a sequence of named fields (called columns in a database table), each of which can be a different DML base or compound type. Most record types are fairly simple, containing only data fields.


Input File
• It reads data records from a serial file or multifile in the file system.
• An Input File can provide data to multiple components. Give each Input File a label that identifies it uniquely. If the same label is used more than once, the system appends a count (1, 2, and so on) to make it unique.
• Data location specifies where the file resides. It can be given as an absolute path or through a parameter, as shown in the diagram below.

Output File

• It writes data records to a serial file or a multifile in the file system.
• An Output File does not provide data to other components in the graph.


Input Table

• It unloads data records from a database into an Ab Initio graph, allowing you to specify as the source either a database table or an SQL statement that selects data records from one or more tables.


DBC File
• A DBC file is a database configuration file that Ab Initio requires in order to connect to any database system. By default it has the extension .dbc. It generally contains the information needed to access the database, such as dbms, db_version, db_home, db_name, db_node, user, and password. The parameter settings depend entirely on which database Ab Initio is to connect to.


Input Table
You can configure Input Table to use either of the following as its source:
• An explicitly specified database table.
• An SQL SELECT statement that selects data records from one or more tables in a database.
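
The two source modes can be illustrated outside Ab Initio. The sketch below uses Python's built-in sqlite3 module as a stand-in for the real database. The table, its columns, and the helper names unload_table and unload_select are invented for illustration and are not part of any Ab Initio API.

```python
import sqlite3

# In-memory SQLite database standing in for the real source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, last_name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Smith", "MA"), (2, "Jones", "NY"), (3, "Brown", "MA")],
)


def unload_table(connection, table):
    """Mode 1: an explicitly named table is unloaded in full."""
    # The table name is interpolated only because this is a sketch; it must
    # come from trusted configuration, never from user input.
    return connection.execute(f"SELECT * FROM {table}").fetchall()


def unload_select(connection, select_sql, params=()):
    """Mode 2: a SELECT statement chooses which records to unload."""
    return connection.execute(select_sql, params).fetchall()


all_rows = unload_table(conn, "customers")
ma_rows = unload_select(
    conn,
    "SELECT cust_id, last_name FROM customers WHERE state = ? ORDER BY cust_id",
    ("MA",),
)
print(all_rows)
print(ma_rows)
```

The SELECT form is the more flexible of the two, since it can filter rows, pick columns, or join several tables before any record enters the graph.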



Input Table

• After you select the source, the Co>Operating System generates a record format that matches the columns of the table or the SELECT statement. The record format is a DML type that contains character-delimited fields by default.


Output Table

• It loads data records from a graph into a database, letting you specify the records' destination either directly as a single database table or through an SQL statement that inserts records into one or more tables.


Output Table

• You can configure Output Table to use either of the following as its destination:
  • An explicitly specified database table.
  • An SQL statement that inserts data records into one or more tables in a database.




Lookup File
• Lookup File represents one or multiple serial files or a multifile of data records small enough to be held in main memory, letting a transform function retrieve records much more quickly than it could if they were stored on disk.
• Lookup File associates key values with corresponding data values to index records and retrieve them.

Parameters for Lookup File

• key (key specifier, required): Name(s) of the key field(s) against which Lookup File matches its arguments.
• RecordFormat (record format, required): The record format you want Lookup File to use when returning data records.


How to Use Lookup File
• Unlike other dataset components, Lookup File is not connected to other components in graphs. In other words, it has no ports. However, its contents are accessible from other components in the same or later phases.
• You use the Lookup File in other components by calling one of the following DML functions in any transform function or expression parameter: lookup, lookup_count, or lookup_next.
• The first argument to these lookup functions is the name of the Lookup File. The remaining arguments are values to be matched against the fields named by the key parameter. The lookup functions return a record that matches the key values and has the format given by the RecordFormat parameter.
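
The keyed, in-memory retrieval described above can be modeled in a few lines of Python. This is a rough sketch under stated assumptions, not DML and not the Co>Operating System's implementation: the class LookupTable is hypothetical, the object itself stands in for the Lookup File name that the DML functions take as their first argument, and the behavior given to lookup_next (return the next match after the most recent lookup) is an assumption.

```python
from collections import defaultdict


class LookupTable:
    """In-memory keyed index, loosely modeled on the behavior described above."""

    def __init__(self, records, key_fields):
        self.key_fields = tuple(key_fields)
        self._index = defaultdict(list)
        for record in records:
            key = tuple(record[field] for field in self.key_fields)
            self._index[key].append(record)
        self._pending = iter(())  # matches not yet returned by lookup_next

    def lookup(self, *key_values):
        """Return the first record matching the key values, or None."""
        matches = self._index.get(tuple(key_values), [])
        self._pending = iter(matches[1:])
        return matches[0] if matches else None

    def lookup_count(self, *key_values):
        """Return how many records match the key values."""
        return len(self._index.get(tuple(key_values), []))

    def lookup_next(self):
        """Return the next match after the last lookup, or None when exhausted."""
        return next(self._pending, None)


# Invented sample data keyed on a single field.
states = LookupTable(
    [
        {"state": "MA", "city": "Boston"},
        {"state": "MA", "city": "Worcester"},
        {"state": "NY", "city": "Albany"},
    ],
    key_fields=["state"],
)
first = states.lookup("MA")
second = states.lookup_next()
print(first, second, states.lookup_count("MA"))
```

Building the index once and answering each call from memory is what makes this faster than rereading a file per record, and it is also why the data has to fit in memory.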



How to Use Lookup File

• A file you want to use as a Lookup File must fit into memory. If a file is too large to fit into memory, use Input File followed by Match Sorted or Join instead.
• Information about Lookup Files is stored in a catalog, which allows you to share them with other graphs.


Converting an Output File to a Lookup File
You can convert an Output File generated in one phase of a graph to a Lookup File used in a later phase. To do this:
• Create an Output File to contain the data records you want to use as a Lookup File.
• On the Description tab of the File Properties dialog box for that Output File, check Add to Catalog.

Intermediate File

• Intermediate File represents one or multiple serial files or a multifile of intermediate results that a graph writes during execution and saves for your review after execution.


Parameters: Intermediate File

The Intermediate File Properties dialog does not have a Parameters tab. However, you can specify values for parameters on the Description, Access, and Ports tabs of the Intermediate File Properties dialog. This includes parameters such as intermediate file location, file handling behavior and permissions, and the intermediate record format.


Runtime Behavior

• The upstream component writes to Intermediate File through Intermediate File's write port. After the flow of data records into the write port is complete, the downstream component reads from Intermediate File's read port. This guarantees that the writing and reading processes are in two separate phases.
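
The write-then-read ordering can be sketched in plain Python. This is only an analogy for the phase boundary, not how a graph executes; the function names and the file name intermediate.dat are hypothetical.

```python
import os
import tempfile


def write_phase(path, records):
    """Phase 1: the upstream step writes every record, then closes the file."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(record + "\n")


def read_phase(path):
    """Phase 2: the downstream step starts only after phase 1 has finished."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


with tempfile.TemporaryDirectory() as workdir:
    intermediate = os.path.join(workdir, "intermediate.dat")  # hypothetical name
    write_phase(intermediate, ["rec-1", "rec-2", "rec-3"])
    # The complete file exists on disk here, between the two phases,
    # which is what makes it available for inspection.
    size_after_write = os.path.getsize(intermediate)
    result = read_phase(intermediate)

print(size_after_write, result)
```

Because the reader never starts until the writer has closed the file, the reader always sees the complete set of records.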

Thank You

End of Session 8
