Ab Initio Components
Null – if an expression evaluates to NULL for a record, the component writes the record to the reject port and writes a descriptive error message to the error port.
2) Dedup Sorted:
It separates one specified record in each group of records from the rest of the records in
the group.
Parameters : key, select, keep (first | last | unique-only) – specifies which record the component keeps.
Runtime behavior: Reads a grouped flow of records from the in port.
If you supply an expression for the select parameter, Dedup Sorted applies the expression to the
records and processes them according to the select expression. If you do not supply a select
expression, it processes all records on the in port.
It considers any consecutive records with the same key value to be the same group. If a group
consists of one record, it writes that record only.
If you choose unique-only, it does not write any record from a group consisting of more than one record.
Both the out and dup ports are optional. If you do not connect flows to them, the component discards
those records.
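The keep behavior can be sketched in Python (an illustration, not Ab Initio DML; `groupby` stands in for the grouped in port, and the record layout is assumed):

```python
from itertools import groupby

def dedup_sorted(records, key, keep="first"):
    """Sketch of Dedup Sorted: records must already be sorted/grouped by key."""
    out = []
    for _, grp in groupby(records, key=key):
        grp = list(grp)
        if keep == "first":
            out.append(grp[0])            # keep the first record of each group
        elif keep == "last":
            out.append(grp[-1])           # keep the last record of each group
        elif keep == "unique-only":
            if len(grp) == 1:             # drop every group with more than one record
                out.append(grp[0])
    return out

rows = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
firsts = dedup_sorted(rows, key=lambda r: r["id"], keep="first")
uniques = dedup_sorted(rows, key=lambda r: r["id"], keep="unique-only")
```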
3) Reformat:
Reformat changes the format of records by dropping fields, or by using DML expressions to
add fields, combine fields, or transform the data in the records.
Parameters:
count : Sets the number of out ports, reject ports, error ports, and transform parameters.
Default is 1. Transform functions for REFORMAT should have one input and one output.
By default each input record goes to every transform/output-port pair: for example, if the
component has two output ports and there are no rejects, 100 input records result in 100
output records on each port, for a total of 200 output records.
output-index : When you specify a value for this parameter, each input record goes to exactly
one transform/output-port pair.
output-indexes : The expected output of this transform function is a vector of numeric values.
The component treats each element of this vector as an index into the output transforms and
ports, directs the input record to the identified output ports, and executes the transform
functions, if any, associated with those ports.
If you specify a value for the output-index parameter, you cannot also specify the
output-indexes parameter.
If you specify an expression for the select parameter, the expression filters the records on the in
port:
If the expression evaluates to 0 for a particular record, REFORMAT does not process the
record, which means that the record does not appear on any output port.
If the expression produces NULL for any record, REFORMAT writes a descriptive error
message and stops execution of the graph.
If the expression evaluates to anything other than 0 or NULL for a particular record,
REFORMAT processes the record.
If you do not connect flows to the reject or error ports, REFORMAT discards the information.
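The select-parameter rules above can be sketched as follows (a Python illustration with assumed record and function names, not Ab Initio DML; `None` stands in for NULL, and raising an exception stands in for stopping the graph):

```python
def reformat(records, transform, select=None):
    """Sketch of REFORMAT's select behavior: select filters the in port."""
    out = []
    for rec in records:
        if select is not None:
            flag = select(rec)
            if flag is None:              # NULL: write an error and stop the graph
                raise ValueError("select evaluated to NULL for record %r" % (rec,))
            if flag == 0:                 # 0: record is not processed at all
                continue
        out.append(transform(rec))        # anything else: record is processed
    return out

recs = [{"amt": 5}, {"amt": 0}, {"amt": 9}]
kept = reformat(recs,
                transform=lambda r: {"amt2": r["amt"] * 2},
                select=lambda r: r["amt"])   # drops the record where amt == 0
```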
4) Rollup:
Rollup evaluates a group of input records that have the same key, and then generates
records that either summarize each group or select certain information from each group.
Parameters:
sorted-input :
o When set to In memory, input need not be sorted; the component accepts ungrouped
input and requires the use of the max-core parameter.
o When set to Input must be sorted or grouped, the component requires grouped input, and
the max-core parameter is not available.
key-method :
o Use key specifier — The component groups records according to the key parameter.
o Use key change function — The component uses the key change transform function.
key : Name(s) of the key field(s) the component uses to group or define groups of records. If
the value of the key-method parameter is Use key specifier, you must specify a value for the
key parameter. If you specify Use key change function for the key-method parameter, the key
parameter is not available.
transform : Either the name of the file containing the types and transform functions, or a
transform string.
max-core : If the total size of the intermediate results the component holds in memory exceeds
the number of bytes specified in the max-core parameter, the component writes temporary files
to disk.
Template mode: For example, suppose you want to determine the total
purchase by each customer. You could use the sum aggregation function to determine the
total amount spent by each customer.
Expanded mode: Now suppose that for each customer, you want to determine the price of the
largest single purchase and the item that was purchased. In this situation, aggregation functions
cannot compute the result you want. Consequently, you would write a transform that examines the
input records for each group and selects the appropriate fields from one of the records.
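Both modes can be sketched in Python (illustrative only; the purchase records and reducer functions are assumptions, and `groupby` stands in for grouped input):

```python
from itertools import groupby

def rollup(records, key, reduce_group):
    """Sketch of Rollup on grouped input: one output record per key group."""
    return [reduce_group(k, list(g)) for k, g in groupby(records, key=key)]

purchases = [
    {"cust": "A", "item": "pen",  "price": 2.0},
    {"cust": "A", "item": "lamp", "price": 30.0},
    {"cust": "B", "item": "mug",  "price": 8.0},
]

# Template-mode style: a sum aggregation per customer.
totals = rollup(purchases, key=lambda r: r["cust"],
                reduce_group=lambda k, g: {"cust": k,
                                           "total": sum(r["price"] for r in g)})

# Expanded-mode style: pick fields from the record with the largest purchase.
largest = rollup(purchases, key=lambda r: r["cust"],
                 reduce_group=lambda k, g: max(g, key=lambda r: r["price"]))
```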
5) Scan: For every input record, Scan generates an output record that includes a running
cumulative summary for the group the input record belongs to. For example, the
output records might include successive year-to-date totals for groups of records.
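A running year-to-date total, as in the example above, can be sketched like this (Python illustration with assumed field names, not Ab Initio DML):

```python
from itertools import groupby

def scan(records, key, value):
    """Sketch of Scan: one output per input, carrying a running total per key group."""
    out = []
    for _, grp in groupby(records, key=key):
        running = 0
        for rec in grp:
            running += value(rec)                     # cumulative summary so far
            out.append({**rec, "running_total": running})
    return out

sales = [{"yr": 2023, "amt": 10}, {"yr": 2023, "amt": 5}, {"yr": 2024, "amt": 7}]
ytd = scan(sales, key=lambda r: r["yr"], value=lambda r: r["amt"])
```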
Lookup :
---From a lookup file, returns the first record that matches a specified expression.
A lookup is a component of an Ab Initio graph where we can store data and retrieve it by using a
key parameter.
A lookup file is the physical file where the data for the lookup is stored.
Parameters : key
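The first-match-wins retrieval by key can be sketched with a plain dictionary (illustrative Python, assumed record contents; a real lookup file is an indexed flat file, not an in-memory dict):

```python
def build_lookup(records, key):
    """Sketch of a lookup table: keeps the FIRST record seen for each key value."""
    table = {}
    for rec in records:
        table.setdefault(key(rec), rec)   # first matching record wins
    return table

codes = [{"code": "IN", "name": "India"},
         {"code": "IN", "name": "duplicate, ignored"},
         {"code": "US", "name": "United States"}]
table = build_lookup(codes, key=lambda r: r["code"])
```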
Fuse :
FUSE combines multiple input flows (perhaps with different record formats) into a single output
flow. It examines one record from each input flow simultaneously, acting on the records
according to the transform function you specify. For example, you can compare records,
selecting one record or another based on specified criteria, or “fuse” them into a single record
that contains data from all the input records.
Parameters : count, transform
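Taking one record from each flow at a time maps naturally onto `zip` (a Python sketch with made-up flows and transform; real FUSE input flows may have different record formats):

```python
def fuse(transform, *flows):
    """Sketch of FUSE: examine one record from each input flow simultaneously
    and combine them with the transform function."""
    return [transform(*recs) for recs in zip(*flows)]

names = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
scores = [{"score": 90}, {"score": 75}]

# "Fuse" the paired records into single records containing data from both flows.
fused = fuse(lambda a, b: {**a, **b}, names, scores)
```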
SORT WITHIN GROUPS refines the sorting of records already sorted according to one key
specifier: it sorts the records within the groups formed by the first sort according to a second
key specifier.
---major-key : Specifies the name(s) of key field(s) and sequence specifier(s), according to which
the component assumes the input has already been ordered.
---minor-key : Name(s) of the key field(s) and the sequence specifier(s) you want the
component to use when it refines the sorting of records.
When the component reaches the number of bytes specified in the max-core parameter, it
sorts the records it has read for the group and writes a temporary file to disk. Once all the data
is sorted, it merges the temporary files and sends the records to the out port.
---allow-unsorted
Set to True to allow input not sorted according to the major-key parameter.
When you set allow-unsorted to True, the boundary between groups of records occurs when
any change occurs in the value(s) of the field(s) specified in the major-key parameter.
Default is False.
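The major-key/minor-key refinement can be sketched as follows (Python illustration, assumed fields; `groupby` stands in for the groups formed by the first sort):

```python
from itertools import groupby

def sort_within_groups(records, major_key, minor_key):
    """Sketch of SORT WITHIN GROUPS: input is already sorted by major_key;
    each major-key group is re-sorted by minor_key."""
    out = []
    for _, grp in groupby(records, key=major_key):
        out.extend(sorted(grp, key=minor_key))   # refine sorting inside the group
    return out

emps = [{"dept": "A", "sal": 3}, {"dept": "A", "sal": 1}, {"dept": "B", "sal": 2}]
refined = sort_within_groups(emps,
                             major_key=lambda r: r["dept"],
                             minor_key=lambda r: r["sal"])
```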
Partition components :
1) Partition by Key : PARTITION BY KEY distributes records to its output flow partitions according
to key values.
Parameters :
key : Specifies the name(s) of the key field(s) that you want the component to use when it
distributes records among flow partitions.
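A minimal sketch of key-based distribution, assuming hash-modulo routing (Ab Initio's actual hash function differs; integer keys are used here so the routing is deterministic):

```python
def partition_by_key(records, key, n_partitions):
    """Sketch of PARTITION BY KEY: records with the same key value always
    land in the same output partition."""
    parts = [[] for _ in range(n_partitions)]
    for rec in records:
        parts[hash(key(rec)) % n_partitions].append(rec)
    return parts

rows = [{"k": 1}, {"k": 2}, {"k": 3}, {"k": 1}]
parts = partition_by_key(rows, key=lambda r: r["k"], n_partitions=2)
```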
3) Partition by Expression:
Partition By Expression distributes records to its output flow partitions according to a specified
DML expression or transform function.
Parameters :
function : The DML expression or transform function the component uses to choose the
output partition for each record.
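Here the expression itself names the target partition, unlike the key-hashing case (a Python sketch; the amount-based routing rule is an invented example):

```python
def partition_by_expression(records, expr, n_partitions):
    """Sketch of PARTITION BY EXPRESSION: the expression computes the
    partition number for each record."""
    parts = [[] for _ in range(n_partitions)]
    for rec in records:
        parts[expr(rec) % n_partitions].append(rec)
    return parts

orders = [{"amt": 50}, {"amt": 150}, {"amt": 20}]
# Route small orders to partition 0, large orders to partition 1.
parts = partition_by_expression(orders,
                                expr=lambda r: 0 if r["amt"] < 100 else 1,
                                n_partitions=2)
```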
Departition Components :
13) GATHER :
Runtime behaviour :
1.Reads records from the flows connected to the in port.
2.Combines the records in an arbitrary order.
3.Writes the combined records to the out port.
parameters : None
14) MERGE :
MERGE combines records from multiple flows or flow partitions that have been sorted
according to the same key specifier, and maintains the sort order.
MERGE requires sorted input data, but never sorts data itself.
check-sort: Prevents the graph from running to completion if the component detects unsorted
input data.
•True (default) — The component stops the graph with an error on the first input record that is
out of sorted order, according to the value of the key parameter. In almost all cases, this default
value is appropriate.
•False — The component does not stop or issue an error when it encounters unsorted inputs.
Since MERGE does not sort data itself, do not expect that unsorted input data will result in
output data that is sorted or grouped. This setting is rarely used.
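Python's `heapq.merge` behaves the same way: it interleaves already-sorted inputs while preserving order, and never sorts them itself (flow contents below are illustrative):

```python
import heapq

# Sketch of MERGE: combine flows that are ALREADY sorted on the same key,
# maintaining the sort order in the output.
flow_a = [1, 4, 9]
flow_b = [2, 3, 10]
merged = list(heapq.merge(flow_a, flow_b))
```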
15) CONCATENATE :
CONCATENATE appends multiple flow partitions of records one after another. The in port for
CONCATENATE is ordered.
1.Reads all records from the first flow connected to the in port (counting from top to bottom on
the graph) and copies them to the out port.
2.Reads all records from the next flow connected to the in port and appends them to the
records from the previously processed flow.
3.Repeats Step 2 for each subsequent flow connected to the in port.
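The steps above can be sketched with `itertools.chain`, which likewise appends flows in the order they are connected (flow contents are made up):

```python
from itertools import chain

# Sketch of CONCATENATE: copy all records from the first flow, then append
# all records from each subsequent flow, in port order.
flow1 = ["a1", "a2"]
flow2 = ["b1"]
flow3 = ["c1", "c2"]
out = list(chain(flow1, flow2, flow3))
```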
16) REPLICATE :
REPLICATE arbitrarily combines all records it receives into a single flow and writes a copy of that
flow to each of its output flows. Use REPLICATE to support component parallelism
1.Arbitrarily combines the records from all the flows on the in port into a single flow.
2.Copies that flow to all the flows connected to the out port.
REPLICATE does not support implicit reformat, so you cannot use it to change the record format
associated with a particular flow. For that reason, you must make the record format of the in
and out ports identical. If you do not, execution of the graph stops when it reaches REPLICATE.
17) BROADCAST :
BROADCAST combines in an arbitrary order all records it receives into a single flow and writes a
copy of that flow to each of its output flow partitions.
Use BROADCAST to increase data parallelism when you have connected a single fan-out flow to
the out port, or to increase component parallelism when you have connected multiple straight
flows to the out port.
1.Reads records from all the flows connected to the in port.
2.Combines the records in an arbitrary order into a single flow.
3.Copies all the records to all the flow partitions connected to the out port.
BROADCAST is used to increase data parallelism by feeding records to fan-out or all-to-all flows.
REPLICATE is generally used to increase component parallelism, emitting multiple straight flows
to separate pipelines.
Specifically, the difference between them lies in how their flows are set up and how their
layouts are propagated in the GDE:
REPLICATE allows multiple outputs for a given layout and propagates the layout from the input
to the output.
BROADCAST is a partitioning component that defines the transition from one layout to another.
19) Normalize :
NORMALIZE generates multiple output records from each of its input records. You can directly
specify the number of output records for each input record, or you can make the number of
output records dependent on a calculation.
Normalize converts an array of size n into n records. For example, if the input is a single record
per student containing a vector of 4 subjects, the output of Normalize has 4 records for the
same student, each with a different subject.
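The student/subjects example can be sketched like this (Python illustration; the record layout and the singular/plural field-name convention are assumptions):

```python
def normalize(records, vector_field):
    """Sketch of NORMALIZE: one output record per element of the vector field."""
    out = []
    for rec in records:
        for elem in rec[vector_field]:
            flat = {k: v for k, v in rec.items() if k != vector_field}
            flat[vector_field[:-1]] = elem   # assumed: "subjects" -> "subject"
            out.append(flat)
    return out

students = [{"name": "Ravi", "subjects": ["math", "physics", "chem", "bio"]}]
rows = normalize(students, "subjects")   # one record per subject
```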
20) Denormalize :
There are a few situations when you definitely should think of denormalization:
Maintaining history: Data can change over time, and we may need to store the values that were
valid when a record was created. ...
Improving query performance: Some of the queries may use multiple tables to access data that
we frequently need.