Abinitio Components PDF
Miscellaneous/Deprecated/Transform folder
COMBINE
Purpose
COMBINE processes data in a number of useful ways. You can use COMBINE to:
- Restore hierarchies of data flattened by the SPLIT component
- Create a single output record by joining multiple input streams
You need to specify the three keys to use when rolling up the vectors: region, states.state, and
states.counties.county.
The resulting command is:
split_dml -i ..# -b ..atm_id -k region,states.state,states.counties.county example2b.dml
The generated DML, to be used on COMBINE’s input port, is:
//////////////////////////////////////////////////////////////
// This file was automatically generated by split_dml
// with the command-line arguments:
// split_dml -i ..# -b ..atm_id -k region,states.state,states.counties.county example2b.dml
//////////////////////////////////////////////////////////////
record
string("|") region; // Sort key 1
string("|") state; // Sort key 2
string("|") county; // Sort key 3
string("|") addr_line1;
string("|") addr_line2;
string("|") atm_id;
string("|") comment;
string("\n") regional_mgr;
string('0')DML_assignments =
'region=region,state=states.state,county=states.counties.county,
addr_line1=states.counties.atms.location.addr_line1,
addr_line2=states.counties.atms.location.addr_line2,
atm_id=states.counties.atms.atm_id,
comment=states.counties.atms.comment,
regional_mgr=regional_mgr';
string('0')DML_key_specifiers() =
'{region}=,{state}=states[],{county}=states.counties[]';
end
DEDUP SORTED
Purpose
Dedup Sorted separates one specified record in each group of records from the rest of the
records in the group.
Requirement
Dedup Sorted requires grouped input.
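The separation that Dedup Sorted performs can be pictured with a short Python sketch. It is an illustration only, not Ab Initio code: the function name dedup_sorted, its keep argument with the values "first" and "last", and the sample records are assumptions made for this example. Like the component, it relies on the input already being grouped by key.

```python
from itertools import groupby
from operator import itemgetter


def dedup_sorted(records, key, keep="first"):
    """Split grouped records into (kept, rest).

    Hypothetical helper: `keep` selects which record of each group is
    separated out, either "first" or "last".
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    kept, rest = [], []
    # groupby only merges adjacent records, so the input must be grouped.
    for _, group in groupby(records, key=key):
        group = list(group)
        chosen = 0 if keep == "first" else len(group) - 1
        for position, record in enumerate(group):
            (kept if position == chosen else rest).append(record)
    return kept, rest


records = [
    {"customer_id": "C002142", "dt": "1994.03.23"},
    {"customer_id": "C002142", "dt": "1994.06.22"},
    {"customer_id": "C004221", "dt": "1994.08.15"},
]
kept, rest = dedup_sorted(records, key=itemgetter("customer_id"))
```

Here kept holds one record per customer and rest holds the remaining record of the first group. If the input were not grouped, the same key could appear in several runs and each run would be treated as its own group.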
Recommendation
Component folding can enhance the performance of this component. If this feature is enabled,
the Co>Operating System folds this component by default. See “Component folding” for more
information.
Location in the Component Organizer
Transform folder
FILTER BY EXPRESSION
Purpose
Filter by Expression filters records according to a DML expression or transform function, which
specifies the selection criteria.
Filter by Expression is sometimes used to create a subset, or sample, of the data. For example,
you can configure Filter by Expression to select a certain percentage of records, or to select
every third (or fourth, or fifth, and so on) record. Note that if you need a random sample of a
specific size, you should use the SAMPLE component.
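The "every third record" case can be sketched in Python as follows. This is an illustration, not DML: the function names, the (position, record) signature of the selection expression, and the second list of records that were not selected are all assumptions made for this example.

```python
def filter_by_expression(records, select):
    """Return (selected, deselected) according to the `select` predicate.

    Hypothetical helper: `select` receives the 1-based position of the
    record and the record itself, and returns True to keep the record.
    """
    selected, deselected = [], []
    for position, record in enumerate(records, start=1):
        target = selected if select(position, record) else deselected
        target.append(record)
    return selected, deselected


def every_third(position, record):
    # Keep records 3, 6, 9, ... regardless of their content.
    return position % 3 == 0


selected, deselected = filter_by_expression(
    ["a", "b", "c", "d", "e", "f", "g"], every_third
)
```

Because the choice depends only on position, this yields a systematic subset rather than a random one, which is why a dedicated sampling component is the better fit for random samples of a fixed size.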
FILTER BY EXPRESSION supports implicit reformat. For more information, see “Implicit
reformat”.
Recommendation
Component folding can enhance the performance of this component. If this feature is enabled,
the Co>Operating System folds this component by default. See “Component folding” for more
information.
Location in the Component Organizer
Transform folder
FUSE
Purpose
Fuse combines multiple input flows (perhaps with different record formats) into a single output
flow. It examines one record from each input flow simultaneously, acting on the records
according to the transform function you specify. For example, you can compare records,
selecting one record or another based on some criteria, or “fuse” them into a single record that
contains data from all the input records.
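A rough Python analogy for this record-by-record pairing is zip over the input flows. The sketch below is an assumption-laden illustration: the names fuse and merge and the sample records are invented, and raising an error when the flows differ in length is a choice made for this example, not documented Fuse behavior.

```python
def fuse(transform, *flows):
    """Apply `transform` to the n-th record of every flow, for each n."""
    flows = [list(flow) for flow in flows]
    # Assumption for this sketch: flows of different lengths are an error,
    # because records would no longer line up position by position.
    if len({len(flow) for flow in flows}) > 1:
        raise ValueError("input flows are out of sync")
    return [transform(*records) for records in zip(*flows)]


def merge(account, balance):
    # The two flows have different record formats; the output combines them.
    return {
        "account": account["id"],
        "owner": account["owner"],
        "balance": balance["amount"],
    }


accounts = [{"id": 1, "owner": "Ann"}, {"id": 2, "owner": "Bo"}]
balances = [{"amount": 100}, {"amount": 250}]
fused = fuse(merge, accounts, balances)
```

Nothing in this pairing looks at a key, so a record dropped from one flow silently shifts every later pair. That is the synchronization risk a key-based match avoids.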
Recommendation
Fuse assumes that the records on the input flows always stay synchronized. However, certain
components placed upstream of Fuse, such as Reformat or Filter by Expression, could reject or
divert some records. In that case, you may not be able to guarantee that the flows stay in sync. A
more reliable option is to add a key field to the data; then use Join to match the records by key.
Component folding can enhance the performance of this component. If this feature is enabled,
the Co>Operating System folds this component by default. See “Component folding” for more
information.
JOIN
Purpose
Join reads data from two or more input ports, combines records with matching keys according to
the transform you specify, and sends the transformed records to the output port. Additional ports
allow you to collect rejected and unused records.
Recommendation
Component folding can enhance the performance of this component. If this feature is enabled,
the Co>Operating System folds this component by default. See “Component folding” for more
information.
NOTE: When you have units of work (computepoints, checkpoints, or transactions) that are
large and sorted-input is set to Inputs must be sorted, the order of output records within a key
group may differ between the folded and unfolded versions of the output.
Location in the Component Organizer
Transform folder
Types of joins
Reduced to its basics, Join consists of a match key, a transform function, and a mechanism for
deciding when to call the transform function:
- The key is used to match records on incoming flows
- The transform function combines matched incoming records to produce new outgoing records
- The mechanism for deciding when to call the transform function consists of the settings of the parameters join-type, record-requiredn, and dedupn.
Inner joins
The most common case is when join-type is Inner Join. In this case, if each input port contains a
record with the same value for the key fields, the transform function is called and an output
record is produced.
If some of the input flows have more than one record with that key value, the transform function
is called multiple times, once for each possible combination of records, taken one from each
input port.
Whenever a particular key value does not have a matching record on every input port and Inner
Join is specified, the transform function is not called and all incoming records with that key value
are sent to the unusedn ports.
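The inner-join rules above (one call per combination of matching records, unmatched records routed to the unused ports) can be sketched in Python. This is an in-memory illustration under stated assumptions: the function name inner_join, the tuple records, and the list-of-lists standing in for the unusedn ports are all invented for the example.

```python
from collections import defaultdict
from itertools import product


def inner_join(transform, key, *inputs):
    """Return (output, unused), where unused[n] models the unusedn port."""
    grouped = []
    for flow in inputs:
        by_key = defaultdict(list)
        for record in flow:
            by_key[key(record)].append(record)
        grouped.append(by_key)

    output = []
    unused = [[] for _ in inputs]
    for k in sorted(set().union(*grouped)):
        if all(k in by_key for by_key in grouped):
            # One transform call per combination, one record from each port.
            for combo in product(*(by_key[k] for by_key in grouped)):
                output.append(transform(*combo))
        else:
            # The key is missing on at least one port: nothing is joined.
            for port, by_key in enumerate(grouped):
                unused[port].extend(by_key.get(k, []))
    return output, unused


in0 = [(1, "a"), (2, "b"), (2, "c")]
in1 = [(2, "x"), (2, "y"), (3, "z")]
output, unused = inner_join(
    lambda left, right: (left[0], left[1], right[1]),
    lambda record: record[0],
    in0,
    in1,
)
```

Key 2 appears twice on each input, so the transform runs four times. Keys 1 and 3 each lack a partner, so their records end up in the corresponding unused list.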
Full outer joins
Another common case is when join-type is Full Outer Join: if each input port has a record with a
matching key value, Join does the same thing it does for an inner join.
If some input ports do not have records with matching key values, Join applies the transform
function anyway, with NULL substituted for the missing records. The missing records are in effect
ignored.
With an outer join, the transform function typically requires additional rules (as compared to an
inner join) to handle the possibility of NULL inputs.
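The extra rules needed for missing inputs can be seen in a two-input Python sketch, where None stands in for NULL. The names full_outer_join and merge, the dictionary records, and the default values chosen for missing fields are assumptions made for this illustration.

```python
from collections import defaultdict
from itertools import product


def full_outer_join(transform, key, in0, in1):
    """Call `transform` for every key; a missing side is passed as None."""
    group0, group1 = defaultdict(list), defaultdict(list)
    for record in in0:
        group0[key(record)].append(record)
    for record in in1:
        group1[key(record)].append(record)

    output = []
    for k in sorted(group0.keys() | group1.keys()):
        left = group0.get(k, [None])
        right = group1.get(k, [None])
        for pair in product(left, right):
            output.append(transform(*pair))
    return output


def merge(customer, order):
    # Each rule must cope with either side being None.
    present = customer if customer is not None else order
    return {
        "id": present["id"],
        "name": customer["name"] if customer is not None else None,
        "total": order["total"] if order is not None else 0,
    }


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]
orders = [{"id": 2, "total": 10}, {"id": 3, "total": 7}]
joined = full_outer_join(merge, lambda record: record["id"], customers, orders)
```

An inner-join transform could read customer["name"] and order["total"] directly. Here the same access without the None checks would fail for ids 1 and 3.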
About explicit joins
The final case is when join-type is Explicit. This setting allows you to specify True or False for the
record-requiredn parameter for each inn port. The settings you choose determine when Join calls
the transform function. See record-requiredn.
MATCH SORTED
Purpose
Match Sorted combines multiple flows of records with matching keys and performs transform
operations on them.
NOTE: This component is superseded by either Join (for matching keys) or Fuse (for
transforming multiple records). Both provide more flexible processing options than Match Sorted.
Requirement
Match Sorted requires grouped input.
Location in the Component Organizer
Transform folder
MULTI REFORMAT
Purpose
Multi Reformat changes the format of records flowing from 1 to 20 pairs of in and out ports by
dropping fields or by using DML expressions to add fields, combine fields, or transform data in
the records.
We recommend using MULTI REFORMAT in only a few specific situations. Most often, a regular
REFORMAT component is the correct choice. For example:
- If you want to reformat data on multiple flows, you should instead use multiple REFORMAT components. These are faster because they run in parallel.
- If you want to filter incoming data, sending it to various output ports while also reformatting it (by adding, combining, or transforming fields), try using the output-index and count parameters on the REFORMAT component.
A recommended use for Multi Reformat is to put it immediately before a custom component that
takes multiple inputs. For more information, see “Using MULTI REFORMAT to avoid deadlock”.
NORMALIZE
Purpose
Normalize generates multiple output records from each of its input records. You can directly
specify the number of output records for each input record, or you can make the number of
output records dependent on a calculation.
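As an illustration of "number of output records dependent on a calculation", the Python sketch below expands each input record into one output record per element of a list field. The function name normalize, the length and transform callables, and the sample records are assumptions made for this example, not the component's actual transform package.

```python
def normalize(records, length, transform):
    """Emit length(record) output records for each input record."""
    output = []
    for record in records:
        # `index` runs from 0 to length(record) - 1 for each input record.
        for index in range(length(record)):
            output.append(transform(record, index))
    return output


customers = [
    {"customer_id": "C002142", "amounts": [52.20, 22.25]},
    {"customer_id": "C004221", "amounts": [25.25]},
    {"customer_id": "C009999", "amounts": []},  # hypothetical empty vector
]
rows = normalize(
    customers,
    length=lambda record: len(record["amounts"]),
    transform=lambda record, index: {
        "customer_id": record["customer_id"],
        "amount": record["amounts"][index],
    },
)
```

A record whose calculated length is zero produces no output at all, which is one reason unexpected values in the length calculation are hard to trace afterwards.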
In contrast, to consolidate groups of related records into a single record with a vector field for
each group — the inverse of NORMALIZE — you would use the accumulation function of the
ROLLUP component.
Recommendations
- Always clean and validate data before normalizing it. Because Normalize uses a multistage transform, it follows computation rules that may cause unexpected or incorrect results in the presence of dirty data (NULLs or invalid values). Furthermore, the results will be hard to trace, particularly if the reject-threshold parameter is set to Never abort. Several factors (including the data type, the DML expression used to perform the normalization, and the value of the sorted-input parameter) may affect where the problems occur. It is safest to avoid normalizing dirty data.
- Component folding can enhance the performance of this component. If this feature is enabled, the Co>Operating System folds this component by default. See “Component folding” for more information.
REFORMAT
Purpose
Reformat changes the format of records by dropping fields, or by using DML expressions to add
fields, combine fields, or transform the data in the records.
Recommendation
Component folding can enhance the performance of this component. If this feature is enabled,
the Co>Operating System folds this component by default. See “Component folding” for more
information.
Location in the Component Organizer
Transform folder
ROLLUP
Purpose
Rollup evaluates a group of input records that have the same key, and then generates records
that either summarize each group or select certain information from each group.
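A minimal Python sketch of the "summarize each group" case follows. The names rollup and totals, the region and sales fields, and the decision to sort inside the helper are assumptions made for this illustration.

```python
from itertools import groupby
from operator import itemgetter


def rollup(records, key, summarize):
    """Produce one output record per group of records sharing a key."""
    # Sorting here is a convenience of this sketch so that groupby sees
    # each key as one contiguous group.
    ordered = sorted(records, key=key)
    return [summarize(k, list(group)) for k, group in groupby(ordered, key=key)]


def totals(region, group):
    return {
        "region": region,
        "count": len(group),
        "total": sum(record["sales"] for record in group),
    }


sales = [
    {"region": "east", "sales": 10},
    {"region": "west", "sales": 7},
    {"region": "east", "sales": 5},
]
summary = rollup(sales, key=itemgetter("region"), summarize=totals)
```

Selecting information from a group instead of summarizing it only needs a different summarize function, for example one that returns the record with the largest sales value.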
Although it lacks a reformat transform function, ROLLUP supports implicit reformat; see “Implicit
reformat”.
Location in the Component Organizer
Transform folder
Recommendations
For new development, use Rollup rather than AGGREGATE. Rollup provides more control
over record selection, grouping, and aggregation.
The behavior of ROLLUP varies in the presence of dirty data (NULLs or invalid values),
according to which mode you use for the rollup:
SCAN
Purpose
For every input record, Scan generates an output record that consists of a running cumulative
summary for the group to which the input record belongs, up to and including the current record.
For example, the output records might include successive year-to-date totals for groups of
records.
Although it lacks a reformat transform function, SCAN supports implicit reformat.
Recommendations
If you want one summary record for a group, use ROLLUP.
The behavior of SCAN varies in the presence of dirty data (NULLs or invalid values),
according to which mode you use for the scan:
Component folding can enhance the performance of this component. If this feature
is enabled, the Co>Operating System folds this component by default. See “Component
folding” for more information.
scan function that takes two input arguments (an input record and
a temporary_type record) and returns an updated temporary_type record
finalize function that returns an output record
For more information, see “Transform package for SCAN”.
transforms/scan/scan.mp
Template SCAN with an aggregation function
This example shows how to compute, from input records containing customer_id, dt (date),
and amount, a running total of transactions for each customer in a dataset. The example uses a
template scan function with the sum aggregation function.
Suppose you have the following input records:
customer_id  dt          amount
C002142      1994.03.23   52.20
C002142      1994.06.22   22.25
C003213      1993.02.12   47.95
C003213      1994.11.05  221.24
C003213      1995.12.11   17.42
C004221      1994.08.15   25.25
C008231      1993.10.22  122.00
C008231      1995.12.10   52.10
You want to produce output records with customer_id, dt, and amount_to_date:
customer_id  dt          amount_to_date
C002142      1994.03.23   52.20
C002142      1994.06.22   74.45
C003213      1993.02.12   47.95
C003213      1994.11.05  269.19
C003213      1995.12.11  286.61
C004221      1994.08.15   25.25
C008231      1993.10.22  122.00
temp :: initialize(in) =
begin
temp.amount_to_date :: 0;
end;
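The running total in this example can be reproduced with a small Python sketch. It is an illustration of the idea, not the SCAN transform itself: the function name scan_running_total and the tuple layout are assumptions, the comments map each step loosely onto the initialize, scan, and finalize stages, and the last input row is read as C008231, 1995.12.10, 52.10. Decimal is used so the sums stay exact.

```python
from decimal import Decimal
from itertools import groupby


def scan_running_total(records):
    """Return (customer_id, dt, amount_to_date) for every input record."""
    output = []
    # Input must be grouped by customer_id, as in the sample data.
    for customer_id, group in groupby(records, key=lambda record: record[0]):
        amount_to_date = Decimal("0")  # initialize: reset for each group
        for _, dt, amount in group:
            amount_to_date += amount  # scan: fold in the current record
            output.append((customer_id, dt, amount_to_date))  # finalize
    return output


records = [
    ("C002142", "1994.03.23", Decimal("52.20")),
    ("C002142", "1994.06.22", Decimal("22.25")),
    ("C003213", "1993.02.12", Decimal("47.95")),
    ("C003213", "1994.11.05", Decimal("221.24")),
    ("C003213", "1995.12.11", Decimal("17.42")),
    ("C004221", "1994.08.15", Decimal("25.25")),
    ("C008231", "1993.10.22", Decimal("122.00")),
    ("C008231", "1995.12.10", Decimal("52.10")),
]
running = scan_running_total(records)
```

Every input record yields one output record, so running has eight entries. Under the reading of the last row given above, the final entry for C008231 carries an amount_to_date of 174.10.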
SPLIT
Purpose
SPLIT processes data in a number of useful ways. You can use SPLIT to:
- Flatten hierarchical data
- Select a subset of fields from the data
Transform folder
Narrowing the conditions of an explicit join
In the explicit-join cases for JOIN, suppose you want to narrow the join conditions to a subset of
the shaded (required match) area. To do this, use the DML is_defined function in a rule in the
transform itself. This is the same principle demonstrated in the two-way join shown in “Getting a
joined output record”.
For example, suppose you want to produce an output record when a particular key value either is
present in in0, or is present in both in1 and in2. Only Case 2 has enough shaded area to
represent the necessary conditions. However, Case 2 also represents conditions under which
you do not want Join to produce an output record.
To produce output records only under the appropriate conditions:
1. Set join-type to Full Outer Join as in Case 2 above.
PARTITION BY EXPRESSION
Purpose
Partition by Expression distributes records to its output flow partitions according to a specified
DML expression or transform function.
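In Python terms, the expression acts as a function from a record to a partition number. The sketch below is illustrative only: the function name partition_by_expression, the amount field, and the choice to raise an error for an out-of-range result are assumptions made for this example.

```python
def partition_by_expression(records, expression, partitions):
    """Send each record to the partition number returned by `expression`."""
    flows = [[] for _ in range(partitions)]
    for record in records:
        index = expression(record)
        # Assumption for this sketch: an out-of-range result is an error.
        if not 0 <= index < partitions:
            raise ValueError(f"partition {index} is out of range")
        flows[index].append(record)
    return flows


records = [{"amount": 12}, {"amount": 75}, {"amount": 40}, {"amount": 50}]
flows = partition_by_expression(
    records,
    expression=lambda record: 0 if record["amount"] < 50 else 1,
    partitions=2,
)
```

Records keep their relative input order within each partition, because each one is appended as it is read.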
The output port for Partition by Expression is ordered. See “Ordered ports”. Although you can
use fan-out flows on the out port, we do not recommend connecting multiple fan-out flows. You
may connect a single fan-out flow; or, preferably, limit yourself to straight flows on the out port.
Partition by Expression supports implicit reformat. See “Implicit reformat”.
Recommendation
Component folding can enhance the performance of this component. If this feature is enabled,
the Co>Operating System folds this component by default. See “Component folding” for more
information.
The component does not fold when connected to a flow that is set to use two-stage routing.
PARTITION BY KEY
Purpose
Partition by Key distributes records to its output flow partitions according to key values.
How Partition by Key interprets key values depends on the internal representation of the key. For
example, the number 4 in a field of type integer(2) is not considered identical to the number 4 in
a field of type decimal(4).
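The effect of internal representation can be illustrated with a hash-based Python sketch. Everything here is an assumption made for the example: the source does not say how the component maps key values to partitions, so zlib.crc32 is only a stand-in, and the two byte layouts shown for the number 4 are illustrative encodings rather than the exact DML storage formats.

```python
import zlib


def partition_by_key(records, key_bytes, partitions):
    """Route records by a hash of the raw bytes of their key."""
    flows = [[] for _ in range(partitions)]
    for record in records:
        # Stand-in hash: equal key bytes always reach the same partition.
        flows[zlib.crc32(key_bytes(record)) % partitions].append(record)
    return flows


# The same number, stored two different ways (assumed layouts):
as_integer_2 = (4).to_bytes(2, "little")  # two-byte binary integer
as_decimal_4 = b"   4"                    # four characters of text

records = [
    {"customer_id": "C002142"},
    {"customer_id": "C003213"},
    {"customer_id": "C002142"},
    {"customer_id": "C008231"},
]
flows = partition_by_key(
    records, lambda record: record["customer_id"].encode("ascii"), partitions=3
)
```

Because the routing works on bytes, as_integer_2 and as_decimal_4 are different keys even though both represent 4, so they are not guaranteed to land in the same partition. Records with identical key bytes always do.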
Recommendation
Component folding can enhance the performance of this component. If this feature is enabled,
the Co>Operating System folds this component by default. See “Component folding” for more
information.
The component does not fold when connected to a flow that is set to use two-stage routing.
PARTITION BY PERCENTAGE
Purpose
Partition by Percentage distributes a specified percentage of the total number of input records to
each output flow.
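One simple way to picture a percentage split is sketched below in Python. The scheme is an assumption made for this example, not documented component behavior: within every block of 100 records, the first records go to the first flow, the next to the second, and so on, according to whole-number percentages that sum to 100. The function name and argument are likewise invented.

```python
from bisect import bisect_right
from itertools import accumulate


def partition_by_percentage(records, percentages):
    """Distribute records across flows by whole-number percentages."""
    if sum(percentages) != 100:
        raise ValueError("percentages must sum to 100")
    bounds = list(accumulate(percentages))  # e.g. [50, 80, 100]
    flows = [[] for _ in percentages]
    for index, record in enumerate(records):
        # Position of the record within its block of 100 picks the flow.
        flows[bisect_right(bounds, index % 100)].append(record)
    return flows


flows = partition_by_percentage(range(200), [50, 30, 20])
```

With 200 input records and a 50/30/20 split, the flows receive 100, 60, and 40 records respectively.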
PARTITION BY RANGE
Purpose
Partition by Range distributes records to its output flow partitions according to the ranges of key
values specified for each partition. Partition by Range distributes the records relatively equally
among the partitions.
Use Partition by Range when you want to divide data into useful, approximately equal, groups.
Input can be sorted or unsorted. If the input is sorted, the output is sorted; if the input is unsorted,
the output is unsorted.
The records with the key values that come first in the key order go to partition 0, the records with
the key values that come next in the order go to partition 1, and so on. The records with the key
values that come last in the key order go to the partition with the highest number.
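The ordering rule above can be sketched in Python with a sorted list of boundary values. The names partition_by_range and splitters, the sample names, and the rule that a key equal to a boundary goes to the lower partition are assumptions made for this illustration. How the component obtains its ranges is not covered here.

```python
from bisect import bisect_left


def partition_by_range(records, key, splitters):
    """Route records to len(splitters) + 1 partitions by key order.

    `splitters` must be sorted. Keys up to and including splitters[0] go
    to partition 0, keys up to splitters[1] go to partition 1, and so on.
    """
    flows = [[] for _ in range(len(splitters) + 1)]
    for record in records:
        flows[bisect_left(splitters, key(record))].append(record)
    return flows


names = ["Adams", "Baker", "Garcia", "Lopez", "Nguyen", "Young"]
flows = partition_by_range(names, key=lambda name: name, splitters=["C", "M"])
```

The input here is sorted, and each partition comes out sorted as well, because records are appended in arrival order. Choosing boundaries that split the key values evenly is what keeps the partitions approximately equal in size.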
Recommendation
Component folding can enhance the performance of this component. If this feature is enabled,
the Co>Operating System folds this component by default. See “Component folding” for more
information.
The component does not fold when connected to a flow that is set to use two-stage routing.