Real Time Processing
By PenchalaRaju.Yanamala
Overview
Understanding Real-time Data
Configuring Real-time Sessions
Terminating Conditions
Flush Latency
Commit Type
Message Recovery
Recovery File
Recovery Table
Recovery Queue and Recovery Topic
Recovery Ignore List
Stopping Real-time Sessions
Restarting and Recovering Real-time Sessions
Rules and Guidelines
Real-time Processing Example
Informatica Real-time Products
Overview
You can use PowerCenter to process data in real time. Real-time processing is
on-demand processing of data from real-time sources. A real-time session reads,
processes, and writes data to targets continuously. By default, a session reads
and writes bulk data at scheduled intervals unless you configure the session for
real-time processing.
To process data in real time, the data must originate from a real-time source.
Real-time sources include JMS, WebSphere MQ, TIBCO, webMethods, MSMQ,
SAP, and web services. You might want to use real-time processing when you
need immediate access to dynamic data, such as financial data.
Real-time processing involves the following concepts:
Real-time data. Real-time data includes messages and message queues, web
service messages, and changes from a PowerExchange change data capture
source. Real-time data originates from a real-time source.
Real-time sessions. A real-time session is a session that processes real-time
source data. A session is real-time if the Integration Service generates a
real-time flush based on the flush latency configuration and all transformations
propagate the flush to the targets. Latency is the period of time from when
source data changes on a source to when a session writes the data to a target.
Real-time properties. Real-time properties determine when the Integration
Service processes the data and commits the data to the target.
- Terminating conditions. Terminating conditions determine when the
Integration Service stops reading data from the source and ends the session
if you do not want the session to run continuously.
- Flush latency. Flush latency determines how often the Integration Service
flushes real-time data from the source.
- Commit type. The commit type determines when the Integration Service
commits real-time data to the target.
Message recovery. If the real-time session fails, you can recover messages.
When you enable message recovery for a real-time session, the Integration
Service stores source messages or message IDs in a recovery file or table. If
the session fails, you can run the session in recovery mode to recover
messages the Integration Service could not process.
You can also write messages to other messaging applications. For example, the
Integration Service can read messages from a JMS source and write the data to
a TIBCO target.
PowerExchange Client for PowerCenter, PowerExchange, and the Integration
Service complete the following tasks to process change data:
For more information about change data, see PowerExchange Interfaces for
PowerCenter.
Configuring Real-time Sessions
When you configure a session to process data in real time, you configure session
properties that control when the session stops reading from the source. You can
configure a session to stop reading from a source after it stops receiving
messages for a set period of time, when the session reaches a message count
limit, or when the session has read messages for a set period of time. You can
also configure how the Integration Service commits data to the target and enable
message recovery for failed sessions.
Terminating Conditions
You can configure the following terminating conditions:
- Idle time
- Message count
- Reader time limit
Idle Time
Idle time is the amount of time in seconds the Integration Service waits to receive
messages before it stops reading from the source. -1 indicates an infinite period
of time.
For example, if the idle time for a JMS session is 30 seconds, the Integration
Service waits 30 seconds after reading from JMS. If no new messages arrive in
JMS within 30 seconds, the Integration Service stops reading from JMS. It
processes the messages and ends the session.
Message Count
Message count is the number of messages the Integration Service reads from a
real-time source before it stops reading from the source. -1 indicates an infinite
number of messages.
For example, if the message count in a JMS session is 100, the Integration
Service stops reading from the source after it reads 100 messages. It processes
the messages and ends the session.
Note: The name of the message count terminating condition depends on the
Informatica product. For example, the message count for PowerExchange for
SAP NetWeaver is called Packet Count. The message count for PowerExchange
Client for PowerCenter is called UOW Count.
Reader Time Limit
Reader time limit is the amount of time in seconds that the Integration Service
reads source messages from the real-time source before it stops reading from
the source. Use reader time limit to read messages from a real-time source for a
set period of time. 0 indicates an infinite period of time.
For example, if you use a 10-second time limit, the Integration Service stops
reading from the messaging application after 10 seconds. It processes the
messages and ends the session.
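A reader loop can evaluate the three terminating conditions together after each poll. The following Python sketch is a hypothetical simulation, not Informatica code. The TerminatingConditions class and the stop_reason function are invented for illustration. The sketch assumes that the reader stops at whichever configured condition it meets first. It uses the sentinel values described above: -1 means unlimited idle time or message count, and 0 means no reader time limit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TerminatingConditions:
    """Hypothetical model of a real-time session's terminating conditions."""
    idle_time: int = -1          # seconds to wait for new messages; -1 = infinite
    message_count: int = -1      # messages to read; -1 = infinite
    reader_time_limit: int = 0   # seconds to read; 0 = infinite


def stop_reason(cond: TerminatingConditions,
                elapsed: float,
                idle: float,
                messages_read: int) -> Optional[str]:
    """Return the condition that stops reading, or None to keep reading.

    elapsed: seconds since the reader started.
    idle: seconds since the last message arrived.
    Assumption: the first condition met ends reading.
    """
    if cond.idle_time != -1 and idle >= cond.idle_time:
        return "idle time"
    if cond.message_count != -1 and messages_read >= cond.message_count:
        return "message count"
    if cond.reader_time_limit != 0 and elapsed >= cond.reader_time_limit:
        return "reader time limit"
    return None


# The three examples from this section:
print(stop_reason(TerminatingConditions(idle_time=30), 95.0, 30.0, 12))          # idle time
print(stop_reason(TerminatingConditions(message_count=100), 4.0, 0.1, 100))     # message count
print(stop_reason(TerminatingConditions(reader_time_limit=10), 10.0, 0.5, 42))  # reader time limit
```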
Flush Latency
Use flush latency to run a session in real time. Flush latency determines how
often the Integration Service flushes data from the source. For example, if you
set the flush latency to 10 seconds, the Integration Service flushes data from the
source every 10 seconds.
For change data from a PowerExchange change data capture source, the flush
latency interval is determined by the flush latency and the unit of work (UOW)
count attributes. For more information, see PowerExchange Interfaces for
PowerCenter.
The Integration Service uses the following process when it reads data from a
real-time source and the session is configured with flush latency:
Configure flush latency in seconds. The default value is zero, which indicates that
the flush latency is disabled and the session does not run in real time.
Configure the flush latency interval depending on how dynamic the data is and
how quickly users need to access the data. If data is outdated quickly, such as
financial trading information, then configure a lower flush latency interval so the
target tables are updated as close as possible to when the changes occurred.
For example, users need updated financial data every few minutes. However,
they need updated customer address changes only once a day. Configure a
lower flush latency interval for financial data and a higher flush latency interval for
address changes.
Use the following rules and guidelines when you configure flush latency:
- The Integration Service does not buffer messages longer than the flush latency
interval.
- The lower you set the flush latency interval, the more frequently the
Integration Service commits messages to the target.
- If you use a low flush latency interval, the session can consume more system
resources.
- If you configure a commit interval, then a combination of the flush latency
and the commit interval determines when the data is committed to the target.
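A small simulation can show how the flush latency and the commit interval work together. The following Python sketch is a hypothetical model, not the Integration Service's algorithm. It assumes that a commit fires when the commit interval (counted in rows) is reached or when the flush latency (in seconds) elapses since the previous commit, whichever comes first. It also assumes that both counters reset after each commit. The function name commit_points is invented for illustration.

```python
from typing import List, Tuple


def commit_points(arrivals: List[float], commit_interval: int,
                  flush_latency: float) -> List[Tuple[float, int]]:
    """Return (commit_time, rows) pairs for messages arriving at `arrivals`.

    Hypothetical model: commit when `commit_interval` rows are pending or
    `flush_latency` seconds have passed since the last commit, whichever
    happens first. Latency windows with no pending rows commit nothing.
    """
    if commit_interval <= 0 or flush_latency <= 0:
        raise ValueError("commit_interval and flush_latency must be positive")
    commits: List[Tuple[float, int]] = []
    window_start = 0.0
    pending = 0
    for t in sorted(arrivals):
        # Fire every latency deadline that passes before this message arrives.
        while t >= window_start + flush_latency:
            deadline = window_start + flush_latency
            if pending:
                commits.append((deadline, pending))
                pending = 0
            window_start = deadline
        pending += 1
        if pending == commit_interval:
            # Row-count trigger: commit now and restart the latency window.
            commits.append((t, pending))
            pending = 0
            window_start = t
    if pending:
        # Remaining rows are committed at the next latency deadline.
        commits.append((window_start + flush_latency, pending))
    return commits


# Flush latency of 10 seconds with a 3-row commit interval.
print(commit_points([1.0, 2.0, 3.0, 4.0, 25.0], commit_interval=3, flush_latency=10))
# [(3.0, 3), (13.0, 1), (33.0, 1)]
```

In this model, a lower flush_latency produces more commits, each with fewer rows. This matches the guideline that a lower flush latency interval commits more often and can use more resources.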
Commit Type
The Integration Service commits data to the target based on the flush latency
and the commit type. You can configure a session to use the following commit
types:
Message Recovery
When you enable message recovery for a real-time session, the Integration
Service can recover unprocessed messages from a failed session. The
Integration Service stores source messages or message IDs in a recovery file,
recovery table, recovery queue, or recovery topic. If the session fails, run the
session in recovery mode to recover the messages the Integration Service did
not process.
Depending on the real-time source and the target type, the messages or
message IDs are stored in the following storage types:
A session can use a combination of the storage types. For example, a session
with JMS and TIBCO sources uses a recovery file and a recovery table.
When you recover a real-time session, the Integration Service restores the state
of operation from the point of interruption. It reads and processes the messages
in the recovery file, recovery table, recovery queue, or recovery topic. Then, it
ends the session.
During recovery, the terminating conditions do not affect the messages the
Integration Service reads from the recovery file, recovery table, recovery queue,
or recovery topic. For example, if you specified message count and idle time for
the session, the conditions apply to the messages the Integration Service reads
from the source, not the recovery file, recovery table, recovery queue, or
recovery topic.
In addition to the storage types above, the Integration Service uses a recovery
ignore list if the session fails under certain conditions. For more information, see
Recovery Ignore List.
Sessions with MSMQ sources, web service messages, or change data from a
PowerExchange change data capture source use a different recovery strategy.
For more information, see the PowerExchange for MSMQ User Guide,
PowerCenter Web Services Provider Guide, or PowerExchange Interfaces for
PowerCenter.
Prerequisites
Complete the following prerequisites before you enable message recovery for
sessions with a JMS or WebSphere MQ source and a JMS or WebSphere MQ
target:
- Create the recovery queue in the JMS provider or WebSphere MQ. Or, create
the recovery topic in the JMS provider.
- Create the recovery queue under the same queue manager as the message
queue so the commit scope is the same.
- Configure the recovery queue to be persistent. If the recovery queue is not
persistent, data duplication can occur.
If you do not configure the prerequisites, the Integration Service stores recovery
information in a recovery file instead of a recovery queue or recovery topic.
To enable message recovery, complete the following steps:
1. In the session properties, select Resume from Last Checkpoint as the
recovery strategy.
2. Specify a recovery cache directory in the session properties at each
partition point.
Recovery File
The Integration Service stores messages or message IDs in a recovery file for
real-time sessions that are enabled for recovery and include the following source
and target types:
Message Processing
Message Recovery
When you recover a real-time session, the Integration Service reads and
processes the cached messages. After the Integration Service reads all cached
messages, it ends the session.
For sessions with JMS and WebSphere MQ sources, the Integration Service
uses the message ID in the recovery file to retrieve the message from the
source.
The Integration Service clears the recovery file after the flush latency period
expires and at the end of a successful session. If the session fails after the
Integration Service commits messages to the target but before it removes the
messages from the recovery file, targets can receive duplicate rows during
recovery.
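The duplicate-row window described above comes from the order of operations: cache the messages, commit them, then clear the recovery file. The following Python sketch is a simplified, hypothetical model. Plain lists stand in for the target and the recovery file, and the function names are invented. The sketch caches whole messages. For JMS and WebSphere MQ sources, the recovery file holds message IDs instead, and the Integration Service retrieves the messages from the source during recovery.

```python
from typing import List, Optional


def flush_cycle(batch: List[str], target: List[str],
                recovery_file: List[str],
                fail_after: Optional[str] = None) -> None:
    """Run one flush cycle; optionally simulate a failure.

    fail_after: "cache" fails after caching but before the commit,
    "commit" fails after the commit but before clearing the recovery file.
    """
    recovery_file.extend(batch)          # 1. cache the messages
    if fail_after == "cache":
        raise RuntimeError("session failed before commit")
    target.extend(batch)                 # 2. commit to the target
    if fail_after == "commit":
        raise RuntimeError("session failed before clearing the recovery file")
    recovery_file.clear()                # 3. clear after the flush latency period


def recover(target: List[str], recovery_file: List[str]) -> None:
    """Recovery mode: process all cached messages, then end."""
    target.extend(recovery_file)
    recovery_file.clear()


for failure_point in ("cache", "commit"):
    target, recovery_file = [], []
    try:
        flush_cycle(["m1", "m2"], target, recovery_file, fail_after=failure_point)
    except RuntimeError:
        recover(target, recovery_file)
    print(failure_point, target)
# cache ['m1', 'm2']
# commit ['m1', 'm2', 'm1', 'm2']
```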
A recovery data flush is a process that the Integration Service uses to flush
session recovery data that is in the operating system buffer to the recovery file.
You can use a recovery data flush to prevent data loss when the Integration
Service cannot write the recovery data to the recovery file. The Integration
Service can fail to write recovery data in cases of an operating system failure,
hardware failure, or file system outage. The recovery data flush applies to
sessions that include a JMS or WebSphere MQ source and non-relational,
non-JMS, or non-WebSphere MQ targets.
You can configure the Integration Service to flush recovery data from the
operating system buffer to the recovery file by setting the Integration Service
property Flush Session Recovery Data to “Auto” or “Yes” in the Administration
Console.
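A recovery data flush addresses a general problem. A write call can return while the data is still in the operating system buffer, so a crash at that point can lose the data. The following Python sketch shows the standard technique for forcing buffered data to disk: flush() followed by os.fsync(). It is an analogy, not the Integration Service's implementation. The one-ID-per-line file format and the function names are hypothetical.

```python
import os
import tempfile
from typing import List


def append_recovery_record(path: str, message_id: str) -> None:
    """Append one message ID and force it from the OS buffer to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(message_id + "\n")
        f.flush()             # move data from Python's buffer into the OS buffer
        os.fsync(f.fileno())  # ask the OS to write its buffer to the storage device


def read_recovery_records(path: str) -> List[str]:
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "recovery.dat")
    for message_id in ("ID:1001", "ID:1002"):
        append_recovery_record(path, message_id)
    print(read_recovery_records(path))  # ['ID:1001', 'ID:1002']
```

Calling os.fsync for every record gives the strongest durability, but it adds disk I/O to each write. Syncing after a batch of records is a common compromise.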
Recovery Table
The Integration Service stores message IDs in a recovery table for real-time
sessions that are enabled for recovery and include the following source and
target types:
The Integration Service temporarily stores message IDs and commit numbers in
a recovery table on each target database. The commit number indicates the
number of commits that the Integration Service committed to the target. During
recovery, the Integration Service uses the commit number to determine if it wrote
the same number of messages to all targets. The message IDs and commit
numbers are verified against the recovery table to ensure that no data is lost or
duplicated.
Note: The source must use unique message IDs and provide access to the
messages through the message ID.
PM_REC_STATE Table
When the Integration Service runs a real-time session that uses the recovery
table and has recovery enabled, it creates a recovery table, PM_REC_STATE,
on the target database to store message IDs and commit numbers. When the
Integration Service recovers the session, it uses information in the recovery
tables to determine if it needs to write the message to the target table.
Related Topics:
Target Recovery Tables
Message Processing
Message Recovery
When you recover a real-time session, the Integration Service uses the message
ID and the commit number in the recovery table to determine whether it
committed messages to all targets.
If the message ID exists in the recovery table and all targets have the same
commit number, the Integration Service committed the messages to all targets.
During recovery, the Integration Service sends an acknowledgement to the source
that it processed the message.
If the targets have different commit numbers, the Integration Service did not
commit the messages to all targets. During recovery, the Integration Service reads
the message IDs and the transformation state from the recovery table. It
processes messages and writes them to the targets that did not have the
message. When the Integration Service reads all messages from the recovery
table, it ends the session.
If the session fails before the Integration Service commits messages to all targets
and you restart the session in cold start mode, targets can receive duplicate
rows.
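The commit-number comparison can be sketched in a few lines of Python. This sketch models only that comparison and does not check message IDs. It assumes that a target with a lower commit number than the other targets is missing the latest messages and needs them replayed. The target names and the targets_to_replay and recovery_plan functions are hypothetical. The sketch does not reproduce the actual PM_REC_STATE layout.

```python
from typing import Dict, List


def targets_to_replay(commit_numbers: Dict[str, int]) -> List[str]:
    """Return targets whose commit number is behind the highest one.

    Hypothetical model: commit_numbers maps each target to the number of
    commits recorded for it in the recovery table. An empty result means
    every target has the same commit number.
    """
    highest = max(commit_numbers.values())
    return sorted(name for name, n in commit_numbers.items() if n < highest)


def recovery_plan(commit_numbers: Dict[str, int]) -> str:
    behind = targets_to_replay(commit_numbers)
    if not behind:
        return "acknowledge the source"
    return "replay messages to " + ", ".join(behind)


print(recovery_plan({"ORDERS": 42, "ORDER_AUDIT": 42}))  # acknowledge the source
print(recovery_plan({"ORDERS": 42, "ORDER_AUDIT": 41}))  # replay messages to ORDER_AUDIT
```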
Recovery Queue and Recovery Topic
The Integration Service temporarily stores message IDs and commit numbers in
a recovery queue or recovery topic that you created in the JMS provider or in
WebSphere MQ. The commit number indicates the number of commits that the
Integration Service committed to the target. During recovery, the Integration
Service uses the commit number to determine if it wrote the same number of
messages to all targets. The message IDs and commit numbers are verified
against the recovery queue or recovery topic to ensure that no data is lost or
duplicated.
The Integration Service uses the same recovery queue or recovery topic for all
queue targets in each session. Create multiple recovery queues or recovery
topics for sessions to improve performance.
If you do not specify the recovery queue or recovery topic name in the session
properties or in the JMS connection object, the Integration Service stores
recovery information in the recovery file.
Related Topics:
Recovery Table
Message Processing
Stopping Real-time Sessions
When you stop a real-time session, the Integration Service processes messages
in the pipeline based on the following real-time sources:
JMS and WebSphere MQ. The Integration Service processes messages it read
up until you issued the stop. It writes messages to the targets.
MSMQ, SAP, TIBCO, webMethods, and web service messages. The
Integration Service does not process messages if you stop a session before the
Integration Service writes all messages to the target.
When you stop a real-time session with a JMS or a WebSphere MQ source, the
Integration Service performs the following tasks:
When you restart the session, the Integration Service starts reading from the
source. It restores the session and transformation state of operation to resume
the session from the point of interruption.
Restarting and Recovering Real-time Sessions
You can resume a stopped or failed real-time session. To resume a session, you
must restart or recover the session. The Integration Service can recover a
session automatically if you enabled the session for automatic task recovery.
Related Topics:
Recovering Workflows
When you restart a session, the Integration Service resumes the session based
on the real-time source. Depending on the real-time source, it restarts the
session with or without recovery.
You can restart a task or workflow in cold start mode. When you restart a task or
workflow in cold start mode, the Integration Service discards the recovery
information and restarts the task or workflow.
You can restart or recover a session in the Workflow Manager, Workflow Monitor,
or pmcmd. The Integration Service resumes the session based on the real-time
source.
Table 11-1 describes the behavior when you restart or recover a session with the
following commands:
Rules and Guidelines
Use the following rules and guidelines when you run real-time sessions:
- The Integration Service fails sessions that have message recovery enabled and
contain any of the following conditions:
- If the number of messages that the Integration Service reads from or writes to
the message queue exceeds the message size limit, increase the message size
limit or decrease the flush latency.
Real-time Processing Example
The sample mapping includes the following components:
Source. WebSphere MQ. Each message is in XML format and contains one
purchase order.
XML Parser transformation. Receives purchase order information from the
MQ Source Qualifier transformation. It parses the purchase order ID and the
quantity from the XML file.
Lookup transformation. Looks up the supplier details for the purchase order
ID. It passes the supplier information, the purchase item ID, and item cost to the
Expression transformation.
Expression transformation. Calculates the order cost for the supplier.
Target. Oracle relational database. It contains the supplier information and the
total supplier cost.
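The following Python sketch approximates the mapping logic to show how data flows from a message to a target row. It is hypothetical. The example does not specify the XML element names, the lookup data, or the cost formula, so the sketch assumes them, including order cost = quantity multiplied by item cost. A dictionary stands in for the Oracle target.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical lookup source: purchase order ID -> (supplier, item ID, item cost).
SUPPLIER_LOOKUP: Dict[str, Tuple[str, str, float]] = {
    "PO-1001": ("Acme Supply", "ITEM-7", 2.50),
    "PO-1002": ("Globex Parts", "ITEM-9", 10.00),
    "PO-1003": ("Acme Supply", "ITEM-3", 4.00),
}


def parse_order(xml_text: str) -> Tuple[str, int]:
    """XML Parser step: extract the purchase order ID and quantity."""
    root = ET.fromstring(xml_text)
    po_id = root.findtext("PurchaseOrderID")
    quantity = root.findtext("Quantity")
    if po_id is None or quantity is None:
        raise ValueError("message is missing PurchaseOrderID or Quantity")
    return po_id, int(quantity)


def supplier_totals(messages: List[str]) -> Dict[str, float]:
    """Return the total order cost per supplier for a batch of messages."""
    totals: Dict[str, float] = defaultdict(float)
    for message in messages:
        po_id, quantity = parse_order(message)                   # XML Parser
        supplier, _item_id, item_cost = SUPPLIER_LOOKUP[po_id]   # Lookup
        totals[supplier] += quantity * item_cost                 # Expression (assumed formula)
    return dict(totals)


def order_message(po_id: str, quantity: int) -> str:
    """Build one hypothetical purchase order message."""
    return (f"<PurchaseOrder><PurchaseOrderID>{po_id}</PurchaseOrderID>"
            f"<Quantity>{quantity}</Quantity></PurchaseOrder>")


batch = [order_message("PO-1001", 4), order_message("PO-1002", 1),
         order_message("PO-1003", 2)]
print(supplier_totals(batch))  # {'Acme Supply': 18.0, 'Globex Parts': 10.0}
```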
You create and configure a session and workflow with the following properties:
Property                  Value
Message count             1,000
Flush latency interval    2,000 milliseconds
Commit type               Source-based commit
Workflow schedule         Run continuously
The following steps describe how the Integration Service processes the session
in real time:
1. The Integration Service reads messages from the WebSphere MQ queue until
it reads 1,000 messages or until 2,000 milliseconds elapse. When it meets
either condition, it stops reading from the WebSphere MQ queue.
2. The Integration Service looks up supplier information and calculates the
order cost.
3. The Integration Service writes the supplier information and order cost to the
Oracle relational target.
4. The Integration Service starts to read messages from the WebSphere MQ
queue again.
5. The Integration Service repeats steps 1 through 4 because you configured the
workflow to run continuously.
Informatica Real-time Products
You can use the following products to read, transform, and write real-time data:
PowerExchange for JMS. Use PowerExchange for JMS to read from JMS
sources and write to JMS targets. You can read JMS messages from JMS provider
message queues or from a JMS provider based on message topic. You can write
to JMS provider message queues or to a JMS provider based on message topic.
JMS providers are message-oriented middleware systems that can send and
receive JMS messages. During a session, the Integration Service connects to the
Java Naming and Directory Interface (JNDI) to determine connection information.
When the Integration Service determines the connection information, it connects
to the JMS provider to read or write JMS messages.
PowerExchange for WebSphere MQ. Use PowerExchange for WebSphere
MQ to read from WebSphere MQ message queues and write to WebSphere MQ
message queues or database targets. PowerExchange for WebSphere MQ
interacts with the WebSphere MQ queue manager, message queues, and
WebSphere MQ messages during data extraction and loading.
PowerExchange for TIBCO. Use PowerExchange for TIBCO to read
messages from TIBCO and write messages to TIBCO in TIB/Rendezvous or AE
format.
The Integration Service receives TIBCO messages from a TIBCO daemon, and it
writes messages through a TIBCO daemon. The TIBCO daemon transmits the
target messages across a local or wide area network. Target listeners subscribe
to TIBCO target messages based on the message subject.
PowerExchange for webMethods. Use PowerExchange for webMethods to
read documents from webMethods sources and write documents to
webMethods targets.
The Integration Service connects to a webMethods broker that sends, receives,
and queues webMethods documents. The Integration Service reads and writes
webMethods documents based on a defined document type or the client ID. The
Integration Service also reads and writes webMethods request/reply documents.
PowerExchange for MSMQ. Use PowerExchange for MSMQ to read from
MSMQ sources and write to MSMQ targets.
The Integration Service connects to Microsoft Message Queuing to read data
from messages or write data to messages. The queue can be public or private
and transactional or non-transactional.
PowerExchange for SAP NetWeaver. Use PowerExchange for SAP
NetWeaver to read from SAP through outbound IDocs or write to SAP through
inbound IDocs using Application Link Enabling (ALE).
The Integration Service can read from outbound IDocs and write to a relational
target. The Integration Service can read data from a relational source and write
the data to an inbound IDoc. The Integration Service can capture changes to the
master data or transactional data in the SAP application database in real time.
PowerCenter Web Services Provider. Use PowerCenter Web Services
Provider to expose transformation logic as a service through the Web Services
Hub and write client applications to run real-time web services. You can create a
service mapping to receive a message from a web service client, transform it,
and write it to any target PowerCenter supports. You can also create a service
mapping with both a web service source and target definition to receive a
message request from a web service client, transform the data, and send the
response back to the web service client.
The Web Services Hub receives requests from web service clients and passes
them to the gateway. The Integration Service or the Repository Service processes
the requests and sends a response to the web service client through the Web
Services Hub.
PowerExchange. Use PowerExchange to extract and load relational and non-
relational data, extract change data, and extract change data in real time.
To extract data, the Integration Service reads change data from PowerExchange
on the machine hosting the source. You can extract and load data from multiple
sources and targets, such as DB2/390, DB2/400, and Oracle. You can also use a
data map from a PowerExchange Listener as a non-relational source. For more
information, see PowerExchange Interfaces for PowerCenter.