IDQ Learning
In standard Data Profiling and Data Quality projects, can anyone please clarify in what sequence in the project
lifecycle one would use Data Explorer, Data Quality and Data Director?
My understanding is
1) Data Profiling - For discovering the data and potential anomalies.
2) Data Quality - Outputs from data profiling stage implemented as data quality rules.
3) Data Director - Used to correct the data based on the DQ output. Can this tool be used to correct the data on
the source system directly?
The implementation logic mentioned by Robert is exactly correct. However, I would like to add a few more
points that should make the IDQ process in the ETL flow even clearer.
The outcome of the Exception management process is to clean up any database table of bad records
(outliers). The output of the Exception management process should be a clean database table
that can be sourced directly into the ETL flow and is expected to be of high data quality. When
you design a process flow for Exception management, you cannot directly use the bad records
table itself as the source in the ETL flow. You need to create a separate process to copy data into
the actual source table from the bad table using the "status code" column information. There is
no inherent process to change data in the bad table and have it automatically update the source
table.
The status code column of the exception table uses the following values:
UPDATE = 20
REPROCESS = 21
ACCEPT = 22
MERGED = 23
REMERGED = 24
EXTRACTED = 25
REJECT = 26
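As a rough illustration of the separate copy-back process described above, the sketch below models the status codes as a Python enum and filters the bad-records rows that would be copied to the source table. The column name `status_code` and the choice of UPDATE and ACCEPT as the "copy back" statuses are assumptions made for the example, not something the notes confirm.

```python
from enum import IntEnum


class ExceptionStatus(IntEnum):
    """Status codes exactly as listed in the notes above."""
    UPDATE = 20
    REPROCESS = 21
    ACCEPT = 22
    MERGED = 23
    REMERGED = 24
    EXTRACTED = 25
    REJECT = 26


# Assumption: only these statuses mean "copy the record back to the source".
COPY_BACK = {ExceptionStatus.UPDATE, ExceptionStatus.ACCEPT}


def rows_to_copy(bad_rows, status_column="status_code"):
    """Return the bad-table rows whose status qualifies for copy-back.

    bad_rows is a list of dicts; status_column is a hypothetical column name.
    """
    return [row for row in bad_rows if row[status_column] in COPY_BACK]
```

A real process would read the rows from the exception table and write the selected ones to the source table; the filter on the status column is the part the notes describe.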
I recently got Analyst access and I'm just browsing through different options. I need help on Scorecard as of
now.
1. I did a sample profile for 200 rows with a relational DB as the source. When I run the scorecard on this profile, I
expect results for these 200 rows only. Instead, I believe it scores the complete table, so it takes a very long
time to generate the scorecard. Is there any workaround to use only the 200 rows for generating the scorecard?
3. Can scorecard be viewed by team members who do not have access to Analyst using hyperlink or something
like this?
1. Scorecards by default run on the complete data set of the physical data object used to create the profile. To run
the scorecard on only the required 200 records, follow the steps below:
Create a Logical Data Object (LDO) using the Developer client such that the output of the LDO is
only the required 200 records.
The notification email message has an "ObjectURL" option, which is a hyperlink to the scorecard. You need to provide
the username and password to access the object.
Refer to the Data Explorer User Guide for more details on the scorecard notifications and other details.
Domain:
The Informatica domain is the administrative unit for the Informatica environment. The
domain is a collection of nodes that represent the machines on which the application
services run. When you install the Informatica services on a machine, you install all files for
all services.
Informatica has a service-oriented architecture that provides the ability to scale services and
to share resources across multiple machines. The Informatica domain is the primary unit for
management and administration of services.
The Informatica domain can contain one or more nodes. Multiple application services can
run on each node. The application service types that you can run depend on the Informatica
license key generated for your organization. When you plan the domain, you must consider
the number of nodes needed in the domain. You also must consider the types of application
services the domain requires and the number of application services that run on each node.
You must verify that each machine in the domain meets the system requirements to run the
installer and to run the application services. You must also verify that the port numbers that
you specify during installation are available on the machines where you install the
Informatica services. (How to check whether ports are available or not?)
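One possible way to answer the port question is to try binding a TCP socket to the port: if the bind fails, something else is already using it. This is a minimal sketch using only the Python standard library; the function name `port_is_free` is made up for the example, and it only tells you the state of the port at the moment of the check.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission problem.
            return False
    return True
```

For example, `port_is_free(8085)` would check the port that appears in the Analyst URL later in these notes. Run the check on the machine where the Informatica services will be installed.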
The domain requires a relational database to store configuration information and user
account privileges and permissions.
You must verify that the databases have the disk space required by the Informatica domain
and the application services.
An Informatica domain is a collection of nodes and services. A node is the logical representation of a
machine in a domain. Services for the domain include the Service Manager that manages all domain
operations and a set of application services that represent server-based functionality.
For more information about the Informatica domain, see the Informatica Administrator Guide.
Nodes
Gateway node
A gateway node is any node that you configure to serve as a gateway for the domain.
One node acts as the gateway at any given time. That node is called the master gateway. A
gateway node can run application services, and it can serve as a master gateway node. The
master gateway node is the entry point to the domain.
The Service Manager on the master gateway node performs all domain operations on the
master gateway node. The Service Managers running on other gateway nodes perform
limited domain operations on those nodes.
(What are those limited domain tasks?)
Worker nodes
A worker node is any node not configured to serve as a gateway. A worker node can run
application services, but it cannot serve as a gateway. The Service Manager performs limited
domain operations on a worker node.
Service Manager
The Service Manager in the Informatica domain supports the domain and the application
services. The Service Manager runs on each node in the domain.
The Service Manager manages the following areas on each node in the domain:
Domain support
The Service Manager performs operations on each node to support the domain. Domain
operations include authentication, authorization, and logging. The domain operations
that the Service Manager performs on a node depend on the type of node. For example, the
Service Manager running on the master gateway node performs all domain operations on
that node. The Service Manager running on another gateway node or a worker node
performs limited domain operations on that node.
The Service Manager on each node starts the application services configured to run on that
node. It starts and stops application services based on requests from Informatica
clients.
Application Services
Application services represent server-based functionality. After you complete the
installation, you create application services based on the license key generated for your
organization. When you create an application service, you designate a node to run
the service process. The service process is the run-time representation of a service
running on a node. The service type determines how many service processes can run at a
time.
If you have the high availability option, you can run an application service on
multiple nodes. If you do not have the high availability option, configure each application
service to run on one node.
Some application services require databases to store information processed by the
application service. When you plan the Informatica domain, you also need to plan
the databases required by each application service.
License Key
The license key controls the application services and the functionality that you can use.
Informatica Clients
The clients make requests to the Service Manager or to application services.
S.No | Client Name                | Client Type | Usage | Metadata Stored In                       | Run By                           | Comment
1    | Informatica Developer      | Thick       |       | Model repository                         | Data Integration Service         |
2    | PowerCenter Client         | Thick       |       | PowerCenter repository                   | PowerCenter Integration Service  |
3    | Data Transformation Studio | Thick       |       | Data Transformation repository directory | Data Transformation Engine       |
4    | Analyst tool               | Web         |       | Model repository                         | Data Integration Service         | Analyst Service runs the tool
5    | Data Analyzer              | Web         |       | Data Analyzer repository                 | Data Analyzer application        |
6    | Jaspersoft                 | Web         |       |                                          | Reporting and Dashboards Service |
7    | Metadata Manager           | Web         |       | Metadata Manager repository              | Metadata Manager Service         |
8    | Web Services Hub Console   | Web         |       |                                          | Web Services Hub Service         |
Application services:
Analyst Service
The Analyst Service is an application service that runs the Analyst tool in the Informatica
domain. The Analyst Service manages the connections between service components and the
users that have access to the Analyst tool. When you run profiles, scorecards, or mapping
specifications in the Analyst tool, the Analyst Service connects to the Data Integration
Service to perform the data integration jobs. When you work on Human tasks in the Analyst
tool, the Analyst Service connects to the Data Integration Service to retrieve the task data
from the Human task database.
When you view, create, or delete a Model repository object in the Analyst tool, the Analyst
Service connects to the Model Repository Service to access the metadata. When you view
data lineage analysis on scorecards in the Analyst tool, the Analyst Service sends the
request to the Metadata Manager Service to run data lineage.
Note: When you create the Analyst Service, you do not associate it with any relational
databases.
Associated Services
The Analyst Service connects to other application services within the domain. When you
create the Analyst Service, you can associate it with the following application services:
Data Integration Services
You can associate up to two Data Integration Services with the Analyst Service. The Analyst
Service manages the connection to the Data Integration Service that enables users to
perform data preview, mapping specification, scorecard, and profile jobs in the Analyst tool.
The Analyst Service also manages the connection to the Data Integration Service that you
configure to run Human tasks. When you create the Analyst Service, you provide the name
of the Data Integration Services. You can associate the Analyst Service with the same Data
Integration Service for all operations.
Metadata Manager Service
The Analyst Service manages the connection to the Metadata Manager Service that runs
data lineage for scorecards in the Analyst tool. When you create the Analyst Service, you
can provide the name of the Metadata Manager Service.
Model Repository Service
The Analyst Service manages the connection to the Model Repository Service for the Analyst
tool. The Analyst tool connects to the Model Repository Service to create, update, and delete
Model repository objects in the Analyst tool. When you create the Analyst Service, you
provide the name of the Model Repository Service.
Associated Services
The Content Management Service connects to other application services within the domain.
When you create the Content Management Service, you can associate it with the following
application services:
Data Integration Service
The Content Management Service uses the Data Integration Service to run mappings to
transfer data between reference tables and external data sources. When you create the
Content Management Service, you provide the name of the Data Integration Service. You
must create the Data Integration Service and Content Management Service on the same
node.
Required Databases
The Content Management Service requires a reference data warehouse in a relational
database. When you create the Content Management Service, you must provide connection
information to the reference data warehouse.
Create the following database before you create the Content Management Service:
Associated Services
The Data Integration Service connects to other application services within the domain. When
you create the Data Integration Service, you can associate it with the following application
service:
Model Repository Service
The Data Integration Service connects to the Model Repository Service to perform jobs such
as running mappings, workflows, and profiles. When you create the Data Integration Service,
you provide the name of the Model Repository Service.
Required Databases
The Data Integration Service can connect to multiple relational databases. The databases
that the service can connect to depend on the license key generated for your organization.
When you create the Data Integration Service, you provide connection information to the
databases. Create the following databases before you create the Data Integration Service:
Profiling warehouse
Stores profiling information, such as profile results and scorecard results. You need a profiling
warehouse to perform profiling and data discovery.
Associated Services
The Metadata Manager Service connects to other application services within the domain.
When you create the Metadata Manager Service, you can associate it with the following
application services:
Required Databases
The Metadata Manager Service requires a Metadata Manager repository in a relational
database. When you create the Metadata Manager Service, you must provide connection
information to the database. Create the following database before you create the Metadata
Manager Service:
Required Databases
The Model Repository Service requires a Model repository in a relational database. When you
create the Model Repository Service, you must provide connection information to the
database. Create the following database before you create the Model Repository Service:
Model repository
Stores metadata created by Informatica clients and application services in a relational
database to enable collaboration among the clients and services. You need a Model
repository to store the design-time and run-time objects created by Informatica clients
and application services.
Integration Service connects to the PowerCenter Repository Service to fetch metadata from
the PowerCenter repository, and then runs and monitors the sessions and workflows.
Note: When you create the PowerCenter Integration Service, you do not associate it with
any relational databases.
Associated Services
The PowerCenter Integration Service connects to other application services within the
domain. When you create the PowerCenter Integration Service, you can associate it with the
following application service:
PowerCenter Repository Service
The PowerCenter Integration Service requires the PowerCenter Repository Service. The
PowerCenter Integration Service connects to the PowerCenter Repository Service to run
workflows and sessions. When you create the PowerCenter Integration Service, you provide
the name of the PowerCenter Repository Service.
Required Databases
The PowerCenter Repository Service requires a PowerCenter repository in a relational
database. When you create the PowerCenter Repository Service, you must provide
connection information to the database. Create the following database before you create the
PowerCenter Repository Service:
PowerCenter repository
Stores metadata created by the PowerCenter Client in a relational database. You need a
PowerCenter repository to store objects created by the PowerCenter Client and to store
objects that are run by the PowerCenter Integration Service.
Reporting Service
The Reporting Service is an application service that runs the Data Analyzer application
in the Informatica domain. The Reporting Service manages the connections between service
components and the users that have access to Data Analyzer. The Reporting Service
stores metadata for schemas, metrics and attributes, queries, reports, user
profiles, and other objects in the Data Analyzer repository. When you run reports for a
data source, the Reporting Service uses the metadata in the Data Analyzer repository to
retrieve the data for the report and to present the report.
Associated Services
The Reporting Service connects to other application services within the domain. When you
create the Reporting Service, you can associate it with the following application services:
PowerCenter Repository Service
The Reporting Service connects to the PowerCenter Repository Service when you use Data
Analyzer to run PowerCenter Repository Reports. When you create the Reporting
Service, you can provide the name of the PowerCenter Repository Service as the reporting
source.
Metadata Manager Service
The Reporting Service connects to the Metadata Manager Service when you use Data
Analyzer to run Metadata Manager Reports. When you create the Reporting Service,
you can provide the name of the Metadata Manager Service as the reporting source.
Required Databases
The Reporting Service requires a Data Analyzer repository in a relational database. When
you create the Reporting Service, you must provide connection information to the database.
Create the following database before you create the Reporting Service:
Associated Services
The Reporting and Dashboards Service connects to other application services within the
domain. After you create the Reporting and Dashboards Service, you can associate it with the
following application services:
PowerCenter Repository Service
The Reporting and Dashboards Service connects to the PowerCenter Repository Service
when you use JasperReports to run PowerCenter Repository Reports. After you create the
Reporting and Dashboards Service, you can provide the name of the PowerCenter Repository
Service as the reporting source.
Required Databases
The Reporting and Dashboards Service requires a Jaspersoft repository in a relational
database. When you create the Reporting and Dashboards Service, you must provide
connection information to the database. Create the following database before you create the
Reporting and Dashboards Service:
Jaspersoft repository
Stores metadata for PowerCenter Repository Reports and Metadata Manager Reports. You
need a Jaspersoft repository to use JasperReports Server to run PowerCenter Repository
Reports and Metadata Manager Reports.
Search Service
The Search Service is an application service that manages search in the Analyst tool and
Business Glossary Desktop.
By default, the Search Service returns search results from a Model repository, such as data
objects, mapping specifications, profiles, reference tables, rules, and scorecards. The Search
Service can also return additional results. The results can include related assets, business
terms, and policies. The results can include column profile results and domain discovery
results from a profiling warehouse. In addition, you can perform a search based on patterns,
data types, unique values, or null values.
Note: When you create the Search Service, you do not associate it with any relational
databases.
Associated Services
The Search Service connects to other application services within the domain.
When you create the Search Service, you can associate it with the following application
services:
Analyst Service
The Analyst Service manages the connection to the Search Service that enables and
manages searches in the Analyst tool. The Analyst Service determines the associated
Search Service based on the Model Repository Service associated with the Analyst Service.
Data Integration Service
The Search Service connects to the Data Integration Service to return column profile and
domain discovery search results from the profiling warehouse associated with the Data
Integration Service. The Search Service determines the associated Data Integration Service
based on the Model Repository Service.
Model Repository Service
The Search Service connects to the Model Repository Service to return search results from a
Model repository. The search results can include data objects, mapping specifications,
profiles, reference tables, rules, and scorecards. When you create the Search Service, you
provide the name of the Model Repository Service.
Associated Services
The Web Services Hub Service connects to other application services within the domain.
When you create the Web Services Hub Service, you can associate it with the following
application services:
PowerCenter Integration Service
The Web Services Hub Service connects to the PowerCenter Integration Service to send
requests from web service clients to the PowerCenter Integration Service. The Web Services
Hub Service determines the associated PowerCenter Integration Service based on the
PowerCenter Repository Service.
PowerCenter Repository Service
The Web Services Hub Service connects to the PowerCenter Repository Service to send
requests from web service clients to the PowerCenter Repository Service. When you create
the Web Services Hub Service, you provide the name of the PowerCenter Repository Service.
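The "Associated Services" notes above imply an order in which the services have to be created, because each service asks for the name of another one at creation time. The sketch below records only the associations where the notes say you "provide the name of" another service (optional ones, such as the Metadata Manager Service for the Analyst Service, are left out) and derives one valid creation order with a topological sort. The dictionary layout and the function name are illustrative choices; `graphlib` needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each service maps to the services it needs to exist first,
# taken from the "Associated Services" notes above.
REQUIRES = {
    "Model Repository Service": set(),
    "Data Integration Service": {"Model Repository Service"},
    "Content Management Service": {"Data Integration Service"},
    "Analyst Service": {"Data Integration Service", "Model Repository Service"},
    "Search Service": {"Model Repository Service"},
    "PowerCenter Repository Service": set(),
    "PowerCenter Integration Service": {"PowerCenter Repository Service"},
    "Web Services Hub Service": {"PowerCenter Repository Service"},
}


def creation_order(requires=REQUIRES):
    """Return the services in an order where prerequisites come first."""
    return list(TopologicalSorter(requires).static_order())


if __name__ == "__main__":
    for position, service in enumerate(creation_order(), start=1):
        print(position, service)
```

Several orders are valid; the only guarantee is that, for instance, the Model Repository Service appears before the Data Integration Service, which appears before the Analyst Service.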
Databases:
Domain configuration repository - INFA_DOMAIN
The database user account must have permissions to create and drop tables, indexes, and views, and to
select, insert, update, and delete data from tables.
The domain stores configuration and user information in a domain configuration repository.
Jaspersoft repository:
The Jaspersoft repository stores reports, data sources, and metadata corresponding to the
data source. You must specify the Jaspersoft repository details when you create the
Reporting and Dashboards Service.
Model repository:
Informatica services and clients store data and metadata in the Model repository. Before you
create the Model Repository Service, set up a database and database user account for the
Model repository.
PowerCenter repository:
A PowerCenter repository is a collection of database tables containing metadata. A
PowerCenter Repository Service manages the repository and performs all metadata
transactions between the repository database and repository clients.
Profiling warehouse:
The profiling warehouse database stores profiling and scorecard results. You specify the
profiling warehouse connection when you create the Data Integration Service.
Note: Ensure that you install the database client on the machine on which you want to run
the Data Integration Service.
The installer starts the Informatica service. The Informatica service starts the Service
Manager for the node. The Service Manager generates log files that indicate the
startup status of a node. Use these files to troubleshoot issues when the
Informatica service fails to start and you cannot log in to Informatica
Administrator. The Service Manager log files are created on each node.
catalina.out:
Log events from the Java Virtual Machine (JVM) that runs the Service Manager.
For example, a port is available during installation, but is in use when the Service Manager
starts. Use this log to get more information about which port was unavailable during startup
of the Service Manager.
The catalina.out file is in the /tomcat/logs directory.
node.log:
Log events generated during the startup of the Service Manager on a node. You can
use this log to get more information about why the Service Manager for a node failed to
start.
For example, if the Service Manager cannot connect to the domain configuration database
after 30 seconds, the Service Manager fails to start.
The node.log file is in the /tomcat/logs directory.
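When a node fails to start, a quick first pass over catalina.out or node.log can be scripted. The sketch below returns the lines of a log file that contain any of a few keywords. The keywords are guesses, since these notes do not show the actual log format, and the log path must be supplied by the caller.

```python
from pathlib import Path


def find_log_lines(log_path, keywords=("error", "severe", "port")):
    """Return (line_number, text) pairs for lines containing any keyword.

    The default keywords are assumptions; adjust them to the real log content.
    """
    hits = []
    with Path(log_path).open(encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            lowered = line.lower()
            if any(word in lowered for word in keywords):
                hits.append((number, line.rstrip("\n")))
    return hits
```

Calling `find_log_lines` on the node.log file of the failing node gives a short list of candidate lines to read before opening the full log.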
Set the INFA_DOMAINS_FILE variable to the path and file name of the domains.infa file.
Use INFA_HOME to designate the Informatica installation directory.
If you enable secure communication for the domain, set the INFA_TRUSTSTORE variable to the directory that contains
the truststore files for the SSL certificates.
The directory must contain truststore files named infa_truststore.jks and infa_truststore.pem.
You must set the INFA_TRUSTSTORE variable if you use the default SSL certificate provided by Informatica or a certificate
that you provide.
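The environment variable rules above can be checked with a short script before starting a client or command-line tool. This is only a sketch: the function name and the wording of the messages are made up, and it checks nothing beyond what the notes state (the two variables, plus the two truststore file names when secure communication is enabled).

```python
import os
from pathlib import Path

TRUSTSTORE_FILES = ("infa_truststore.jks", "infa_truststore.pem")


def check_informatica_env(env=None, secure=False):
    """Return a list of problems found; an empty list means none were found."""
    env = os.environ if env is None else env
    problems = []

    home = env.get("INFA_HOME")
    if not home or not Path(home).is_dir():
        problems.append("INFA_HOME is not set to an existing directory")

    domains_file = env.get("INFA_DOMAINS_FILE")
    if not domains_file or not Path(domains_file).is_file():
        problems.append("INFA_DOMAINS_FILE does not point to an existing file")

    if secure:
        truststore = env.get("INFA_TRUSTSTORE")
        if not truststore:
            problems.append("INFA_TRUSTSTORE is not set")
        else:
            # The directory must contain both truststore files.
            for name in TRUSTSTORE_FILES:
                if not (Path(truststore) / name).is_file():
                    problems.append(f"{name} is missing from INFA_TRUSTSTORE")
    return problems
```

Pass `secure=True` only when secure communication is enabled for the domain; otherwise the truststore check is skipped.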
The following table describes the database connections that you must create before you
create the associated application services:

Database Connection          | Description
Data object cache database   | To access the data object cache, create the data object cache connection for the Data Integration Service.
Human task database          | To store Human task metadata, create the human task database connection for the Data Integration Service.
Profiling warehouse database | To create and run profiles and scorecards, create the profiling warehouse database connection for the Data Integration Service. To create and run profiles and scorecards, select this instance of the Data Integration Service when you configure the run-time properties of the Analyst Service.
Reference data warehouse     | To store reference data, create the reference data warehouse connection for the Content Management Service.
Configuring IDQ:
Create 3 databases:
INFA_MRS - Model repository database
INFA_PROWHS - Profiling warehouse database
INFA_ANLSTG - Analyst stage database
Created the INFA_HUMAN user for the Human task database.
Created the INFA_SQL_PROP user for SQL Properties as part of the Data Integration Service.
INFA_REF/INFA_REF:
Create the database INFA_REF for the reference database. After that, create a connection in the
Admin Console and then create the Content Management Service.
Log on to the Informatica Admin Console.
Create 6 connections that point to the above databases.
Create a new Model Repository Service that uses the infa_mrs database.
It creates the repository content, which may take some time.
Selected Human Task Service Module and Profiling Service Module. Did not select others.
Selected Human Task Service and Profiling Service. Did not select others.
http://WIN-A4ZOPLLNM64:8085/analyst/
Administrator/Administrator
Creating a profile in Informatica Analyst:
Property
Description
User name
Password
Code Page
Environment SQL
Transaction SQL
Retry Period
Parallel Mode
Creating a Connection
In the Administrator tool, you can create relational database, social media, and file systems connections.
1.
2.
3.
4.
5. In the New Connection dialog box, select the connection type, and then click OK. The New Connection wizard appears.
6. Enter the connection properties. The connection properties that you enter depend on the connection type. Click Next to go to the next page of the New Connection wizard.
7. When you finish entering connection properties, you can click Test Connection to test the connection.
8. Click Finish.
Application Client
Application Services
Repositories
Data Analyzer
Reporting Service
Informatica
Reporting &
Dashboards
Informatica Analyst
Data Analyzer
repository
Jaspersoft repository
Analyst Service
Data Integration Service
Model Repository Service
Search Service
Data Integration Service
Informatica Data Director
Service
- Analyst Service
- Content Management Service
- Data Integration Service
- Model Repository Service
Model repository
Metadata Manager
PowerCenter Client
PowerCenter repository
PowerCenter repository
The following application services are not accessed by an Informatica application client:
PowerExchange Listener Service. Manages the PowerExchange Listener for bulk data
movement and change data capture. The PowerCenter Integration Service connects to
the PowerExchange Listener through the Listener Service.
PowerExchange Logger Service. Manages the PowerExchange Logger for Linux, UNIX,
and Windows to capture change data and write it to the PowerExchange Logger Log files.
Change data can originate from DB2 recovery logs, Oracle redo logs, a Microsoft SQL Server
distribution database, or data sources on an i5/OS or z/OS system.
SAP BW Service. Listens for RFC requests from SAP BI and requests that the PowerCenter
Integration Service run workflows to extract from or load to SAP BI.
Remote Function Call (RFC) is the standard SAP interface for communication between SAP systems.
Communication between applications in different systems in the SAP environment includes
connections between SAP systems as well as between SAP systems and non-SAP systems.
Feature Availability
Informatica products use a common set of applications. The product features you can use depend on your product license.
The following table describes the licensing options and the application features available with each option:
Licensing
Option
Data Explorer
Data Quality
Data Services
Profiling
Scorecarding
Reference table management
Create profiling rules
Run rules in profiles
Bad and duplicate record management
Informatica Analyst
Use to analyze, cleanse, standardize, profile, and score data in an enterprise.
It supports column and rule profiling, scorecarding, and bad record and duplicate record
management.
You can also manage reference data and provide the data to
developers in a data quality solution.
Errors:
Mapping service associated with the Analyst service is disabled or is not available.
Recycle the Mapping service in the Administrator tool.
The module below was set to false; I set it to true in the Admin Console
and recycled the Repository service, Data Integration Service, and Analyst Service.
No data domains in the data domain glossary. This error occurred
while creating a Quick profile in the Discovery workspace.
Solution:
https://mysupport.informatica.com/message/40554#40554
Log in to the Informatica Administrator console. Click the Analyst Service, go to Action on the right-hand
side, and then click Audit Table > Create. Once the audit table is created, the Analyst
Service can create the reference table.
I could not find this option. I think the Content Management Service is required to create reference
tables from Analyst, so I need to create the Reference Data Warehouse and the Content
Management Service.
I created the Content Management Service. After that, I got the error below: Audit Tables do not
exist.