Front Cover: IBM InfoSphere Information Server Administration V9.1
Student Notebook
ERC 1.0
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International
Business Machines Corp., registered in many jurisdictions worldwide.
The following are trademarks of International Business Machines Corporation, registered in
many jurisdictions worldwide:
DataStage DB2 IA
Informix InfoSphere MVS
QualityStage WebSphere z/OS
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or
both.
Windows is a trademark of Microsoft Corporation in the United States, other countries, or
both.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of
Oracle and/or its affiliates.
Other product and service names might be trademarks of IBM or other companies.
Contents
Course description
Agenda
Checkpoint
Exercises Unit 01
Unit summary
viii Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0.1
Duration: 4 days
Purpose
IBM InfoSphere Information Server hosts a suite of products designed
for the development and delivery of data integration, data quality, and
data governance jobs. This course describes and discusses
Information Server administrative tasks surrounding the Suite as a
whole, such as security, session management, and backup and
recovery, and administrative tasks related to key Information Server
products such as DataStage and Information Analyzer.
Audience
Information Server administrators who will be supporting developers
for IBM InfoSphere Information Server and IBM InfoSphere
Information Server for z/OS products, including DataStage,
QualityStage, Information Analyzer, FastTrack, Information Services
Director, and Metadata Workbench.
Prerequisites
Those taking this course should have some experience with database
and system configuration. Some experience with Linux is helpful, but
not required.
Objectives
After completing this course, you should be able to:
- Identify Information Server functional components, product modules, and
  architecture components
- Use and administer the Information Server products using their clients
- Configure Information Suite security for users and groups
- Start and stop Information Server (IS) components
- Manage IS sessions, logging and reporting
Contents
Unit 1. Technical Overview
Unit 2. Overview of Clients used for Administration
Unit 3. Authentication and Suite Security
Unit 4. Stopping and Starting Information Server
Unit 5. Session Management
Unit 6. Engine Tier Architecture
Unit 7. Engine Tier Configuration
Unit 8. Engine Tier Database Connectivity
Unit 9. Engine Tier Monitoring
Unit 10. Metadata Asset Management
Unit 11. Information Services Console Configuration
Unit 12. Installation, Deployment, and Recovery
Agenda
Day 1
Unit 0: Welcome and Agenda
Unit 1: Technical Overview
Exercise 01
Unit 2: Overview of Clients used for Administration
Exercise 02
Unit 3: Authentication and Suite Security
Exercise 03
Unit 4: Stopping and Starting Information Server
Exercise 04
Day 2
Unit 5: Session Management
Exercise 05
Unit 6: Engine Tier Architecture
Exercise 06
Unit 7: Engine Tier Configuration
Exercise 07
Day 3
Unit 8: Engine Tier Database Connectivity
Exercise 08
Unit 9: Engine Tier Monitoring
Exercise 09
Unit 10: Metadata Asset Management
Exercise 10
Day 4
Unit 11: Information Services Console Configuration
Exercise 11
Course objectives
After completing this course, you should be able to:
- Identify Information Server functional components, product modules, and
  architecture components
- Use and administer the Information Server products using their clients
- Configure Information Suite security for users and groups
- Start and stop Information Server (IS) components
- Manage IS sessions, logging and reporting
- Configure and manage IS Engine components including environment variables,
  configuration files, data sets, and operational metadata
- Establish database connectivity with IS
Agenda
Day 1
Unit 0: Welcome and Agenda
Unit 1: Technical Overview
Exercise 01
Unit 2: Overview of Clients used for Administration
Exercise 02
Unit 3: Authentication and Suite Security
Exercise 03
Unit 4: Stopping and Starting Information Server
Exercise 04
Day 2
Unit 5: Session Management
Exercise 05
Unit 6: Engine Tier Architecture
Exercise 06
Unit 7: Engine Tier Configuration
Exercise 07
Day 3
Unit 8: Engine Tier Database Connectivity
Exercise 08
Unit 9: Engine Tier Monitoring
Exercise 09
Unit 10: Metadata Asset Management
Exercise 10
Day 4
Unit 11: Information Services Console Configuration
Exercise 11
Unit 12: Installation, Deployment, and Recovery
Exercise 12
Unit 13: Serviceability
Exercise 13
Introductions
- Name
- Company
- Where you live
- Your job role
- Current experience with products and technologies in this course
  - Databases
  - ETL (Extraction Transformation Load) tools
  - Metadata management tools
  - Data quality technology
- Do you meet the course prerequisites?
  - Some experience with database and system configuration
- Class expectations
Unit objectives
After completing this unit, you should be able to:
- List the Information Server functional categories
- List the Information Server products and components that support the
  Information Server functional categories
- List the Information Server software architectural tiers
Unified Deployment
Notes:
Information Server (IS) provides four basic categories of functionality: Understand,
Cleanse, Transform, Deliver. These functional categories support many different types of
enterprise data processing projects, including data integration, data quality, and business
information exchange projects, as well as many other types of enterprise projects.
Information Server hosts various products and components that provide this functionality.
These are discussed on the following pages.
Understanding has to do with functionality that helps you understand your data,
functionality that helps you understand how to accomplish what you want to accomplish,
and functionality that helps you to understand the jobs you are building to accomplish your
goals.
Cleansing functionality is used to correct and standardize the data processed by your jobs.
Transformation functionality is used to combine and restructure the data processed by your
jobs into useful information for your consumers.
Deliver functionality is used to deliver the product of your jobs to consumers.
Metadata produced and consumed by the hosted Information Server products is stored in a
unified, integrated Repository. This enables the produced and consumed metadata to be
shared across the platform of hosted products.
The Information Server functionality is executed using the Information Server parallel
processing engine, which uses parallel technology to process huge amounts of data at
tremendous speeds.
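Products by functional category (from the slide graphic):
- Understand: Blueprint Director, Information Analyzer, Discovery, Business Glossary,
  Metadata Workbench
- Cleanse: QualityStage
- Transform: DataStage, FastTrack
- Deliver: Information Services Director, Change Data Delivery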
Notes:
Information Server (IS) hosts various products that support each of the various functional
categories. This graphic lists the products that apply to each functional category.
Some of these products support more than one functional category. Later pages will
discuss these products in more detail.
Notes:
Different roles are involved in the typical enterprise data integration project, each role
producing and consuming different types of metadata. With IBM Information Server,
metadata is managed across these different roles and functions. Different products are
geared towards different user roles. For example, FastTrack is geared towards business
analysts. DataStage is geared towards developers. As each product creates new
metadata, that metadata is immediately available to others working on the project. This
enables the different user roles to communicate with one another and to work together and
share information.
Integrated metadata management has many benefits including simplified data integration,
change management, reliable information, and increased data governance.
Understanding
Blueprint Director
- Define and manage a blueprint of your data integration project from initial
  sketches through delivery
- Link Information Server metadata assets (files, table definitions, mapping
  specifications, DataStage ETL jobs) to blueprint icons
- Use pre-built templates for usage scenarios, including warehousing projects
User roles (from the slide graphic): Business Analysts
Notes:
You use Blueprint Director to create a plan or blueprint of your Information Server project.
The blueprint is created by laying stages on a canvas and linking them together. The
stages represent different types of metadata assets (files, table definitions, mapping
specifications, DataStage jobs, and so on).
Blueprint Director comes with a set of pre-built templates for different, standard project
scenarios. Each step of the project is fully documented.
Information Analyzer
- In-depth analysis of existing data systems
- Analysis of application, database, and file-based sources for content, quality,
  and structure
- Profiling of fields, and relationship analysis across fields and across sources
- Ongoing measurement and baseline reporting of information quality
- Creates metadata that describes where information is managed across systems
User roles (from the slide graphic): Subject Matter Experts, Data Analysts
Related components shown in the graphic: Other Product Modules, Business Glossary
Analyze source data structures, and monitor adherence to integration and quality rules
Provides an understanding of the fitness of specific sources and highlights data
that may need downstream attention
Physical View
Notes:
Information Server takes a three-sided approach to understanding, each side leveraging a
different type of metadata. The first is focused on physical metadata: the structure and
contents of the different source systems within your environment.
This is accomplished through data-centric profiling and analysis of source systems,
including column analysis, table analysis, and cross-table analysis, that provide detailed
profiling of the data in each column (cardinality, nullability, range, scale, length, precision).
This activity is typically conducted by data analysts and subject matter experts. The
product that automates this is Information Analyzer. It provides insight into the quality and
usage characteristics of the information. It can also help uncover data relationships across
systems, through foreign key affinity mapping. Profiling is designed to become an ongoing
process that compares current quality against a baseline, to show how data quality
changes over time and to confirm that earlier assumptions about the data still hold.
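The column analysis described above can be illustrated with a minimal sketch. This is not Information Analyzer code or its API. The function name profile_column and the sample values are hypothetical, and the metrics are simplified versions of the cardinality, nullability, range, and length measures mentioned in the notes.

```python
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Compute simple profile metrics for one column of data."""
    non_null = [v for v in values if v is not None]
    lengths = [len(str(v)) for v in non_null]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "nullable": len(non_null) < len(values),
        "cardinality": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "max_length": max(lengths) if lengths else 0,
    }


# Hypothetical sample column: customer ages with one missing value.
ages = [34, 27, None, 34, 61]
profile = profile_column(ages)
print(profile)
```

Storing one such profile as a baseline and comparing later profiles against it is one simple way to picture the ongoing quality measurement described in the notes.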
Discovery
- Complements Information Analyzer functionality
- Discover and validate possible matching keys across multiple data sources
- Discover complex business rules between two structured data sets
- Cross-source data preview that enables analysts to see values that conform to
  the business rules and anomalies that do not conform
Notes:
Discovery complements some of the functionality of Information Analyzer. Both products
are used to understand the data in project sources and targets. You can use Discovery to
look for and validate possible keys in different sets or sources of data. And you can use it to
look for data that is related by possibly complex business rules.
You can also use Discovery to search for anomalies in the data, that is, data that does not
conform to the business rules used to generate it.
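As a rough illustration of key discovery, the sketch below checks which columns could serve as keys and how well a column in one source matches a key in another. It is a simplified assumption of how such checks can work, not the Discovery product's algorithm. The table contents and the function names is_candidate_key and overlap_ratio are hypothetical.

```python
from typing import Dict, List


def is_candidate_key(values: List[object]) -> bool:
    """A column qualifies as a candidate key if it has no nulls and no duplicates."""
    return None not in values and len(set(values)) == len(values)


def overlap_ratio(child: List[object], parent: List[object]) -> float:
    """Fraction of distinct non-null child values that also appear in the parent column."""
    child_vals = {v for v in child if v is not None}
    if not child_vals:
        return 0.0
    return len(child_vals & set(parent)) / len(child_vals)


# Hypothetical sources: a customer table and an orders table.
customers: Dict[str, List[object]] = {
    "cust_id": [1, 2, 3, 4],
    "region": ["E", "W", "E", "N"],
}
orders: Dict[str, List[object]] = {
    "order_id": [10, 11, 12],
    "cust_ref": [1, 3, 9],
}

keys = [name for name, col in customers.items() if is_candidate_key(col)]
ratio = overlap_ratio(orders["cust_ref"], customers["cust_id"])
# Values that break the assumed rule "every order refers to a known customer".
anomalies = sorted(set(orders["cust_ref"]) - set(customers["cust_id"]))
print(keys, round(ratio, 2), anomalies)
```

A high overlap ratio suggests a possible key relationship across the two sources, and the leftover values are the kind of anomalies an analyst would want to preview.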
Business Glossary
Facilitate communications
between roles by creating and
managing a shared vocabulary of
categories and terms
Notes:
Business Glossary is a web-based tool that enables analysts and subject matter experts to
create, manage, and share a common enterprise vocabulary and classification system.
The terms used in the glossary can be linked to Information Server metadata assets, such
as columns, tables, and DataStage jobs. These terms can be used to clarify and describe
the asset.
Also within Business Glossary, stewards can be assigned to metadata assets. These
stewards are responsible for the assets. They are the ones to go to if there are questions
about the assets.
Metadata Workbench
- Graphical exploration of metadata assets generated and consumed by Information
  Server component applications
- Cross-tool graphs describing data lineage, business meaning, and impact
  dependencies
- Ability to extend lineage and impact analysis to applications and assets
User roles (from the slide graphic): Data Integration Managers, Developers
Provides IT professionals with a tool for exploring and understanding the assets
generated and used by the Information Server suite.
Notes:
Metadata Workbench provides visual web-based exploration of metadata assets generated
and used by IBM Information Server components. It improves business trust in information
and increases IT responsiveness by tracing and maintaining the relationship paths of
information throughout an integration lifecycle. It visually depicts these relationships from
the sources of information to the places where information is actually used, even across
different tools and technologies. Metadata Workbench describes the complete data lineage
from applications, reports, and data warehouses back to source systems, including the
types of processing that were performed on them along the way. It also visualizes the
impact of any change to any information asset, including databases and services that
would be affected if changes occurred within a DataStage job.
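Lineage and impact analysis can both be pictured as traversals of a graph of data flows. The sketch below assumes a small hypothetical graph (the asset names and the FLOWS dictionary are invented for illustration) and is not how Metadata Workbench is implemented. Impact analysis follows the flows forward from a changed asset, and lineage follows them backward from a report.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical data-flow edges: each asset maps to the assets it feeds.
FLOWS: Dict[str, List[str]] = {
    "crm_db.customers": ["job_load_customers"],
    "erp_db.orders": ["job_load_orders"],
    "job_load_customers": ["warehouse.dim_customer"],
    "job_load_orders": ["warehouse.fact_orders"],
    "warehouse.dim_customer": ["report_sales"],
    "warehouse.fact_orders": ["report_sales"],
}


def reachable(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search returning every asset reachable from start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(edges: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Reverse every edge so that lineage can be traced back to sources."""
    reverse: Dict[str, List[str]] = {}
    for src, targets in edges.items():
        for tgt in targets:
            reverse.setdefault(tgt, []).append(src)
    return reverse


# Impact: what is affected downstream if this job changes?
impact = reachable("job_load_customers", FLOWS)
# Lineage: where did the data in this report come from?
lineage = reachable("report_sales", invert(FLOWS))
print(sorted(impact))
print(sorted(lineage))
```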
Cleansing
Notes:
There are several types of problems within enterprise data stores.
1. The first is a lack of information standards. Names, addresses, part numbers, and other
data are entered in inconsistent ways, particularly across different systems.
2. Another common issue involves data surprises in individual fields. Data in the database
is often misplaced, or fields are used for multiple purposes, as when a name field
contains company and address information, a tax ID field contains telephone numbers,
and the telephone field has a variety of mistakes.
3. A third common problem is information buried in free-form fields. In this case valuable
information is hidden away in text fields. Since these fields are difficult to query using
SQL, this information is often not leveraged, although it likely has value to the business.
This type of problem is common in product information and Customer Support case
records.
4. The fourth problem is data myopia, a term for the lack of consistent identifiers across
different systems. Without adequate foreign-key relationships, it is impossible to get a
complete view of information across systems. This example shows three products that
look very different, but are actually the same.
5. The final problem is redundancy within individual tables. This is extremely common,
where data is re-entered into systems because the data entry mechanism is not aware
that the original record is already there.
QualityStage functionality
- Provides specialized data quality processing
- Ensures clean, standardized, de-duplicated information
- Enables a single version of the truth
- Supports global postal verification
- Provides visual tools for designing quality rules and matching logic
- Seamlessly integrated with DataStage
- Precisely calibrates matching rules
- Allows quality logic to be deployed seamlessly within DataStage Extraction,
  Transformation, Load (ETL) jobs
User roles (from the slide graphic): Subject Matter Experts, Data Analysts
Standardize and correct source data fields, and match records together across
sources to create a single view
Notes:
QualityStage is a product that helps to identify and resolve the data cleansing issues
previously discussed. It provides data quality functions on an easy-to-use,
design-as-you-think flow diagram. This allows data quality to be embedded in any
information integration process.
QualityStage data quality functions include:
- Free-form text investigation: Enables you to recognize and parse out individual
  fields of data from free-form text
- Standardization: Enables individual fields to be made uniform according to your
  standards
- Address verification and correction: Uses postal information to standardize,
  validate, and enrich address data
- Matching: Enables duplicates to be removed from individual sources, and common
  records across sources to be identified and linked
- Survivorship: Enables the best data from across different systems to be merged
  into a consolidated record.
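The standardization, matching, and survivorship functions listed above can be pictured as three steps in a small pipeline. The sketch below is a toy illustration only: it uses exact matching on a standardized name and a "longest value wins" survivorship rule, both of which are simplifying assumptions rather than the rule sets and matching logic that QualityStage provides. The record layout and function names are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def standardize(rec: Record) -> Record:
    """Normalize spacing and case in the name, keep only digits in the phone."""
    return {
        "name": " ".join(rec["name"].upper().split()),
        "phone": "".join(ch for ch in rec["phone"] if ch.isdigit()),
        "email": rec["email"].strip().lower(),
    }


def match(records: List[Record]) -> Dict[str, List[Record]]:
    """Group records that share the same standardized name (exact match only)."""
    groups: Dict[str, List[Record]] = {}
    for rec in records:
        groups.setdefault(rec["name"], []).append(rec)
    return groups


def survive(group: List[Record]) -> Record:
    """Build one record per group, taking the longest value for each field."""
    return {field: max((r[field] for r in group), key=len) for field in group[0]}


# Hypothetical source rows entered inconsistently in two systems.
raw = [
    {"name": "  john  smith", "phone": "(555) 010-1234", "email": ""},
    {"name": "John Smith", "phone": "", "email": "JSmith@Example.com "},
    {"name": "Ann Lee", "phone": "555.010.9999", "email": "ann@example.com"},
]

clean = [standardize(r) for r in raw]
consolidated = [survive(g) for g in match(clean).values()]
print(consolidated)
```

The two John Smith rows collapse into one consolidated record that carries the phone number from the first system and the email address from the second.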
Transformation
Figure labels: Federated query
Notes:
Information Server transforms information from the application-centric context in which it is
currently locked, into entirely new business contexts that are appropriate to new business
opportunities or challenges. This type of transformation is not simply about
format-to-format translation, but is more focused on merging data together. Since
transformation is really focused on the context of information, it requires an understanding
of the information sources, business meaning, and relationships, so it needs to be created
by information experts (data analysts, database administrators, subject matter experts),
using the understanding provided by the metadata.
DataStage
- Create codeless, visual design of ETL data flows using built-in transformation
  components (stages) and links
  - Use stages to extract data from and load data to data resources, including
    database tables, sequential files, enterprise resources
  - Links specify the flow of data from one stage to another
- Can create reusable sets of components (shared containers) that can be shared
  across jobs, projects, and developers
- Complete ETL functionality with metadata-driven productivity
- Supports team-based development and collaboration
User roles (from the slide graphic): Developers, Architects
Transform and aggregate any volume of information in batch or real time through
visually designed logic
Notes:
DataStage is the main Information Server product that is focused on transformation and
movement of information. DataStage enables codeless visual design of data flows, and
includes built-in transformation components (stages) and connectors.
DataStage is built around team collaboration and reuse. Everything from individual stages,
to connections, to entire data flows can be reused across different jobs and projects. In
addition, DataStage leverages the shared platform services for parallel processing,
administration, deployment, and connectivity.
FastTrack
- Used in conjunction with DataStage
- Build mapping specifications that describe and document DataStage ETL jobs
- Generate DataStage jobs from the mapping specifications
- Reverse-engineer DataStage jobs into mapping specifications
User roles (from the slide graphic): Business Users
Figure labels: FastTrack mapping specification, Generated DataStage job
Notes:
Mapping specifications specify how data is mapped and transformed from source fields to
target fields. Business analysts create mapping specifications, leveraging source analysis,
target models, and metadata to facilitate the mapping process. Prototype DataStage ETL
jobs can be generated from these FastTrack mapping specifications. These mapping
specifications guide the DataStage developers' work and give them a head start in
designing and building their DataStage jobs.
DataStage jobs can also be reverse-engineered back into mapping specifications that
document their mappings and transformations.
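A mapping specification can be thought of as data: a list of source-to-target field mappings, each with a transformation rule and a description. The sketch below assumes a hypothetical structure (the Mapping type, field names, and rules are invented) and does not reflect the FastTrack file format. It shows how the same specification can both drive a transformation and document it.

```python
from typing import Callable, Dict, List, NamedTuple


class Mapping(NamedTuple):
    source_field: str
    target_field: str
    rule: Callable[[str], str]
    description: str


# Hypothetical mapping specification for a customer load.
SPEC: List[Mapping] = [
    Mapping("CUST_NM", "customer_name", str.title, "Title-case the name"),
    Mapping("CTRY_CD", "country_code", str.upper, "Upper-case the country code"),
    Mapping("EMAIL", "email", str.lower, "Lower-case the email address"),
]


def apply_spec(row: Dict[str, str], spec: List[Mapping]) -> Dict[str, str]:
    """Produce a target row by applying each mapping rule to its source field."""
    return {m.target_field: m.rule(row[m.source_field]) for m in spec}


def document_spec(spec: List[Mapping]) -> List[str]:
    """Render the specification as readable lines, one per mapping."""
    return [f"{m.source_field} -> {m.target_field}: {m.description}" for m in spec]


source_row = {"CUST_NM": "ada lovelace", "CTRY_CD": "gb", "EMAIL": "Ada@Example.org"}
target_row = apply_spec(source_row, SPEC)
doc_lines = document_spec(SPEC)
print(target_row)
print("\n".join(doc_lines))
```

Keeping the rules and their descriptions in one structure is what makes it possible to go in both directions: from specification to job, and from job back to documentation.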
Delivery
Notes:
Information Services Director is used to deliver functional and component logic as
Enterprise JavaBeans or Web Services. Within the Information Server context, this logic
includes database functionality as well as DataStage ETL functionality.
DataStage jobs can include ISD input stages and/or ISD output stages. The ISD input
stages are used in a service to pass values to the job. ISD output stages are used to return
data to the service that can then be passed to the service consumers.
All functions are deployed as shared services within a Service Oriented Architecture
(SOA). This is done consistently, whether you are using DataStage, QualityStage, or DB2.
Notes:
Change Data Delivery is used to deliver changed data to consumers of the data. The
changed data can be delivered for data replication or synchronization or for dynamic data
warehousing.
Change Data Delivery can replicate large volumes of data with a minimal impact on
production systems.
Replication is supported for a large number of different relational database systems.
Notes:
Information Server provides a unified architecture that works with all types of information
integration. Common services, unified parallel processing, and unified metadata are at the
core of the IS architecture.
The architecture is service-oriented, enabling Information Server to work within an
organization's evolving enterprise service-oriented architectures. A service-oriented
architecture also connects the individual products of Information Server.
Figure labels: Metadata Access Services, Metadata Analysis Services, Metadata Server
Notes:
This graphic shows the Information Server backbone. The hosted applications are at the
top. They all share the same services displayed in the middle. They all share the same
repository displayed at the bottom. The Information Server parallel processing engine is
used by several Information Server applications to run their jobs, including DataStage ETL
jobs, QualityStage data cleansing jobs, and Information Analyzer data analysis jobs.
- Scale up by adding processors or nodes with no design change or re-compilation
- External configuration file specifies hardware configuration and resources
Figure labels: Single processor; SMP System; MPP, GRID, and Clustered Systems
Notes:
Information Server uses a parallel processing layer (Engine) that is used by DataStage,
QualityStage, Information Analyzer, and other IS products and components. This
architecture enables those products to scale up their processing speeds by adding
additional processors, in several different hardware configurations.
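The idea that the same job can run on more nodes without a design change can be sketched as follows. This is an illustration under stated assumptions, not the parallel engine itself: the two CONFIG dictionaries stand in for an external configuration file, the node names are hypothetical, and Python threads are used only to show the structure (they do not demonstrate real speedup).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical stand-ins for an external configuration: only the node list changes.
CONFIG_ONE_NODE: Dict[str, List[str]] = {"nodes": ["node1"]}
CONFIG_FOUR_NODES: Dict[str, List[str]] = {"nodes": ["node1", "node2", "node3", "node4"]}


def partition(rows: List[int], count: int) -> List[List[int]]:
    """Distribute rows across partitions with a simple modulus on the key."""
    parts: List[List[int]] = [[] for _ in range(count)]
    for row in rows:
        parts[row % count].append(row)
    return parts


def job_logic(rows: List[int]) -> int:
    """The 'job design': sum of squares. It never refers to the node count."""
    return sum(r * r for r in rows)


def run_job(rows: List[int], config: Dict[str, List[str]]) -> int:
    """Run the same job logic on every partition, one worker per configured node."""
    node_count = len(config["nodes"])
    parts = partition(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partial_results = list(pool.map(job_logic, parts))
    return sum(partial_results)


data = list(range(1, 101))
result_one = run_job(data, CONFIG_ONE_NODE)
result_four = run_job(data, CONFIG_FOUR_NODES)
print(result_one, result_four)
```

Only the configuration passed to run_job differs between the two runs. The job logic is untouched, which mirrors the separation between job design and hardware configuration described in the notes.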
Notes:
Information Server functionality, products, and components are separated into four different
layers or tiers. During Information Server installation you specify which tier or tiers you want
to install on a particular computer system. Different tiers can be installed on the same or
different computers that are network connected.
These different tiers are described and discussed in the following pages.
Architecture diagram (labels recovered from the graphic: Information Server Platform)
- Client 1 .. N: Administrative Clients; User Clients (Desktop and Web)
- Services 1: Platform Services (Common Services, Product-specific Services);
  Application Server
- Repository 1: Metadata Repository
- Engine 1 .. N: Information Server Engine; Communication Agents
- Working Areas: DataStage/QualityStage Scratch and Dataset; Information Analyzer data
Notes:
Information Server clients include:
- Information Server Web Console (IS administration/reporting)
- DataStage/QualityStage clients (Administrator, Designer and Director)
- FastTrack client
- Metadata Workbench client
- Information Server Console: hosts Information Analyzer and Information Services
Director
- WebSphere Application Server (WAS) client
- Information Server Manager
- Multi-Client Manager
- Information Server Command Line Interface (istool)
Services:
- Uses IBM WebSphere Application Server (WAS) to implement the J2EE services
functionality
Repository:
- DB2, Oracle, and SQL Server
Parallel engine:
- A C++ compiler is required to compile DataStage, QualityStage, and Information
Analyzer jobs into an executable form capable of being run by the parallel engine.
Platform topologies
Notes:
The diagram shows DB2 as the Repository database server, but Oracle and SQL Server
are also supported, as previously noted. Although only one Engine is shown for each
topology, Information Server supports multiple parallel engines on the same or separate
systems.
All tiers should be installed in the same physical LAN, connected by high-speed network
connections.
The Services and Engine platform types must match. The Repository database need not
match the platform type of the Services and Engine.
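The platform rule stated above can be expressed as a small check. This is a hypothetical planning aid, not part of the Information Server installer. The Topology class, the validate function, and the platform strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Topology:
    services_platform: str
    engine_platforms: List[str]
    repository_platform: str


def validate(topology: Topology) -> List[str]:
    """Return a list of rule violations; an empty list means the layout is acceptable."""
    problems = []
    for platform in topology.engine_platforms:
        if platform != topology.services_platform:
            problems.append(
                f"Engine platform {platform!r} does not match "
                f"Services platform {topology.services_platform!r}"
            )
    # The Repository platform is deliberately not checked: it need not match.
    return problems


ok = Topology("linux", ["linux", "linux"], "windows")
bad = Topology("linux", ["windows"], "linux")
ok_problems = validate(ok)
bad_problems = validate(bad)
print(ok_problems)
print(bad_problems)
```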
Client tier
- Provides access to both administrative clients and user clients
- Administrative clients include Information Server clients as well as clients
  specific to Information Server hosted products:
  - Information Server Web Console
    - Security
    - Session maintenance
    - Logging and reporting management
  - DataStage Administrator client
    - DataStage global and project configuration and defaults
  - DataStage Designer client
    - Configuration file editing
  - Other Information Server products have a single client used for both
    administration and user tasks
    - Administrative tasks require product administrator authorization
- User clients for specific Information Server products and functional components:
  - Appropriate interfaces for the type of user (business or technical)
  - Facilitate the Information Server analysis, cleansing, integration, and
    delivery functions
Notes:
Information Server products and components can be accessed through client components.
The client tier contains both administrative clients and user clients.
Some products and functionality are accessed through a web browser. These are called
thin clients, because the functional components exist on the server but are delivered to
the web browser.
Other clients are called thick clients, because functional components are installed and
exist on the client computer system as well as the server computer system.
Services tier
Set of shared services that centralize core tasks across the platform
Administrative tasks such as security, user administration, logging, and
reporting
Repository services
Shared services allow these tasks to be managed and controlled in one
place, regardless of which product is using the service
Notes:
The Services tier consists of a set of shared services that centralize core tasks across the
platform.
Some services address functionality that is unique to a specific Information Server product
or component. Other services, such as security services, are used across multiple products
and components.
The services tier is deployed within an IBM WebSphere Application Server (WAS) instance.
The computer system running the WAS instance is referred to as the domain or services
host system.
Engine tier
Components
Engine: The high-performance, parallel engine that performs analysis,
cleansing, and transformation
Connectors: Provide common connectivity to external resources such as
DB2, Teradata, Oracle, Sybase, WebSphere MQ, and others
Packs: Provide high-speed connectivity to packaged enterprise applications
QualityStage modules: A set of integrated modules for accomplishing data
cleansing and re-engineering tasks such as Investigating, Standardizing,
Matching, and Survivorship
Service Agents: Manage bidirectional communication between the engine
processes and the Repository
To deploy the Engine tier to multiple machines, the Information Server engine
installation software is copied or NFS mounted to each engine server
Notes:
The engine tier consists of the following pieces:
- Information Server Parallel Engine: The high-performance, parallel engine that
performs analysis, cleansing, and transformation processing
- Connectors: Provide common connectivity to external resources such as DB2,
Teradata, Oracle, Sybase, WebSphere MQ, and others
- Packs: Provide high-speed connectivity to packaged enterprise applications
- QualityStage Modules: A set of integrated modules for accomplishing data cleansing
and re-engineering tasks such as Investigating, Standardizing, Matching, and
Survivorship
- Service Agents: Manage bidirectional communication between the engine
processes and the Metadata Repository
To deploy the engine tier to multiple computers, the Information Server engine software is
copied or NFS mounted to each server.
Repository tier
Stores objects and metadata for Information Server and each
of its hosted products
Enables Information Server products to share metadata with
each other throughout the data integration lifecycle
For the Repository database (named XMETA by default), the
Information Server installation package comes with DB2
An existing DB2 instance can also be configured
If another DBMS is used (for example, Oracle), scripts must be run
before the installation to configure the Repository
Notes:
The Information Server Repository stores the objects and metadata produced and
consumed by Information Server hosted products and components. The Repository is
implemented as a database, named XMETA by default. Since all the products hosted by
Information Server use the same XMETA database, metadata produced by one product
can be shared with other Information Server products.
For the XMETA database, DB2 is supported. DB2 can be installed as part of the
Information Server installation or an existing DB2 instance can be used. Other database
systems, such as Oracle, are also supported.
Tier interaction
Diagram labels: Client, Services, Repository. Step 2: Authentication Service retrieves
credential information.
Notes:
DataStage clients log in to the IS Server and retrieve the DataStage credentials that the
users are mapped to. The DataStage client, using the IS Authentication Service, logs in to
the IS Server as follows:
- The host name and port number provided in the DataStage login window are used to
make an HTTP request to the IS Server.
- The HTTP request returns the JNDI properties needed to establish a remote EJB
session between the client and the IS Server. One of these JNDI properties is the
Provider URL, which includes the host name and port number (from the WebSphere
serverindex.xml file). The client uses JNDI lookups to call and work with IS
Services using the retrieved JNDI properties.
- The IS Server returns the mapped credentials for the user to the client. Even if
credential mapping is turned off (shared user registry mode), the credentials needed
to log in to the DataStage Server are returned from the IS Server (in this case, the
credentials are the same as the ones used to log in to the IS Server). These
credentials allow the client to log in to the various DataStage Servers installed.
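The last step of the login flow can be modeled as a small lookup. This Python sketch only illustrates the decision described in these notes. The function name, the Credentials type, and the mapping dictionary are assumptions made for the example and do not correspond to an actual Information Server API.

```python
from typing import Dict, Optional, Tuple

Credentials = Tuple[str, str]  # (user name, password)


def engine_credentials(
    is_login: Credentials,
    mapping: Dict[str, Credentials],
    shared_registry: bool,
) -> Optional[Credentials]:
    """Return the credentials a client would use to log in to a DataStage Server."""
    if shared_registry:
        # Credential mapping is off: the IS credentials are returned unchanged.
        return is_login
    # Credential mapping is on: return the engine credentials mapped to the IS user,
    # or None when no mapping exists for that user.
    return mapping.get(is_login[0])
```

For example, with shared_registry set to False and a mapping entry for the IS user, the function returns the mapped engine credentials rather than the IS login.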
Checkpoint
1. List the four Information Server platform functions.
2. Which IS product or component is used to build ETL (Extract,
Transform, Load) jobs?
3. Name an IS product or component that can be used for
metadata management of the IS shared Repository?
4. List the four IS architecture tiers.
Notes:
Write your answers here:
Exercises Unit 01
In this lab exercise, you will:
Identify Information Server functions and
associated components
Notes:
Unit summary
Having completed this unit, you should be able to:
Identify Information Server platform functional components
Identify Information Server platform component modules
Identify Information Server software architecture components
Notes:
Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook
Unit objectives
After completing this unit, you should be able to:
Log in and explore Information Server dedicated administrative
clients, including:
Information Server Web Console
WebSphere Application Server (WAS) console
Metadata Asset Manager
Log in and explore Information Server hosted product clients,
including:
Console for IBM Information Server
DataStage clients
FastTrack
Business Glossary
Metadata Workbench
Notes:
Services
Notes:
The Information Server clients run on Windows only. Unless the server systems are also
running on Windows, the clients will be accessing the server systems from separate
computers. Typically, this is the case. Information Server includes both fat clients and
thin clients. Fat clients are those that require functionality to be installed on each Client
system. Thin clients do not require this. They provide a client interface to functionality that
is fully installed on the Server system.
In this diagram, the Repository, Services, and Engine tiers are all placed on one computer.
As mentioned earlier, this is just one possible deployment. For example, commonly, the
Engine tier is separated from the Repository and Services tiers.
Notes:
Thin clients include the Information Server Web Console, Business Glossary, and
Metadata Workbench. No client components are installed on the client system for these
clients. Any system that supports a web browser can access them.
Fat clients include the Information Server Console (which provides access to Information
Services Director and Information Analyzer), Information Server Manager, Multi-Client
Manager, Information Server Command Line Interface, IBM Import Export Manager,
FastTrack, and the DataStage clients.
The Command Line Interface (istool) and Information Server Manager clients are Engine
tier clients that are discussed in a later unit.
The Import Export Manager is a tool for importing metadata from business intelligence and
modeling tools outside of Information Server into the Information Server Repository.
Notes:
Within Information Server, there are a number of different clients used for different types of
administrative purposes. The Information Server Web Console is the primary general
administrative client within Information Server. Use it for configuring security and for
session management, among other tasks.
A WebSphere Application Server instance is used to configure and manage the Information
Server user registry.
DataStage jobs can be monitored using several different clients, including the DataStage
Designer and Director clients and command line utilities. The DataStage and QualityStage
Operations Console provides a web browser interface for monitoring jobs across all engine
systems and all DataStage projects. You can also use it to monitor the use of system
resources while the jobs are running.
Metadata asset management is accessible from several Information Server products, including
Metadata Workbench and Business Glossary. There are also a number of different tools
devoted to metadata management tasks. Information Server Manager is devoted to
DataStage metadata assets. istool is a command-line tool for exchanging assets from
all Information Server products. Metadata Asset Manager can be used to browse and
manage assets produced outside of Information Server, but consumed by Information
Server products.
Notes:
As mentioned earlier, some administrative functionality exists within product clients. Within
DataStage, Information Analyzer, and FastTrack, for example, data source connections
can be created and metadata can be imported. In addition, development work within
several products is done within projects. Project configuration is generally done within
product clients.
Notes:
The Information Server Web Console is a thin client. No special installation components
need to be installed on a client system to access the Web Console. All that is needed is a
web browser.
Using the Web Console you can perform a number of tasks, which are discussed later in
this course, including session management, security, logging, reporting, and engine
credential mappings.
Although you can log into Business Glossary and Metadata Asset Manager directly, you
can also open these applications from within the Web Console.
Information Server Web Console address
Information Server administrator ID
Figure 2-8. Logging into the Information Server Web Console KM5021.0
Notes:
To open the Administrative Web Console, open a web browser (Internet Explorer or
Mozilla) and then enter the Web Console address.
The console address is of the form: http://machine:nnnn/ibm/iis/console.
Here machine is the host name of the machine running the Services tier, that is, running
the WebSphere Application Server instance hosting the services.
nnnn is the port number of the console. By default, it is 9080.
The initial Information Server administrator ID and password are specified during
installation. The default administration ID is isadmin. After installation, new administrator
IDs can be specified.
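The address form described above can be assembled from its two parts. The helper below is a minimal Python sketch. The function name is hypothetical, and the host name in the usage note is a placeholder; only the URL pattern and the default port of 9080 come from these notes.

```python
def web_console_url(machine: str, port: int = 9080) -> str:
    """Build the Information Server Web Console address for a Services tier host."""
    # machine: host name of the system running the Services tier (the WAS instance)
    # port:    console port; 9080 is the default stated in these notes
    return f"http://{machine}:{port}/ibm/iis/console"
```

Calling web_console_url("services-host") produces an address on the default port. Pass a second argument when the console was installed on a different port.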
Link to Metadata Asset Manager
Reporting
Link to Business Glossary
Notes:
The Information Server Web Console is an interface to several different administrative
functions.
The Administration tab is where you perform general IS administrative tasks, including
session management, managing users, and logging.
The Reporting tab is where IS reports can be created and managed. Reports related to
specific IS products, such as FastTrack or Metadata Workbench, can also be accessed
and managed within those clients.
The Glossary tab is the Business Glossary (BG) administrative interface where BG
administrators can create and manage terms, categories, and stewards.
The Information Services Catalog can be used to publish Information Services Director
services to the IBM WebSphere Service Registry and Repository application. This
application supports the annotation of services with information that is used to select, start,
govern, and reuse services.
The Repository Management tool can be used to browse all physical data resources and
metadata assets in the Repository. Redundant or unnecessary metadata assets can be
managed or deleted.
Notes:
Metadata Asset Manager is discussed in detail in a later unit. It has three main categories
of functionality. With Metadata Asset Manager (IMAM) you can import business intelligence
(BI) and physical data resource (PDR) metadata into the Information Server Repository.
These types of metadata are consumed by Information Server products. You can also
search and browse these types of metadata within the Repository.
Only a subset of the metadata stored within the Repository is visible within IMAM. To view
all the metadata, log into Metadata Workbench.
You can also manage metadata assets using IMAM. You can delete assets as well as
import assets. And you can search for duplicate or orphaned assets.
Search metadata assets
Browse metadata assets
Manage Repository assets
Notes:
This graphic shows the Repository Management tab in IMAM. Here you can browse and
search through the categories of PDR and BI metadata stored in the Repository. Notice the
categories of metadata assets you can browse listed in the Browse Assets folder.
At the bottom of the Navigation panel, you can search and manage duplicate metadata
assets and disconnected metadata assets.
Notes:
Like the Information Server Web Console, the WebSphere Application Server (WAS)
console is a thin client. You log into the client using a web browser. Enter the following
address: http://servername:9060/ibm/console. Here, replace servername with the name
of the system where the WAS is installed. This is also known as the services system
because the WAS provides the services to the Information Server products and
components.
A WAS instance may host multiple server instances. The server instance that provides the
services for Information Server is called the Metadata Server component of Information
Server and it is named, by default, server1.
By default, the WAS administrator user ID is wasadmin. It is important not to confuse the
WAS administrator with the Information Server administrator, which by default is isadmin.
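To keep the two administrators apart, it can help to record the WAS console address next to the default IDs. This Python sketch is illustrative only: the function and dictionary names are hypothetical, while the URL pattern, port 9060, and the default IDs are the values given in these notes.

```python
# Default administrator IDs named in these notes. Both can be changed at a real site.
DEFAULT_ADMIN_IDS = {
    "websphere": "wasadmin",           # logs in to the WAS console
    "information_server": "isadmin",   # logs in to the Information Server Web Console
}


def was_console_url(servername: str, port: int = 9060) -> str:
    """Build the WAS console address for the services system."""
    return f"http://{servername}:{port}/ibm/console"
```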
WAS servers
Applications servers
IS server instance
Notes:
This graphic shows the main window of the Console. The Servers folder lists the servers
hosted by this WAS instance. In this example, only one server named server1 is hosted.
This is the Metadata Server component of Information Server, which provides the services
to Information Server products.
Product Clients
Engine clients
DataStage / QualityStage clients
Administrator client
DataStage / QualityStage administration
Configure DataStage development environment
Configure Engine runtime environment
Designer client
Build DataStage jobs
Run DataStage jobs
Monitor DataStage jobs as they run
Director client
Run and monitor DataStage ETL jobs
Operations Console
Monitor DataStage jobs as they run
Multi-Client Manager
Switch between different DataStage client versions
Notes:
The Information Server Engine system refers to a computer system where DataStage is
installed. It is called the Engine because this is the system where jobs are run that perform
various Information Server tasks. Within an Information Server domain there can be
multiple engine systems.
DataStage actually has two engines: the parallel engine and the server engine. These refer
to two types of DataStage jobs that can be run: parallel jobs and server jobs. When the
word engine is used without qualification, it refers to the parallel engine.
Engine clients refers to the DataStage product clients (Designer, Administrator, Director) as
well as the clients for other products and components associated with DataStage. The
Operations Console is a client used to monitor running DataStage jobs. This client is
discussed in a later unit. The Multi-Client Manager is a client used to switch between
different versions of DataStage.
Multi-Client Manager
The Multi-Client Manager allows multiple versions of
DataStage/QualityStage clients to exist on a single Client system.
Only one set/version of clients can be active at any one time.
Multi-Client Manager allows developers to switch between different
client versions
The IS installation wizard detects previous client versions and registers
them with Multi-Client Manager
Multiple versions would be listed if they existed. Here only 9.1 is installed.
Notes:
The Multi-Client Manager allows multiple versions of InfoSphere DataStage and
QualityStage clients (Designer, Director, and Administrator) to exist on a single Client
system. Only one set and version of clients can be active at any one time.
Multi-Client Manager is needed when the same computer system is being used to connect
to two different versions of DataStage. Different versions of DataStage require different
versions of the clients. You cannot, for example, connect a DataStage Designer v8.2 to a
v9.1 DataStage server.
If the Multi-Client Manager is already installed, the installation wizard detects and registers
the new versions of DataStage clients when they are installed.
Notes:
DataStage developers work with projects. A project stores the objects, such as DataStage
jobs, that the developers build. Multiple DataStage developers can work within the same
project. In order to work within a particular project a user must be authorized. As will be
discussed later, authorization is provided partially within the Information Server Web
Console and partially within the DataStage Administrator client.
The development and runtime environments for a particular DataStage project are specified
within the DataStage Administrator client. In addition, there is a set of environment
variables, configured within the Administrator client, that set the project environment.
These include variables that specify database libraries that DataStage jobs will access
(LD_LIBRARY_PATH) and variables that determine how much information is logged
during a DataStage job run (for example, APT_DUMP_SCORE).
Host name of services system
DataStage administrator ID and password
Name of DataStage server system
Notes:
This graphic shows the login screen for the DataStage/QualityStage Administrator client.
In the Host name of the services tier box, type the name of the system that hosts the services.
This is the system where the WAS instance is installed.
In the User name and Password boxes type the user name and password with DataStage
Administrator role authorization and with DataStage credentials.
Multiple DataStage Servers can exist either on the same or on different systems. In the
Host name of the Information Server engine box, you select the server system that has
the DataStage projects you want to work with.
Add / Delete projects
Notes:
This graphic shows the Projects tab in the Administrator client. It lists all
DataStage/QualityStage projects. Click the Properties button to configure the properties
and environment for the project.
You can also add and delete projects from this window.
Enable / Disable Runtime Column Propagation (RCP)
Environment variable settings
Notes:
This graphic displays the Project Properties window for the project selected on the
Projects tab. When it opens you are placed on the General tab.
Runtime Column Propagation (RCP) allows data to flow through DataStage job stages
without being explicitly mapped from input columns to output columns. This is a very
powerful feature which can be used to simplify development and to create flexible
components and jobs. Unless it is carefully managed, however, it can lead to unexpected
errors. It is recommended that, if it is enabled, it is not specified as the default setting for
new Parallel jobs. This is the setting shown in the graphic.
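The effect of RCP can be pictured with a toy stage that maps some columns explicitly. The Python sketch below is a conceptual model only, not how the parallel engine is implemented. The function name, the row dictionary, and the column names are made up for the illustration.

```python
from typing import Any, Dict


def run_stage(row: Dict[str, Any], mapping: Dict[str, str], rcp_enabled: bool) -> Dict[str, Any]:
    """Apply explicit input-to-output column mappings to one row.

    mapping maps an input column name to an output column name.
    """
    # Explicitly mapped columns always flow to the output.
    output = {dst: row[src] for src, dst in mapping.items()}
    if rcp_enabled:
        # With RCP, columns that were never mapped are carried through unchanged.
        for column, value in row.items():
            if column not in mapping and column not in output:
                output[column] = value
    return output
```

With RCP off, a row with columns id, name, and city and a mapping for id alone yields only the mapped column. With RCP on, name and city also appear in the output, which is convenient but can surprise a downstream stage that did not expect them.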
The General tab also provides access to the environment variables. Click the
Environment button to display the environment variables settings.
Notes:
Click the Environment button on the General tab to specify environment variables. There
are several folders of environment variables. The variables listed under the Parallel branch
apply to Parallel jobs.
You can also specify your own environment variables under the User Defined branch.
These variables can be passed to jobs through their job parameters to provide project level
job defaults.
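The idea of project-level defaults can be sketched as a lookup in which a job uses its own value when one is supplied and otherwise falls back to the project value. This Python sketch is an assumption-laden illustration: the function and the variable names (TARGET_SCHEMA, LOG_DIR) are hypothetical, and the actual resolution is performed by DataStage, not by code like this.

```python
from typing import Dict, Optional


def resolve_parameters(
    project_defaults: Dict[str, str],
    job_values: Dict[str, Optional[str]],
) -> Dict[str, str]:
    """Combine project-level defaults with the values supplied for one job run."""
    resolved = dict(project_defaults)
    for name, value in job_values.items():
        # None means "no job-specific value": keep the project default.
        if value is not None:
            resolved[name] = value
    return resolved


if __name__ == "__main__":
    defaults = {"TARGET_SCHEMA": "DEV", "LOG_DIR": "/tmp/logs"}
    print(resolve_parameters(defaults, {"TARGET_SCHEMA": "TEST", "LOG_DIR": None}))
```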
Permissions tab
Assigned role
DataStage users
Add a user
Notes:
The Permissions tab lists IS users and groups that have a DataStage Administrator role
and users and groups that have a DataStage User role and have been added by a
DataStage Administrator.
When Suite users or groups that have a DataStage Administrator role are added, they are
automatically entered here and assigned the role of DataStage Administrator.
Suite users or groups that have a DataStage User role need to be manually added. To
accomplish this, click the Add User or Group button. Then you need to select the
DataStage user role (Operator, Super Operator, Developer, Production Manager) that this
user ID is to have.
Parallel tab
OSH visibility
Format defaults
Notes:
This graphic shows the Parallel tab. Here you can enable OSH visibility (recommended in
most cases on development platforms) and you can specify standard data type formats for
date, time, and timestamp strings.
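A project-wide format default means that every job interprets date, time, and timestamp strings the same way unless told otherwise. The Python sketch below illustrates that idea with the standard datetime module. The format strings use Python syntax and are assumed values chosen for the example. They are not DataStage format syntax and are not necessarily the product defaults.

```python
from datetime import datetime

# Assumed project-wide defaults, written in Python strptime syntax.
DEFAULT_FORMATS = {
    "date": "%Y-%m-%d",
    "time": "%H:%M:%S",
    "timestamp": "%Y-%m-%d %H:%M:%S",
}


def parse_with_default(kind: str, text: str) -> datetime:
    """Parse a string using the default format for its type."""
    return datetime.strptime(text, DEFAULT_FORMATS[kind])
```

A string that does not match the default format raises ValueError, which mirrors why a consistent default matters when jobs exchange date strings.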
Restart
Logging
Notes:
This graphic shows the Sequence tab. Here you can specify defaults for job sequences.
Job sequences are DataStage jobs that control batches of other DataStage jobs. You can
use them to run a batch of DataStage jobs (including parallel jobs, server jobs, and other
job sequences) in a particular order and with specified triggers.
A major feature of job sequences is that they are restartable. This means that if a job aborts
after a number of other jobs have successfully run, the job sequence can be restarted
where it left off, with the aborted job. This and other options can be turned on by default.
Regardless of the settings specified here, they can be overridden at the job sequence level.
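Restartability can be pictured as a record of the jobs that have already finished. The Python sketch below is a conceptual model of that behavior under simple assumptions: jobs run strictly in order, and each job reports success or failure. The function name and the set of completed jobs are hypothetical and are not how DataStage stores its restart information.

```python
from typing import Callable, List, Set, Tuple

Job = Tuple[str, Callable[[], bool]]  # (job name, function returning True on success)


def run_sequence(jobs: List[Job], completed: Set[str]) -> bool:
    """Run jobs in order, skipping any job already recorded as completed.

    Returns False as soon as a job aborts, so a later call resumes at that job.
    """
    for name, job in jobs:
        if name in completed:
            continue  # finished in an earlier run: do not rerun it
        if not job():
            return False  # aborted: leave it unrecorded so the restart begins here
        completed.add(name)
    return True
```

If the second of three jobs aborts, calling run_sequence again with the same completed set skips the first job and resumes with the second.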
Auto-purge
Notes:
This graphic shows the Logs tab. Here you can specify defaults for the Director job logs
including purging defaults. Job log messages are stored in the Repository. Each time a job is
run, it generates many messages that are stored in the Repository until they are purged.
Here, you can specify purging defaults.
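One way to think about a purging default is as an age limit applied to the stored log messages. The Python sketch below assumes an age-based policy purely for illustration. These notes do not describe the actual purge options, and the function and type names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

LogEntry = Tuple[datetime, str]  # (time written, message text)


def purge_older_than(entries: List[LogEntry], now: datetime, max_age_days: int) -> List[LogEntry]:
    """Keep only the log entries written within the last max_age_days days."""
    cutoff = now - timedelta(days=max_age_days)
    return [entry for entry in entries if entry[0] >= cutoff]
```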
You can also specify filtering defaults for operational repository logging. Operational
logging messages are written to the operational repository, which contains messages
that are available to other Information Server products such as the DataStage and
QualityStage Operations Console. Information Server administrators using the Operations
Console are less interested in the informational and warning messages that are written to
the job log, which DataStage developers are probably more interested in. This option
allows a number of these informational and warning messages to be filtered out of the
operational repository.
Notes:
In addition to the administrative tasks performed in the DataStage Administrator client,
there are also administrative tasks that can only be performed in the DataStage Designer
client. These tasks, which will be discussed in more detail in later units, include managing
data sets, managing configuration files, and backing up DataStage objects.
Host name of services (WAS) system
DataStage user ID
Name of DataStage server system followed by name of the DataStage project
Notes:
Logging into Designer is like logging into Administrator, except that in Designer you are
logging into a specific DataStage project. You select this project in the Project list. Multiple
DataStage servers can exist either on the same or on different systems. The name of the
project is preceded by the name of the DataStage server that hosts it.
The user ID entered here requires a DataStage Administrator or DataStage Developer role.
These roles are discussed in a later unit.
Parallel
canvas
Palette
Notes:
The appearance of the Designer work space is configurable. The graphic shown here is
only one example of how you might arrange the GUI components.
In the right center is the Designer canvas, where you create stages and links. On the top
left is the Repository window. Items in the Repository, such as jobs and table definitions,
can be dragged to the canvas area. On the bottom left is the Palette, which contains
stages you can add to the canvas.
Shown on the canvas is an example of a DataStage ETL (Extraction Transformation Load)
job. The stages are functional components of the job. The links are like pipes through
which data flows. This job reads a sequential file, transforms the data, then writes it to DB2
tables using the DB2 Connector stage.
Notes:
A job can be run from Designer or Director. When it is run from Designer, runtime
statistics are displayed on the diagram as it runs.
When a job runs, it generates messages that are written to the job log. In both Designer
and Director, a window can be opened to view the job log messages. In Designer, click
View>Job Log to view the messages written by the job opened on the canvas.
Notes:
When a job runs it collects statistical information. These statistics show up in the job log
and also on the Designer client diagram, if it is open.
In this graphic, a job open on the Designer canvas is running. For each link through which
data is flowing, row throughput (rows/sec) is provided.
The links also change color as the job runs. They turn blue when data begins flowing through.
They turn green when all the rows have been successfully processed through the link. They
turn red if errors occur during the processing of the rows.
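The color rules above amount to a small state mapping. The following sketch restates them in
code. The LinkColor enumeration and the link_color function are hypothetical names used only
for illustration. The color of a link before any data flows is not stated here, so the sketch
returns None for that case.

```python
from enum import Enum
from typing import Optional


class LinkColor(Enum):
    BLUE = "data has begun flowing through the link"
    GREEN = "all rows were processed successfully"
    RED = "errors occurred while processing rows"


def link_color(data_started: bool, finished: bool, had_errors: bool) -> Optional[LinkColor]:
    """Return the link color for the given run state, or None before data flows."""
    if had_errors:
        return LinkColor.RED
    if finished:
        return LinkColor.GREEN
    if data_started:
        return LinkColor.BLUE
    return None


print(link_color(data_started=True, finished=False, had_errors=False).name)
```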
Notes:
Click Tools>Run Director to move from the Designer client to the Director client. This
graphic shows the Director Status View window. Here you see the status of the jobs in the
project: Compiled, Not Compiled, Running, Aborted.
Slide callout: Messages. Double-click to open.
Notes:
Click the Log button in the toolbar to view the job log for a job selected in the Status View.
The job log records events that occur during the execution of a job.
These events include control events, such as the starting, finishing, and aborting of a job;
informational messages; warning messages; error messages; and program-generated
messages.
You can also open a window in Designer to view these messages for an open job, without
having to open the job in Director.
Notes:
The DataStage and QualityStage Operations Console is a thin client used to monitor
running DataStage jobs. As with the monitoring functionality in DataStage Designer and
Director, you can view the job log messages as a job runs. In addition, you can monitor the
resource usage as the jobs are running.
The Operations Console also displays information about the DataStage environment,
including environment variable settings and project objects.
Operations Console
Slide callouts: Job activity; Engine status; System resources.
Notes:
In this graphic, you see the Dashboard tab of the Operations Console. The Operations
Console opens to the Dashboard tab, which contains three sections of information. The
Job Activity section shows which jobs are currently running and their statuses within a
time range, for example, last 10 minutes.
The Operating System Resources section displays the CPU usage and free memory that
is currently available within a time range.
The Engine Status section displays the current status of engine services, including the
Operations Console services and WLM (Workload Management).
FastTrack
Fat client
Logon procedure same as for other fat clients
Used to create mapping specifications
Defines mappings, filters, and transformations between source and
target columns
DataStage jobs can be generated from mapping specifications
Administrative tasks
Define source connections
Import metadata of mapping specification sources and targets
FastTrack projects configuration
Notes:
Logging into FastTrack is similar to logging into other fat clients. You specify the services
system and the port used to communicate with it, and you specify a user ID and password
with FastTrack credentials.
FastTrack is a product designed to work with DataStage. With FastTrack you can create
mapping specifications that document the mappings and transformations of a DataStage
job. This mapping specification can be used to document a DataStage job, as well as to
provide a DataStage developer with specifications for building it.
From mapping specifications, prototype DataStage jobs can be generated, which
implement the mappings and transformations specified in the mapping specification.
Slide callouts: Existing Connection; New Connection; Import Metadata.
Notes:
One administrative task you may be called on to perform with respect to FastTrack is to
define data resource connections to database tables. These database table definitions are
stored in the Information Server Repository, to be used by FastTrack as well as other
Information Server products, such as Information Analyzer.
After a connection has been defined, developers can import metadata for selected
schemas and tables, to be used in their mapping specifications.
Business Glossary
Thin client
URL: http://server:9080/bg
Also accessible from the Information Server Web Console
Create and manage business metadata assets, including:
Terms
A word or phrase that describes a metadata asset in business terms
Stewards
A user or group of users assigned responsibility for a metadata asset
Categories
A specified folder-type object to organize your Glossary content
Link terms and stewards to Repository assets
Notes:
Business Glossary supports metadata management from the business user's point of view.
With Business Glossary, developers can create a glossary of business terms that
document and explain Information Server assets. These terms can be linked to the assets,
so they are accessible to developers working with the assets.
Stewards can be assigned to specific metadata assets. A steward may be a subject matter
expert with respect to the specific asset, one who can be contacted by others for
information about the asset.
Business Glossary
Slide callouts: Browse business terms and categories; Assign terms, labels, stewards to
assets; Create business terms and categories.
Notes:
This graphic shows the Business Glossary tab where a developer can create and
manage terms and categories, and create and manage data stewards.
Notes:
Metadata Workbench is another thin client. It is the primary tool within Information Server
for viewing, monitoring, and analyzing the metadata assets stored in the Information Server
Repository.
With Metadata Workbench you can not only browse and query metadata assets, but you
can view diagrams that document relationships and dependencies between them, and you
can view the flow of data through a set of metadata assets.
Slide callouts: Engine asset; DataStage project.
Notes:
On the Browse tab you can browse different types of metadata assets. Shown here is an
Engine asset, which includes DataStage project assets.
On the Discover tab you can search and query metadata assets.
On the Advanced tab you can perform Metadata Workbench (MWB) administrative functions. For
example, you can run the Automated Metadata Services, which detect and retrieve
relationships between IS metadata assets for analysis.
On the Advanced tab you can also view the Metadata model, which lists and describes all
metadata assets.
Slide callouts: Model View; Host asset details.
Notes:
This graphic shows the Advanced>Model View tab. Here you can browse the metadata
model used for defining and organizing Information Server metadata assets. This model
documents the meaning of the different assets stored within the Information Server
Repository. This model is discussed in more detail in a later unit.
Notes:
The Information Server Console provides access to two different Information Server
products: Information Analyzer (IA) and Information Services Director (ISD). (Information
Services Director is also known as WISD, because it used to be a WebSphere product.)
Information Analyzer is used to analyze data in order to determine its quality and formats. It
might be used to analyze the data sourced by DataStage jobs, and it might be used to
analyze the data loaded into a data warehouse by DataStage jobs.
Information Services Director is used to wrap DataStage and QualityStage ISD jobs and
other function components into services that can be delivered to consumers.
Notes:
This graphic shows the log in screen of the Information Server Console. Here, you specify
the host name of the services tier and a user ID and password for logging into Information
Analyzer or Information Services Director. Although the Information Server Console is used
to access both products, there are separate user authentication roles for each product.
Once you are in the Console, you can open a project specific to either Information Analyzer
or Information Services Director.
Slide callouts: Configure data source; Create a project.
Notes:
This graphic shows the Home tab of the Information Server Console.
Click the Home menu for access to configuration tasks. Here you can create and edit
projects. Here, the project you create or open can be either an Information Services project
or an Information Analyzer project.
Slide callouts: Import metadata; Define data stores.
Notes:
This graphic shows the Information Server Console Configuration menu. This is the
menu an administrator would use to configure Information Analyzer data sources and
connections. A later unit discusses this configuration in detail.
Checkpoint questions
1. How would you distinguish a thin client from a thick client?
2. Name two Information Server thick clients.
3. What role does WebSphere Application Server (WAS) play in
Information Server?
Notes:
Exercises Unit 02
In this lab exercise, you will:
Log into and explore the Information
Server Web Console Administration
and Reporting tabs
Log into and explore the Metadata
Asset Manager thin client
Log into and explore the WebSphere
Application Server (WAS) Integrated
Solutions Console
Log into and explore the Information
Server Console
Log into and explore DataStage
client functionality
Log into and explore the DataStage
and QualityStage Operations
Console
Log into and explore the FastTrack
client
Log into and explore Metadata
Workbench
Notes:
Unit summary
Having completed this unit, you should be able to:
Log in and explore Information Server dedicated administrative
clients, including:
Information Server Web Console
WebSphere Application Server (WAS) console
Metadata Asset Manager
Log in and explore Information Server hosted product clients,
including:
Console for IBM Information Server
DataStage clients
FastTrack
Business Glossary
Metadata Workbench
Notes:
Unit 3. Authentication and Suite Security
Unit objectives
After completing this unit, you should be able to:
Configure the authentication registry
Create Information Server users
Configure Suite Users and Groups
Configure DataStage credentials for Engine users
Notes:
Notes:
A user registry stores user account information. This includes IDs and passwords as well
as user attributes, such as email addresses. A default user registry is created during
Information Server installation. After installation, it can be configured in the WebSphere
Application Server (WAS) Console.
Notes:
Information Server uses WAS for authentication and security. Three types of user registries
are supported. One supported registry is the Information Server internal registry, which is
created and configured by default during Information Server installation. This is the least
complex type of user registry, and is suitable for small-scale installations.
After installation, Information Server can be configured to use either an operating system
(OS) user registry or an LDAP user registry. Even when these alternative registries are
used, user attributes are still stored in the Information Server Repository. The LDAP user
registry is the most powerful, with features such as enforceable password policies.
However, it is also the most complex to configure.
Notes:
This graphic depicts the architecture when the internal user registry option (the default) is
chosen.
This graphic assumes that Repository and Services (WAS) tiers are both on the same
computer. The top graphic represents a client system, which interacts with the Information
Server Directory service when a user logs into an Information Server product through its
client. The user IDs and passwords, and the user roles they possess, are all stored in the
Repository, along with other user attributes. The Directory service checks the login
information with the information stored in the Repository.
Notes:
This graphic shows the architecture when the operating system user registry option is
chosen.
This graphic assumes that Repository and Services (WAS) tiers are both on the same
computer. The top graphic represents a client system, which interacts with the Information
Server Directory service when a user logs into an Information Server product through its
client. The user IDs and passwords, and the user roles they possess, are all stored in the
local operating system user registry. The other user attributes are stored in the Repository. The
Directory service checks the login information through the WAS, which checks the
information stored in the operating system registry.
Information about the other user attributes is still retrieved directly from the Repository by
the Directory Service.
Notes:
This graphic shows the architecture when the LDAP option is chosen.
This graphic assumes that Repository and Services (WAS) tiers are both on the same
computer. The top graphic represents a client system, which interacts with the Information
Server Directory service when a user logs into an Information Server product through its
client. The user IDs and passwords, and the user roles they possess, are all stored in the
external LDAP user registry. The other user attributes are still stored in the Repository. The
Directory service checks the login information through the WAS, which checks the
information stored in the LDAP registry.
Information about the other user attributes is still retrieved directly from the Repository by
the Directory Service.
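The three registry architectures differ only in where the ID and password are checked. The
other user attributes come from the Repository in every case. The following is a minimal
sketch of that split, under stated assumptions: the Repository class, the check and login
functions, and the registry type labels are hypothetical names for illustration, and passwords
are compared as plain text only to keep the example short. It does not reflect how the
Directory service or WAS is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Repository:
    # Non-credential user attributes (for example, email) always live here.
    attributes: Dict[str, dict] = field(default_factory=dict)
    # IDs and passwords live here only when the internal registry is in use.
    internal_credentials: Dict[str, str] = field(default_factory=dict)


def check(credentials: Dict[str, str], user_id: str, password: str) -> bool:
    """Return True only if the user exists and the password matches."""
    return user_id in credentials and credentials[user_id] == password


def login(user_id: str, password: str, registry_type: str,
          repository: Repository,
          external_credentials: Optional[Dict[str, str]] = None) -> Optional[dict]:
    """Return the user's attributes on success, or None if authentication fails."""
    if registry_type == "internal":
        # Internal registry: credentials are checked against the Repository.
        ok = check(repository.internal_credentials, user_id, password)
    elif registry_type in ("os", "ldap"):
        # OS or LDAP registry: credentials are checked in the external registry.
        ok = check(external_credentials or {}, user_id, password)
    else:
        raise ValueError(f"unknown registry type: {registry_type}")
    if not ok:
        return None
    # Attributes are read from the Repository regardless of registry type.
    return repository.attributes.get(user_id, {})


repo = Repository(
    attributes={"dsadmin": {"email": "dsadmin@example.com"}},
    internal_credentials={"dsadmin": "secret"},
)
ldap_users = {"dsadmin": "ldap-secret"}

print(login("dsadmin", "secret", "internal", repo))
print(login("dsadmin", "ldap-secret", "ldap", repo, ldap_users))
print(login("dsadmin", "secret", "ldap", repo, ldap_users))
```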
Slide callouts: Choose OS or LDAP user registry; Configure OS or LDAP user registry.
Notes:
This graphic depicts how the user registry is selected in WAS. After you log into WAS, click
Security>Global security. The Current realm definitions box identifies the type of user
registry that has been selected. By default, after Information Server installation, the
selection is Standalone custom registry. This is configured as an Information Server
internal user registry.
After installation, the user registry type can be changed. Select the type of user registry,
and then click Configure to configure it.
See the Information Server Administration Guide for more details. The Administration
Guide will point you to the relevant information for configuring WAS.
Slide callout: Using internal user registry.
Notes:
You can determine the current user registry type from within the Information Server Web
Console on the Administration>Domain Management>User Registry Configure panel.
The type of user registry currently in effect is indicated. (Note that this panel is read-only.)
In particular, it identifies whether the user registry is an Information Server internal user
registry, accessed through the Information Server Directory Service, or whether it is a user
registry the Directory Service connects to through WAS.
In this example, Information Server is configured to use its internal user registry.
Notes:
During installation, Information Server is configured to use its own internal registry. After
installation, this can be changed to a local OS user registry. It is recommended that you do
this as soon as possible after installation to avoid issues concerning IDs created after
installation, but before the switch.
As noted, this configuration change is done in WAS. After the configuration changes are
made in the WAS, WAS needs to be restarted for the change to take effect.
Slide callout: WAS registry administrator.
Notes:
This graphic indicates the central properties that need to be edited, if you are configuring a
local operating system user registry in WAS.
Slide callouts: Set as current; New registry configuration.
Notes:
After specifying the properties, select the new registry configuration and then click the
Set as current button.
Notes:
During installation, Information Server is configured to use its own internal registry. After
installation, this can be changed to an LDAP user registry. It is recommended that you do
this as soon as possible after installation to avoid issues concerning IDs created after
installation, but before the switch.
As noted, this configuration change is done in WAS. After the configuration changes are
made in the WAS, WAS needs to be restarted for the change to take effect.
Administrative ID
Notes:
This graphic highlights the central properties that need to be specified if you are configuring
an LDAP user registry.
Slide callouts: Set as current; New registry configuration.
Notes:
After specifying the properties, select the new registry configuration and then click the
Set as current button.
Notes:
Things are more complicated if you switch user registries after the initial registry has been
in use for some time. The problem is with users and groups that were created in the initial
internal registry. These users must be removed before changing to a new user registry.
You can use the DirectoryAdmin.sh -delete command to delete existing users and
groups. It will be necessary to recreate these users and groups in the new registry.
Notes:
The Information Server engine (also known as the DataStage engine) performs user
authentication separately from other Information Server components. This has to do with
the fact that prior to Information Server v8.0, DataStage was a stand-alone product that
used the local OS user registry on the computer where it was installed. It continues to use
this in Information Server.
If the Engine user registry is different from the Information Server user registry, as it will be
in most cases if the Information Server user registry is not the OS user registry, then user
credentials must be mapped between them.
Slide callouts: Shared OS user registry; User attributes stored in internal user registry.
Notes:
The Engine user registry can be the same as the IS user registry if they share an operating
system user registry. This graphic depicts that situation. The top graphic depicts a client
system. The lower system depicts the services tier. It is assumed in the graphic that the
engine and repository tiers are also installed on the same system.
When a user logs into DataStage, the Directory Service through the WAS checks the name
within the operating system user registry. If it finds the name and password, it passes the
user ID and password to DataStage, which then attempts to authenticate it. It will
authenticate it, since the user ID is in the operating system registry that DataStage uses.
Slide callouts: Shared LDAP user registry; User attributes stored in internal user registry.
Notes:
The Engine user registry can be the same as the IS user registry if they share the same
LDAP user registry. This graphic depicts that situation. The top graphic depicts a client
system. The lower system depicts the services tier. It is assumed in the graphic that the
engine and repository tiers are also installed on the same system.
When a user logs into DataStage, the Directory Service through the WAS checks the name
within the LDAP user registry. If it finds the name and password, it passes the user ID and
password to DataStage, which then attempts to authenticate it. It will authenticate it, since
the user ID is in the LDAP registry that DataStage is using.
Slide callout: Share user registry.
Notes:
This graphic depicts how to configure Information Server so that the registry is shared
between Information Server and DataStage. If there is more than one engine on different
systems or on the same system, then this needs to be done for each one.
If the Share User Registry between InfoSphere Information Server and its engine box
is checked, it tells Information Server that the user directory it is configured to use is the
same as the user directory DataStage is configured to use. By default, DataStage is
configured to use the operating system user registry on the system on which it is installed,
but DataStage can be configured to use an LDAP user registry.
Credential mappings
Credential mappings must be created when IS and the IS
Engine do not share the same user registry
This is necessary when IS uses the internal user registry, because the
Engine cannot use this registry
Credential mappings are stored with the internal user registry
in the Repository
Mappings can be either from one Information Server user to
one operating system user, or all Information Server users can
be mapped to the same, default operating system user
If the user registry is shared, Information Server must be
configured through the IS Web Console to indicate this
Click Domain Management>Engine Credentials
Select the Share User Registry option
Notes:
If Information Server and DataStage do not share the same user registry, then mappings
must be created between Information Server user IDs, having DataStage Administration or
DataStage User roles, and user IDs that exist locally in the operating system registry where
DataStage is installed.
Assume that DataStage is using the operating system user registry. A credential mapping
maps an Information Server user ID (and password) that has a DataStage User or
Administrator role to an operating system user ID (and password).
Alternatively, a single operating system user ID and password can be specified as the
default operating system user ID that all Information Server user IDs are mapped to.
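The lookup that a credential mapping implies can be sketched as follows. This is an
illustration under stated assumptions, not the product's implementation:
resolve_engine_credentials and its parameters are hypothetical names, and the rule that an
individual mapping takes precedence over the default mapping is an assumption made for the
sketch.

```python
from typing import Dict, Optional, Tuple

Credentials = Tuple[str, str]  # (user ID, password)


def resolve_engine_credentials(
    is_credentials: Credentials,
    shared_registry: bool,
    mappings: Dict[str, Credentials],
    default: Optional[Credentials] = None,
) -> Optional[Credentials]:
    """Return the credentials passed to the engine, or None if nothing applies."""
    if shared_registry:
        # Shared registry: the same ID and password are passed through unchanged.
        return is_credentials
    user_id = is_credentials[0]
    if user_id in mappings:
        # One-to-one mapping from an Information Server user to an OS user.
        return mappings[user_id]
    # Otherwise fall back to the default OS user, if one has been defined.
    return default


mappings = {"isdev1": ("dsuser1", "os-pass-1")}
default_os_user = ("dsadm", "os-default-pass")

print(resolve_engine_credentials(("isdev1", "is-pass"), False, mappings, default_os_user))
print(resolve_engine_credentials(("isdev2", "is-pass"), False, mappings, default_os_user))
print(resolve_engine_credentials(("isdev1", "is-pass"), True, mappings, default_os_user))
```

With no individual mapping and no default, the function returns None, which corresponds to a
user who cannot be authenticated by the engine.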
Slide callouts: IS user registry; Engine OS user registry.
Notes:
This diagram depicts credential mappings between the Information Server user registry
and the DataStage user registry, here assumed to be the operating system user registry.
Here the Information Server Repository and the Engine are on the same computer, but this
is not required.
The credential mappings are stored in the Information Server Repository. When a user logs
into DataStage, the Directory Service checks the name within the internal user registry. If it
finds the name and password, it locates the user ID and password they are mapped to, and
passes that user ID and password to DataStage, which then attempts to authenticate them.
Authentication succeeds because the mapped user ID is in the operating system registry that
DataStage uses.
Notes:
There are three types of roles used to control access to Information Server products and
components. Suite roles control access to suite-level clients such as the Information Server
Web Console. Suite Component roles control access to specific Information Server
products. In addition, some products have additional roles, defined within the product, for
controlling access to its objects.
Roles can be assigned to individual users or to groups of users. Roles assigned to a
group are inherited by all users who are members of the group.
Suite roles
Suite Administrator: Maximum privileges
Suite User: Minimum requirement to access any IS suite or
product client
Common Metadata Administrator
Full functionality within Metadata Asset Manager to browse and
manage metadata assets
Common Metadata Importer
Log into Metadata Asset Manager to import metadata assets
Common Metadata User
Log into Metadata Asset Manager to browse metadata assets
Notes:
The slide lists five Suite roles. Three of them apply to the Metadata Asset Manager
product. These are discussed in a later unit.
There are two standard Suite roles: Suite Administrator and Suite User. A Suite Administrator
can log into the Information Server Web Console and perform any task, including creating
user IDs. A Suite User has limited authority within the Information Server Web Console. A
Suite User can, for instance, log into the Web Console and view reports, but cannot create
user IDs.
Notes:
For each product there is a Suite Component Administrator role and a Suite Component
User role. Some products have additional specialized roles. The nature of these roles
differs depending on the product.
For example, with respect to DataStage a user can be an Administrator or a User. An
Administrator has full authorization, including the ability to specify user project roles. A
User's authorizations are limited to those assigned by a DataStage Administrator.
Notes:
Security roles can be applied to users or groups. Users in the group inherit the roles
defined for the group.
When creating a user or group, the primary tasks are to specify the name of the user or
group and other attributes, and to specify the Suite and Suite Component roles that apply
to the user or group. Users are also given a password.
New Group
Groups
Notes:
This graphic shows how to create a new group in the IS Web Console Administration tab.
First click on Users and Groups>Groups on the Administration tab. Then click New
Group. This opens the window where you specify the group attributes, shown on the next
page.
Graphic callouts: Suite roles; Suite Component roles; User ID and other attributes; Browse for users to add to the Group
Notes:
This graphic shows the page where you specify the attributes of a group. Required
attributes include the group ID and Name. In the Roles panel, select the Suite roles for the
group in the top panel, and select the Suite Component roles for the group in the bottom
panel.
In this example, the group ID is DEV. Two Suite roles have been chosen for the group
(Suite User, Common Metadata Administrator), and one Component role has been
chosen for the group (DataStage and QualityStage User).
Click the Browse button to add users to the group. These users must already be
defined.
New User
Users
Notes:
This graphic shows how to create a new user in the IS Web Console Administration tab.
First click on Users and Groups>Users on the Administration tab. Then click New User.
This opens the window where you specify the user attributes, shown on the next page.
Graphic callouts: User attributes; Member of DEV Group; Add to a Group
Notes:
This graphic shows the page where you specify the attributes of a user. Required attributes
include the User Name and Password. In the Roles panel, select the Suite roles for the
user in the top panel, and select the Suite Component roles for the user in the bottom
panel.
In this example, the user name is dev1. One Suite role has been chosen for the user (Suite
User).
Click the Browse button to add the user to one or more groups. These groups must
already be defined. Additional Suite and Suite Component roles will be acquired through the
user's membership in these groups.
In this example, the user acquires the roles possessed by the DEV group.
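The inheritance described above can be sketched as a set union. This is a minimal illustration, not how Information Server stores roles internally; the function name effective_roles and the dictionary layout are assumptions made for the example, while the role names, the DEV group, and the dev1 user come from the graphics in this unit.

```python
def effective_roles(direct_roles, group_names, roles_by_group):
    """Return the user's direct roles plus every role inherited from the user's groups."""
    roles = set(direct_roles)
    for group in group_names:
        # Unknown groups contribute no roles in this sketch.
        roles |= set(roles_by_group.get(group, ()))
    return roles


# Roles taken from the DEV group and dev1 user examples in this unit.
roles_by_group = {
    "DEV": {
        "Suite User",
        "Common Metadata Administrator",
        "DataStage and QualityStage User",
    }
}

# dev1 is given only Suite User directly and is a member of DEV.
dev1_roles = effective_roles({"Suite User"}, ["DEV"], roles_by_group)
for role in sorted(dev1_roles):
    print(role)
```

In this sketch dev1 ends up with all three DEV roles, even though only Suite User was assigned directly.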
Credential Mappings
Engine
Open Configuration
Engine credentials
Notes:
Credential mappings are specified in the Information Server Web Console in the Domain
Management>Engine Credentials folder on the Administration tab.
Begin by selecting the engine. In this example, there is only one engine to select, but
multiple engines are possible in a domain. Then click Open Configuration to open the
Engine Credentials window, shown on the next page.
Graphic callout: Engine user registry user
Notes:
A default credential mapping can be specified in the Default Credentials panel,
highlighted in the graphic. Here you specify an operating system user name and password
on the engine system.
This mapping will be applied to DataStage users that have not been given any explicit,
specific mapping. If you leave this blank, then every DataStage user must be explicitly
mapped to an engine system user.
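The lookup rule described in these notes (an explicit mapping wins, otherwise the default applies, otherwise there is no mapping) can be sketched as follows. This is only a model of the rule, not the Directory Service implementation. The names OsCredential and resolve_engine_credential, the dsuser account, and the password strings are hypothetical; the dev1 to dsadm mapping is the example used later in this unit.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class OsCredential(NamedTuple):
    """An operating system user on the engine system (hypothetical type)."""
    user: str
    password: str


def resolve_engine_credential(
    is_user: str,
    explicit: dict[str, OsCredential],
    default: Optional[OsCredential] = None,
) -> Optional[OsCredential]:
    """Return the engine OS credential that an Information Server user maps to.

    An explicit mapping wins; otherwise the default credential applies.
    None means the user has no mapping at all.
    """
    if is_user in explicit:
        return explicit[is_user]
    return default


# dev1 -> dsadm is the mapping shown in this unit; dsuser is a made-up default.
explicit_mappings = {"dev1": OsCredential("dsadm", "example-password")}
default_credential = OsCredential("dsuser", "example-password")

print(resolve_engine_credential("dev1", explicit_mappings, default_credential).user)
print(resolve_engine_credential("dev2", explicit_mappings, default_credential).user)
print(resolve_engine_credential("dev2", explicit_mappings))
```

The last call models leaving Default Credentials blank: a user without an explicit mapping resolves to nothing, which is why every DataStage user must then be mapped individually.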
Engine
Notes:
This graphic shows how to map an individual DataStage user to an engine operating
system user ID. After selecting the engine, click Open User Credentials. This opens the
Map User Credential window, shown on the next page.
Graphic callouts: Browse for IS user ID; Specify engine system user ID here
Notes:
First click Browse to retrieve the DataStage user ID. Then specify the engine system user
ID and password it is to be mapped to. You must include both the engine system ID and its
associated password. Note that if the engine system ID password changes, the mapping
will no longer work and will have to be updated.
After you specify the engine system user, click Apply to complete the mapping.
In this example, dev1 has been mapped to dsadm. Here, dev1 is a user with DataStage
authorization. dsadm is a user on the engine system.
Checkpoint
1. What client is used to specify DataStage credential
mappings?
2. What two types of authentication roles can be assigned to a
user or group?
3. What client is used to configure the IS user registry?
4. What three types of user registries are supported?
Exercises Unit 03
In this lab exercise, you will:
View the User Registry configuration in
the Information Server Web Console
View WAS user registry configuration
Create Information Server users
Review and create DataStage
credentials
Unit summary
Having completed this unit, you should be able to:
Configure the authentication registry
Create Information Server users
Configure Suite Users and Groups
Configure DataStage credentials for Engine users
Unit 4. Stopping and Starting Information Server
Unit objectives
After completing this unit, you should be able to:
Stop Information Server
Start Information Server
Check for running Information Server processes
Notes:
Starting or stopping Information Server involves starting or stopping many individual
Information Server components. These components need to be started or stopped in the
right order. First stop the Engine services. Then stop the domain (WAS) services. At that
point, Information Server will be stopped. You can then, if you choose, stop the Information
Server supporting databases and database systems, including XMETA and IADB.
When you start Information Server, reverse the process. The supporting database systems
and databases must be running before you attempt to start the WAS Metadata Server.
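The ordering rule can be written down as data: one list for the stop order, with the start order derived by reversing it. This is a small sketch for reasoning about the sequence only; it does not run any Information Server command, and the names STOP_ORDER and start_order are made up for the example.

```python
# Stop order as described in the notes; stopping the databases is optional.
STOP_ORDER = [
    "Engine services",
    "Domain (WAS) services",
    "Supporting databases (XMETA, IADB)",
]


def start_order(stop_order):
    """Starting reverses the stop order: databases first, Engine last."""
    return list(reversed(stop_order))


for step, component in enumerate(start_order(STOP_ORDER), start=1):
    print(f"{step}. start {component}")
```

Keeping a single list and deriving the other direction avoids the two sequences drifting apart when a component is added.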
Notes:
Before you stop DataStage, you may want to check that no one is using it. There are a
number of commands you can use to determine whether DataStage processes are
running.
The ps -ef command displays process statuses. The grep command searches for a
pattern in the output from the ps command. Processes labeled phantom, dsapi, and
dscs are DataStage-related processes that indicate either that DataStage jobs are
running or that DataStage users are logged into DataStage.
The netstat -a | grep dsrpc command displays DataStage network connections.
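The same check can be scripted. The sketch below filters text shaped like ps -ef output for the three process names mentioned above and skips the grep process itself. The function name datastage_processes is an assumption, and the sample text is made up for illustration; it was not captured from a real system.

```python
from __future__ import annotations

# Process names that the notes identify as DataStage-related.
DATASTAGE_MARKERS = ("phantom", "dsapi", "dscs")


def datastage_processes(ps_output: str) -> list[str]:
    """Return the lines of ps-style output that mention a DataStage process."""
    matches = []
    for line in ps_output.splitlines():
        # A grep for these names shows up in ps output too; it is not DataStage.
        if "grep" in line:
            continue
        if any(marker in line for marker in DATASTAGE_MARKERS):
            matches.append(line.strip())
    return matches


# Made-up lines in the general shape of `ps -ef` output.
SAMPLE_PS_OUTPUT = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root      4211  4100  0 10:02 pts/0    00:00:00 grep dscs
dsadm     4302     1  0 10:05 ?        00:00:00 dscs
dsadm     4310  4302  0 10:05 ?        00:00:00 phantom DSD.RUN relMultInput
"""

for line in datastage_processes(SAMPLE_PS_OUTPUT):
    print(line)
```

An empty result suggests that no jobs are running and no users are connected, which is the condition to look for before stopping the Engine.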
Graphic callouts: No jobs running; Job running; DataStage client connection
Notes:
This graphic shows some example output from using the commands discussed previously.
We see output from the commands when DataStage jobs are running, DataStage clients
are running, and client connections are established.
In this example, the ps -ef | grep dscs command is run twice. The first time it is run, the
only output is the grep command's own process, owned by root, indicating that no
DataStage jobs are running. The second time it is run, a dscs process owned by dsadm is
displayed. This indicates that DataStage jobs are running.
Towards the bottom, the netstat -a | grep dsrpc command is run. The output indicates that
a DataStage client connection is established.
Notes:
To stop DataStage services, first source the dsenv file (for example, . ./dsenv) to initialize
the DataStage environment in the current shell. Then execute the uv -admin -stop command.
The default DataStage home directory is
/InformationServer/Server/DSEngine. If you are not sure what the home directory is, the
`cat /.dshome` command will return the DataStage home directory.
Stop Engine
Notes:
In this example, we first change to the DataStage home directory. Then we execute the
dsenv command. Then we execute the uv -admin -stop command. The command output
indicates that the DataStage job monitor service, the resource tracking service, and the
Engine are all shut down.
Afterwards, you can run the ipcs and netstat commands shown to check whether there
are any remaining memory segments or dsrpcd port activity.
Stop Agent
Notes:
The ASB agent establishes communication between the Engine and the Services layers,
which is necessary when the layers are installed on different computer systems. To stop
the ASB agent, run the NodeAgents.sh stop script, which is in the
/InformationServer/ASBNode/bin directory.
In the graphic, we first change to the /InformationServer/ASBNode/bin directory. Then
we run the NodeAgents.sh stop script. Afterwards, we check whether any ASB agent processes are still
running.
Notes:
You can use the MetadataServer.sh stop script to stop the Metadata Server services
layer. The MetadataServer.sh script runs the WAS stopServer.sh server1 script.
In this example, we first change to the /InformationServer/ASBServer/bin directory. Then
we run the MetadataServer.sh stop script. When you run this command, make a note of the
directory containing the log files. You may want to consult log files in that directory to verify
that no errors occurred.
Afterwards, we check whether any ASB agent processes are still running using the ps -ef command.
Notes:
Starting Information Server involves starting the components in the opposite order you use
when stopping them. Before attempting to start Information Server, verify that the database
servers for XMETA and IADB are running. Then execute the MetadataServer.sh start
script. Then start the ASB agent and the DataStage Engine.
Notes:
To start the ASB agent, first change to the /InformationServer/ASBNode/bin directory.
Then run the NodeAgents.sh start command.
This agent must be running if DataStage and WAS are installed on separate systems. The
ASB agent establishes communication between these two Information Server layers.
Notes:
After you start the ASB agent, start the DataStage Engine. Change to the
/InformationServer/Server/DSEngine directory, run dsenv to initialize the DataStage
environment, then run the uv -admin -start command.
Notes:
The uv -admin -info command can be used to check the status of the Engine. As with any
of the uv commands, first run dsenv to initialize the DataStage environment.
In this example, we first run the command. Output from the command indicates that the
Engine is running and that NLS is active.
Notice the reference to the DataStage startup script. This script can be modified, in order to
start additional engine services when the DataStage engine is started. As you will see later,
the Operations Console, which monitors DataStage running jobs, uses additional services.
The command that runs these services can be added to the ds.rc script to start these
services automatically.
ASBNode agent is
listening
Notes:
In the graphic, several commands are executed to verify that the engine services are
running. The netstat command is used to check whether the DataStage dsrpc service is
running. The ps -ef command is used to check whether the DataStage dsrpcd service is
running. Finally, the netstat command is used to check whether the ASBNode agent is
running.
Checkpoint
1. Stopping IS involves stopping what?
2. What command would you use to start the DataStage
engine?
3. How do you set the DataStage environment for running this
command?
Notes:
Write your answers here:
Exercises Unit 04
In this lab exercise, you will:
Check for running engine processes
Stop engine services
Stop the ASB agent
Stop the Metadata Server (server1)
Start the IS Metadata Server
Start the ASB agent and DataStage
engine
Check DataStage status
Unit summary
Having completed this unit, you should be able to:
Stop Information Server
Start Information Server
Check for running Information Server processes
Unit objectives
After completing this unit, you should be able to:
Configure and manage sessions
Configure and manage logging
Create, run, and manage reports
Describe Information Server locking
Notes:
Each user connection using an Information Server client results in the creation of a
session. A user can log into multiple clients at the same time. Each established connection
creates another session.
A session will time out and expire if nothing happens in it for an extended period of time.
Alternatively, a session will end if the user closes the client or if an Information Server
administrator stops it. The latter can be done in the Information Server Web Console.
Global
session
properties
Notes:
User sessions can be managed by an Information Server administrator in the Information
Server Web Console. On the Administration tab, click Session Management>Active
Sessions. The current active sessions are listed.
In this example, there are three active sessions. The Type column identifies the type of
session. The first session was established when the administrator isadmin logged into the
Web Console. The second session was established when a user logged into DataStage
Designer. The third session was established when a user logged into a thick client, such as
FastTrack or Information Analyzer.
The Address column identifies the computer name or IP address of the client system.
To open or disconnect a specific session, select the session and then click the appropriate
link in the right panel.
Click Global Session Properties to specify general session attributes.
Session
properties
Notes:
This graphic shows the Global Session Properties window.
Each session consumes WAS and engine resources. At some point, as more and more
sessions are established, performance will begin to deteriorate. You can limit this
deterioration by reducing the maximum number of sessions.
The maximum number of sessions determines how many users can log into Information
Server applications at one time. A user, other than an Information Server administrator
logging into the Web Console, will be unable to log into an Information Server client after
the maximum has been reached. Users will receive a message that they are unable to log
in because the maximum has been reached.
If too many users are bumping into the maximum, you can try reducing the inactive
session timeout period. This will free additional sessions.
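The interaction between the session maximum and the inactive timeout can be modeled in a few lines. This is a simplified sketch under stated assumptions: the function names, the session IDs, and the idle times are hypothetical, and the exact comparison the product applies at the limit is not stated in this unit.

```python
from __future__ import annotations


def can_log_in(active_sessions: int, max_sessions: int,
               admin_on_web_console: bool = False) -> bool:
    """Model the login rule: only an administrator on the Web Console bypasses the maximum."""
    if admin_on_web_console:
        return True
    return active_sessions < max_sessions


def drop_expired(idle_minutes: dict[str, int], timeout_minutes: int) -> dict[str, int]:
    """Keep only the sessions that have been idle for less than the timeout."""
    return {sid: idle for sid, idle in idle_minutes.items() if idle < timeout_minutes}


# Hypothetical session IDs mapped to minutes of inactivity.
sessions = {"s1": 5, "s2": 45, "s3": 90}

print(can_log_in(len(sessions), max_sessions=3))
print(can_log_in(len(sessions), max_sessions=3, admin_on_web_console=True))

# With a shorter inactive timeout, idle sessions expire and free capacity.
remaining = drop_expired(sessions, timeout_minutes=30)
print(can_log_in(len(remaining), max_sessions=3))
```

In this model, lowering the timeout to 30 minutes expires two idle sessions, so the next ordinary user can log in without raising the maximum.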
Session details
Session
properties
User
attributes
Notes:
Select a session and then click Open to view details about it and the user logged into the
session. In this example, a user named dsadm is logged into the session. Information
about that user, including the authorization roles the user possesses, is displayed.
Some information about the session is also displayed, including its duration and the
number of cached objects, which indicates how many resources the session is consuming.
Disconnecting sessions
To disconnect specific sessions:
From the Active Sessions tab, select the connections you want to
disconnect
Click Disconnect
To disconnect all sessions (including your own session)
Select Disconnect All
Disconnect
all users
Disconnect
selected users
Notes:
You can disconnect active sessions by selecting the sessions and then clicking
Disconnect. You can also disconnect all sessions by clicking Disconnect All. Note that
this disconnects your own Web Console session along with all the others.
Log Management
Log management
Logged events are accessed through views
A view filters events based on specified criteria
You can create as many views as you want
Log events are stored in the Repository
The Web Console provides a central place to view logs across all
Information Server components
Click Administration>Log Management
Logging components
Represent Suite components that use the logging service
For example, the DataStage logging component represents DataStage
Logging configurations
Determine which logging messages get saved into the Repository
Each Suite component can have multiple configurations
But only one can be active at a time
Notes:
Information Server is capable of logging many different types of events, concerning many
different Information Server products and components. An Information Server administrator
can specify the types of events that are to be logged. Logged events are stored into the
Information Server Repository.
Logged events can be accessed through views. These views select a set of the logged
events in the Repository.
There are, then, two main tasks related to logging: Specifying which events are logged, and
creating views to access the stored events.
A logging component represents an Information Server component, such as DataStage, for
which events are logged. Logging configurations can be created for each logging
component. A logging configuration specifies which logging events are stored for its
logging component. There can be multiple configurations, but only one can be active at a
time.
Managing configurations
DataStage
logging
component
Open
DataStage
component
configurations
Notes:
Click Log Management>Logging Components to view the logging components that
exist. Select the component whose configurations you want to manage, for example,
DataStage. Then click Manage Configurations to open the configurations that are related
to DataStage.
Each logging component has a default configuration that is specified when Information
Server is installed. Alternative configurations can be created and made active.
Default
configuration
New
configuration
Notes:
You can create a new configuration from scratch by clicking New Logging Configuration.
Alternatively, you can make a copy of an existing configuration and then modify it.
In this example, the DataStage.ALL configuration was copied and the copy was then modified.
The modification consisted of reducing the types of logging events that are saved to those
having to do with running DataStage jobs.
DataStage.ALL configuration
The configuration lists categories of events
For each category of logging messages, the configuration specifies the severity level
of the messages to retain
Threshold refers to the event warning level floor
For example, Warn includes all events at the warning level and higher: Warn, Error, Fatal
Graphic callouts: Severity level for individual events; Threshold severity level for all events
Notes:
A configuration lists categories of events whose messages are to be stored. For each
category, a threshold severity level for the messages is specified. A threshold indicates a
floor. Any messages at the selected level or at a more severe level will be stored. For
example, if Warn is selected, then all messages at that level or higher will be stored,
namely, warning messages, error messages, and fatal error messages.
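The threshold-as-floor rule can be expressed as a comparison of positions in an ordered list of levels. This is a sketch of the rule only. Warn, Error, and Fatal come from the text above; the Info level below Warn and the names SEVERITY_ORDER and is_stored are assumptions added for illustration.

```python
# Levels ordered from least to most severe. "Info" is assumed for illustration;
# Warn, Error, and Fatal are the levels named in the notes.
SEVERITY_ORDER = ["Info", "Warn", "Error", "Fatal"]


def is_stored(message_level, threshold):
    """A message is stored when its level is at or above the threshold (the floor)."""
    return SEVERITY_ORDER.index(message_level) >= SEVERITY_ORDER.index(threshold)


stored_at_warn = [level for level in SEVERITY_ORDER if is_stored(level, "Warn")]
print(stored_at_warn)
```

With the threshold set to Warn, the sketch keeps Warn, Error, and Fatal and drops anything less severe.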
Log views
Select messages based on specified criteria
Selects a subset of the events that the active configurations capture into the
Repository
Click Administration>Log Management>Log Views
List of existing views
Click View Log to view the messages of a selected View
Click Open to display and edit the log view criteria
Click New Log View to create a new log view
Access can be shared with everyone or remain private to the view creator
Existing log
views
Notes:
Logging views are created to select a set of messages from those that are stored in the
repository based on specified criteria.
The Log Views tab lists existing log views. Click View Log to view messages of the
selected view. You can also create new log views.
To view the messages, select the log view and then click View Log in the right panel.
Start
DataStage
job named
relMultInput
Environment
variable
settings the
job ran under
Notes:
This graphic shows the messages that were selected by an example log view. One
message informs us that a DataStage job has been started. Another lists the environment
variable settings for the job in effect at the time the job was started. To view the messages,
select the log view and then click View Log in the right panel.
The number of messages selected by a log view can be large. You can narrow the list to the
messages you are interested in at the top of the window. Expand the Additional Filter
Criteria folder to reveal the full set of filtering conditions. A selected subset of messages
can then be viewed in a separate window.
Notes:
When you create a log view, you give it a name. Then you specify the criteria for selecting
the messages to include. These criteria include the configuration categories of messages
to include, the severity levels, and additional context information relevant to a specific
category of messages. For example, you could specify that you only want information
related to a job named relMultInput.
In addition to specifying the criteria for the information to include, you also need to
specify the columns of information to include in the message. A given message contains
several columns of information. You choose which columns of information you are
interested in.
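A log view therefore has two parts: selection criteria (categories, severity levels, context) and a choice of columns. The sketch below models both over a list of dictionaries. It illustrates the idea only, not the Web Console's storage format. The function name apply_log_view, the category names, the message texts, and the job otherJob are made up; relMultInput and the DSJob column are taken from this unit.

```python
def apply_log_view(messages, categories, severities, columns, context=None):
    """Select messages that match the view criteria and keep only the chosen columns."""
    context = context or {}
    selected = []
    for msg in messages:
        if msg["category"] not in categories:
            continue
        if msg["severity"] not in severities:
            continue
        # Context criteria, such as a specific job name, must all match.
        if any(msg.get(key) != value for key, value in context.items()):
            continue
        selected.append({col: msg.get(col) for col in columns})
    return selected


# Hypothetical logged events.
messages = [
    {"category": "job-run", "severity": "Info", "DSJob": "relMultInput", "message": "Starting job"},
    {"category": "job-run", "severity": "Warn", "DSJob": "otherJob", "message": "Example warning"},
    {"category": "admin", "severity": "Info", "DSJob": None, "message": "Example admin event"},
]

view = apply_log_view(
    messages,
    categories={"job-run"},
    severities={"Info", "Warn", "Error", "Fatal"},
    columns=["severity", "DSJob", "message"],
    context={"DSJob": "relMultInput"},
)
print(view)
```

Only the relMultInput message passes all three criteria here, and the category column is left out of the result because it was not among the chosen columns.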
Figure callouts: Shared with all users; Retrieve messages with all severity levels; Categories of messages to view; Message info to display
Notes:
This graphic shows an example of a newly created log view. It shows where you specify the
criteria and the information to display, as discussed on the previous pages.
At the top is the name given to the log view. In the Access box, Shared has been selected.
This means that the user who is creating this log view is willing to share it with all other
users. That is, other users can view the log using this log view.
In the Severity Levels panel, you filter the messages to view by severity level. In this
example, all severity levels are selected.
In the Categories panel, you add the categories of log messages to view. Click Browse to
add additional categories. To delete a category, select it and then click Remove.
In the Table Columns panel, you select from the log messages the columns of information
you want to view. In this example, several columns of information, including DSJob (the job
the message applies to) are selected.
Reporting administration
Managed on the Information Server Web Console Reporting
tab
Reports can be created about Suite component activities and
administrative functions
Report formats include: HTML, PDF, RTF, TXT, XML
Access to reports, report templates, and report results can be
restricted
Reports are organized into folders
Folders can only be created by Information Server administrators
Notes:
Information Server reporting is managed through the Information Server Web Console
Reporting tab. The Reporting tab contains a folder of templates to build your reports, and
a set of folders you can use to store your reports. Access to reports, report templates, and
report results can be restricted.
Reports are stored and organized in folders. Folders can only be created by Information
Server administrators.
Creating a report
Select a report template
Report templates are organized by Suite product or component
Example for Administration: List of users
Click New Report
Browse for report folder
Report settings
Name
Parameters
Vary depending on report type
Example: DataStage project, job name
Format: HTML, PDF
Settings include: Expiration, History policy
Notes:
There are a number of pre-built reports that can be run from within Information Server
products.
New reports can also be created on the Reporting tab. You begin by selecting a report
template. Information Server administrators have access to all of the report templates, but
not all templates are available to all users. Then you specify the report settings in the new
report.
When you create a report you specify the folder to store the report in. The folder must
already exist at the time you create the report.
Several output formats are supported, including HTML and PDF.
New report
Selected template
Notes:
In this example, the selected report template is List of users from the
Administration>Security folder of templates. After you select the template, click New
Report.
Notice that there are administration report templates as well as report templates for specific
Information Server products.
Figure callouts: Name; Report folder; Report parameters; Report format (html) and settings (hidden)
Notes:
In this List of users example, the Reports folder has been selected for its storage. This is
the root report folder.
Report settings are specific to the type of report being created. In this example, users with
product roles are being selected. The specific product is DataStage.
The output report format is a mandatory parameter. This parameter is not visible in the
graphic, but has been set as HTML.
Running a report
Run reports
Can schedule to run
Access control
View report results
Figure callouts: Specify access; View results; Selected report; Run report
Notes:
After a report is created it can be run or scheduled to run. The Reports>My Reporting
folder lists reports that have recently been created.
The report creator can specify who can run the report and view its results. Click Open
Access Control to specify who can view the report.
Click Run selected reports to run the reports selected in the list. Afterwards, click View
Report Result to view the report information.
Sample report
Notes:
The graphic shows an example of a List of Users report. In this example, the users and
their user attributes are listed. The criteria by which this list of users was chosen is
described in the bottom half of the upper panel. In this case, this report selects users who
have one or more DataStage product component roles.
Figure callouts: User permissions; Browse for user to add
Notes:
This window is displayed if you click Open Access Control on the Reports panel. In this
example, isadmin (the user who created the report) and other Suite administrators have
access to the report. There are several layers of access that can be allowed or restricted,
including the ability to read, update, delete, run, and administer the report. In this example,
only isadmin can administer the report, that is, specify access control. Other Suite
administrators can view, delete, and run it, but not administer it.
You can browse for users, groups, and roles to add to the access control list. Then you can
specify what authorizations they have. For example, you can add the DataStage and
QualityStage Developer role to the access list. They will then be able to view report
results. You can give them further authorizations as well, for example, to run reports.
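The access control list described above can be modeled as a mapping from users, groups, or
roles to the authorizations they hold. The sketch below is illustrative only: the data
layout, the function names, and the user jsmith are hypothetical, while the five
authorizations and the role names come from the notes.

```python
# Authorizations named in the notes; the data layout is hypothetical.
PERMISSIONS = {"read", "update", "delete", "run", "administer"}

# Access control list for one report: principal -> granted authorizations.
acl = {
    "isadmin": set(PERMISSIONS),
    "Suite Administrator": {"read", "delete", "run"},
}


def grant(acl, principal, *perms):
    """Add authorizations for a user, group, or role."""
    unknown = set(perms) - PERMISSIONS
    if unknown:
        raise ValueError(f"unknown permissions: {sorted(unknown)}")
    acl.setdefault(principal, set()).update(perms)


def is_allowed(acl, principals, perm):
    """True if any of the caller's principals (user, groups, roles) holds perm."""
    return any(perm in acl.get(p, set()) for p in principals)


# Add the developer role so that its members can view report results.
grant(acl, "DataStage and QualityStage Developer", "read")

# jsmith is a hypothetical user who holds the developer role.
developer = ["jsmith", "DataStage and QualityStage Developer"]
```

With this list, the developer can view results but cannot run the report until the run
authorization is also granted to the role.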
Locking overview
Locking occurs in two places within Information Server:
The Metadata Repository tier for design elements like job objects,
table definitions, mapping specifications, and so on
The Engine tier for files that will be used at job run-time
Exceptions (such as failed network connections or a user
forcefully killing a client application) can result in abandoned
locks
In most cases, if a user experiences a locking error, they
should retry their operation
It can take some time for a lock to be released
In instances where a lock is not cleared immediately,
Information Server provides mechanisms for both the
automatic and manual clearing of these locks
Notes:
Locking occurs in two tiers within Information Server. When design elements are opened in
a product, such as a DataStage job open in DataStage Designer, a lock is placed on that
object. On the Engine tier, DataStage jobs also take locks at run time on the objects, such
as files, that they use.
When the design object is closed or the DataStage job is finished, the locks are released.
However, sometimes the locks fail to get released. For example, when a DataStage job
aborts, some of its locks might not be released.
Information Server has mechanisms for automatically and manually clearing locks. Some
of these mechanisms are discussed in the following pages.
Notes:
Most of the time, locks can be cleared by stopping and restarting the session or, more
drastically, by restarting Information Server. Some locks, however, are not tied to any
existing session and need to be cleared manually.
Notes:
The locks are stored in the XMETA database in the table XMetaLockInfo.
When a disconnected session is left behind, its locks can be cleared from the Information Server
Web Console. This can be done by disconnecting the relevant session using the
Administration>Session Management>Active Sessions>Disconnect option.
Alternatively there is a command line tool called cleanup_abandoned_locks in the
/IBM/InformationServer/ASBServer/bin directory that can be used to clean up any
disconnected locks.
Restarting Information Server will also clear all locks.
There is also a session inactivity timeout specified in the Web Console. If a timeout is
set, locks are released when the session times out.
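The relationship between sessions, timeouts, and abandoned locks can be sketched as
follows. This is a conceptual model only: the in-memory structures, object names, and
function names are hypothetical and do not reflect the repository schema or the behavior
of the cleanup tool.

```python
# Hypothetical in-memory stand-ins for the lock table and the session list.
locks = [
    {"object": "job/relMultInput", "session": "s1"},
    {"object": "tabledef/Customers", "session": "s2"},
    {"object": "job/loadOrders", "session": "s9"},  # s9 no longer exists
]
sessions = {
    "s1": {"last_activity": 1000.0},
    "s2": {"last_activity": 100.0},
}


def expire_sessions(sessions, now, timeout_seconds):
    """Drop sessions idle longer than the timeout and return their ids."""
    expired = [sid for sid, s in sessions.items()
               if now - s["last_activity"] > timeout_seconds]
    for sid in expired:
        del sessions[sid]
    return expired


def clear_abandoned_locks(locks, sessions):
    """Split locks into those with a live owning session and those without."""
    kept = [lk for lk in locks if lk["session"] in sessions]
    removed = [lk for lk in locks if lk["session"] not in sessions]
    return kept, removed


expired = expire_sessions(sessions, now=1200.0, timeout_seconds=600)
locks, removed = clear_abandoned_locks(locks, sessions)
```

In this model, session s2 times out, so its lock is released together with the lock that
was already orphaned by the missing session s9.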
Notes:
To clear a DataStage lock, you must be either a DataStage administrator or the owner of
the lock.
Ownership of locks is based on the Engine process user ID. When a client connects to the
Engine tier, a new client process is started. Locks taken are associated with that process.
Notes:
Engine-held locks can be cleared in Director, if the Enable job administration in Director
option has been enabled in Administrator for the project. In Director, click Job>Cleanup
Resources. This opens the Job Resources window, which displays a list of the Engine
processes that are running and their PIDs. For a selected process, you can view the locks
taken by the process.
Figure callouts: Job process; Show locks by process; Release locks for process
Notes:
The top window displays job processes that are running. The bottom window displays locks
that have been taken by the job processes. You can select and release locks in these
windows, either directly or by logging out of the process.
To log out of a process, select the process and then click Logout. Click Release All to
release all the locks the process has taken.
Checkpoint
1. What client would you use to stop Information Server
sessions?
2. True or False? A logging view determines what logging
messages or events get saved into the Repository.
3. What procedure would you use to clear a lock tied to an
existing user session?
4. What procedure would you use to clear a "dangling" lock, not
tied to an existing user session?
Notes:
Write your answers here:
Exercises Unit 05
In this lab exercise, you will:
Manage active sessions
Manage logging configurations
Create a log view
View the log
Create an administrative report
Clear abandoned locks
Notes:
Unit summary
Having completed this unit, you should be able to:
Configure and manage sessions
Configure and manage logging
Create, run, and manage reports
Describe Information Server locking
Notes:
Unit 6. Engine Tier Architecture
Unit objectives
After completing this unit, you should be able to:
Describe components in the Engine architecture
Describe DataStage job compile and run time processes
Create and modify parallel job configuration files
Use the Engine command line interface
Notes:
Write the last record to disk and read the next record from disk before each processing
operation
Sub-optimal utilization of resources
One record is processed at a time
Processing resources sit idle during I/O
Cannot scale up to large data volumes
Notes:
Traditional batch processing consists of a distinct set of steps, defined by business
requirements. Between each step, intermediate results are written to disk.
This processing may exist outside of a database (using flat files for intermediate results) or
within a database (using SQL, stored procedures, and temporary tables).
There are several problems with this approach. First, each step must complete and write its
entire result set before the next step can begin. Secondly, landing intermediate results
incurs a large performance penalty through increased I/O. In this example, a single source
incurs 7 times the I/O to process. Thirdly, with increased I/O requirements come increased
storage costs.
Notes:
The traditional approach to improve performance is by manually splitting the source data,
and running multiple copies of the same steps against each portion of the source data.
While this brute force approach can work in some instances, it generally has limited
usefulness with complex business requirements, which require related records to be
processed together. This also requires an extensive pre-processing effort to partition the
files properly.
Notes:
When developers design their jobs by dragging stages (functional components) onto the
DataStage Designer canvas, they specify the data flow in sequential, non-parallel terms.
The parallelism that DataStage implements is not explicitly specified by the developer, but
is implemented by DataStage during the compile and runtime process.
DataStage employs a data flow model for application design, where data flows in memory
between sources, intermediate transformations, and targets without landing to disk.
DataStage uses special in-memory structures called data sets to pass data
between operators. These are similar in structure to physical data sets that can be created
and accessed using the Data Set stage.
This model works in both batch and real-time, service-oriented implementations.
Data pipelining
Run each operator in parallel, passing data records from one operator
to the next
Transform, Enrich, and Load operators run simultaneously
Eliminates intermediate staging to disk
Keeps all available processors busy
But pipelining alone still limits overall scalability
Notes:
Data pipelining is the first step toward efficient parallel processing. Instead of waiting for all
rows to be processed by the previous step, records pass from step-to-step in memory just
like a conveyor belt in a factory assembly line moves physical products being built.
All parallel jobs developed with DataStage use data pipelining. It is a core feature of the
parallel framework and is always enabled.
Pipeline parallelism alone is not enough. There is a limit to the number of rows that can
be in the pipeline, being processed, at one time, no matter how many resources (CPU
processors, memory) are available.
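The conveyor-belt idea can be illustrated with Python generators. This is only an analogy
under stated assumptions: the three step functions and the trace list are hypothetical,
and generators interleave the steps in a single thread rather than running them
simultaneously as the parallel framework does. What the sketch does show is that each row
moves through every step in memory without an intermediate result set being landed
between steps.

```python
trace = []  # records the order in which the steps touch each row


def extract(n):
    """Produce rows one at a time instead of building a full result set."""
    for i in range(n):
        trace.append(f"extract {i}")
        yield {"id": i, "amount": i * 10}


def transform(rows):
    """Process each row as soon as it arrives from the previous step."""
    for row in rows:
        trace.append(f"transform {row['id']}")
        yield {**row, "doubled": row["amount"] * 2}


def load(rows):
    """Consume rows from the pipeline and collect them in the target."""
    target = []
    for row in rows:
        trace.append(f"load {row['id']}")
        target.append(row)
    return target


loaded = load(transform(extract(3)))
```

The trace shows row 0 reaching the load step before row 1 has even been extracted, which is
the opposite of the batch approach in which each step must finish before the next begins.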
Partition parallelism
Divide the incoming stream of
data into subsets to be separately
processed
Subsets are called partitions
Each partition of data is
processed by the same operation
If operation is Transform, each
partition will be transformed in
exactly the same way
Facilitates near-linear scalability
8 times faster on 8 processors
24 times faster on 24 processors
This assumes the data is evenly
distributed
Notes:
Partition parallelism, unlike pipeline parallelism, can scale up to take advantage of all
available resources (CPU processors, memory). And it facilitates near-linear scalability. If 8
processors are available, the job can run approximately 8 times faster than with 1
processor.
Partitioning breaks a data set into smaller sets that are each processed separately, in
parallel. This is a key to scalability. However, the data needs to be evenly distributed across
the partitions; otherwise, the benefits of partitioning are reduced.
It is important to note that what is done to each partition of data is the same. How the data
is processed or transformed is the same.
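The effect of data distribution on scalability can be worked through with a small
calculation. The sketch below is a simplified model, not a measurement: it assumes that
processing time is proportional to the number of rows in a partition, so the job finishes
when the largest partition finishes. The function names and the sample data are
hypothetical.

```python
def round_robin(rows, n_partitions):
    """Distribute rows evenly across partitions."""
    parts = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        parts[i % n_partitions].append(row)
    return parts


def transform(row):
    """The same operation is applied to every row of every partition."""
    return row * 2


def ideal_speedup(partitions):
    """Total rows divided by the size of the largest partition."""
    total = sum(len(p) for p in partitions)
    return total / max(len(p) for p in partitions)


rows = list(range(800))

# Even distribution: 8 partitions of 100 rows each.
even = round_robin(rows, 8)
results = [[transform(r) for r in part] for part in even]

# Skewed distribution: one partition holds 450 rows, the other seven hold 50 each.
skewed = [rows[:450]] + [rows[450 + 50 * i: 500 + 50 * i] for i in range(7)]
```

Under this model the even distribution gives a speedup of 8 on 8 partitions, while the
skewed distribution gives less than 2, because seven partitions sit idle while the
largest one is still being processed.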
Figure 6-7. Parallel engine combines partition and pipeline parallelism KM5021.0
Notes:
DataStage combines data pipelining and partition parallelism to scale across all available
resources without landing intermediate results to disk. Within the parallel framework,
pipelining and partitioning are always on.
Data can also be re-partitioned from stage-to-stage, distributing data as required by the
business requirements, without landing to disk. This would be impossible in traditional
hand-coded approaches to parallel processing.
Figure labels: Stage running in Parallel; Stage running Sequentially; Stage running in Parallel
Notes:
Within a parallel job, one of two operations is performed before each stage/operator:
partitioning or collecting. Partitioners divide data into subsets which are processed
separately, in parallel; collectors merge parallel data streams back into a single stream.
This might be required, for example, when landing data to disk in a single file or when
performing operations that must be performed sequentially, for example a global count of
all the data.
The left graphic shows how partitioning works. A single stream is distributed into multiple
streams. Different algorithms can be used to perform the distribution. The right graphic
shows how collecting works. Multiple streams are collected into a single stream. Different
algorithms can be used to perform the collection.
Partitioners
Partitioners distribute rows of a single link (data set) into smaller segments that can be
processed independently in parallel
Partitioners exist before ANY parallel stage. The previous stage may be running:
Sequentially
Results in a fan-out operation (and link icon)
In Parallel
If partitioning method changes, data is repartitioned
Figure labels: partitioner; Stage running Sequentially; Stage running in Parallel
Notes:
Technically, the parallel framework does not require explicit partitioners before each
parallel stage. Because the Designer GUI makes no such distinction, it is easier to think of
all stages as having partitioners, where AUTO is a type of partitioner (that may or may not
generate a partition operator at runtime).
There are two types of partitioners. For keyless partitioning algorithms, rows are distributed
independently of data values. For keyed partitioning algorithms, rows are distributed based
on values in specified columns.
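The difference between keyless and keyed partitioning can be sketched in a few lines.
Round robin stands in here for a keyless algorithm and a hash of the key column for a
keyed one. The function names, the sample rows, and the use of CRC32 as the hash are
assumptions made for illustration and are not the Engine's actual algorithms.

```python
import zlib


def round_robin_partition(rows, n):
    """Keyless: a row's partition depends only on its position in the stream."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n, key):
    """Keyed: a row's partition depends on the value in the key column."""
    parts = [[] for _ in range(n)]
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[digest % n].append(row)
    return parts


rows = [{"cust": c, "amt": a} for c, a in
        [("A", 1), ("B", 2), ("A", 3), ("C", 4), ("B", 5), ("A", 6)]]

keyless = round_robin_partition(rows, 2)
keyed = hash_partition(rows, 2, "cust")
```

The keyless result is balanced, but rows for customer A land in both partitions. The keyed
result keeps all rows with the same key value in one partition, which is what operations
such as joins and aggregations need, at the risk of uneven partition sizes.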
Icons on the DataStage Designer canvas indicate when partitioning and collecting are
occurring. The fan-out icon indicates that data in a single stream is being distributed into
multiple streams. The lower, butterfly icon indicates that data in multiple streams is being
redistributed across multiple partitions.
Collectors
Figure labels: Stage running in Parallel; Stage running Sequentially
Notes:
There are several collector algorithms. Auto eagerly reads any row from any input partition.
The output row order is undefined (non-deterministic). This is the default collector method.
Round Robin picks rows from input partitions in round robin order. This is slower than Auto
and rarely used.
Ordered reads all rows from first partition, then the second, and so on. It preserves the
order that exists within partitions.
Sort Merge produces a single (sequential) stream of rows, sorted on specified key
columns, from input partitions that are already sorted on those keys. It merges the sorted
partitions; it does not itself sort them. Row order is not preserved for
non-key columns.
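Three of the collector algorithms can be sketched on small lists. The function names and
sample partitions are hypothetical, and the Auto collector is not modeled because its
output order is non-deterministic. The sort merge sketch relies on each input partition
already being sorted, as the notes require.

```python
import heapq
from itertools import zip_longest

_MISSING = object()  # marks partitions that have run out of rows


def ordered_collect(partitions):
    """Read all rows from the first partition, then the second, and so on."""
    return [row for part in partitions for row in part]


def round_robin_collect(partitions):
    """Take one row from each partition in turn until all are exhausted."""
    out = []
    for group in zip_longest(*partitions, fillvalue=_MISSING):
        out.extend(row for row in group if row is not _MISSING)
    return out


def sort_merge_collect(partitions, key=None):
    """Merge partitions that are already sorted on the key; no sort is done."""
    return list(heapq.merge(*partitions, key=key))


# Each partition is already sorted.
partitions = [[1, 4, 9], [2, 3], [5, 6, 7, 8]]

ordered = ordered_collect(partitions)
alternating = round_robin_collect(partitions)
merged = sort_merge_collect(partitions)
```

Only the sort merge result is globally sorted. The ordered result preserves the order
within each partition but not across partitions.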
Parallel sorting
Many operations (joining, aggregating, removing duplicates) either
require sorting or perform optimally with sorting
In most cases, there is no need to globally sort data to produce a single
sorted sequence of rows
Instead, sorting is most often used to establish order within individual
partitions of data
Sorting for joining, aggregating, removing duplicates, and so on, can be done in parallel, for
high performance gains!
Global sorts, if desired, can be accomplished after parallel sorting, by collecting
the data into a single partition using the Sort-Merge collector
Notes:
It is sometimes thought that parallel sorting, though faster, is not very useful, because each
partition is separately sorting the data within that partition, and not sorting all the data. In
most cases, however, global sorts across all partitions are not needed. And global sorts, if
desired, can be accomplished after parallel sorting by collecting the data using the Sort
Merge collector algorithm.
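The claim that most operations need order only within partitions can be checked with a
small duplicate-removal example. This is a sketch under assumptions: the function names
and sample values are hypothetical, and CRC32 stands in for whatever keyed partitioning
method is used. Because equal values always hash to the same partition, sorting and
removing duplicates inside each partition gives a complete result.

```python
import heapq
import zlib


def hash_partition(values, n):
    """Keyed partitioning: equal values always go to the same partition."""
    parts = [[] for _ in range(n)]
    for v in values:
        parts[zlib.crc32(v.encode("utf-8")) % n].append(v)
    return parts


def sort_and_dedupe(partition):
    """Sort one partition, then drop adjacent duplicates."""
    out = []
    for v in sorted(partition):
        if not out or out[-1] != v:
            out.append(v)
    return out


values = ["pear", "apple", "fig", "apple", "kiwi", "pear", "fig", "plum"]

# Each partition could be sorted and de-duplicated by a separate process.
per_partition = [sort_and_dedupe(p) for p in hash_partition(values, 3)]

# Only if a single globally sorted stream is wanted: merge the sorted partitions.
global_sorted = list(heapq.merge(*per_partition))
```

The final merge is optional. It corresponds to collecting with the Sort Merge algorithm
and is needed only when a single globally sorted sequence is required.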
them
GUI stages in the job design are compiled into OSH operators
GUI Transformer stages are compiled into C++ source code, which is then compiled into
custom OSH operators
This is why DataStage requires a C++ compiler
DataStage also supports custom C++ stages, called BuildOp stages, that are compiled
manually within the GUI, and then compiled into custom OSH operators
Figure labels: Executable Job; C++ for each Transformer; Transformer Components; Generated OSH
Notes:
What happens when a DataStage job is compiled? From the GUI design on the Designer
canvas, DataStage generates what is called OSH. OSH is a scripting language
composed of C++ operators and input/output specifiers.
Some stages, like the Transformer stage and Custom Build-Op stages, generate C++
source code that is then compiled into OSH operators. This is why DataStage requires a
C++ compiler on the Engine system.
The OSH code that is generated still represents the data flow as a sequential process. At
runtime, along with the configuration file (discussed later), the OSH is parsed into code that
implements the partition parallelism.
Generated OSH
Enable viewing of generated OSH in Administrator
Schemas describe the format of the input and output data to the OSH operators
Figure callouts: Comments; Operator; Schema; Operator properties
Notes:
You can view generated OSH through DataStage Designer. This provides an overview of
the OSH that will be executed. It is important to note, however, that this OSH will go
through some additional changes for optimization and execution.
In the top graphic, the Parallel tab in DataStage Administrator is displayed. Developers
can only view the OSH if the Generated OSH visible for Parallel Jobs... box is checked.
There are several places where the OSH can be viewed. In the lower graphic the OSH is
being viewed on the Generated OSH tab of the Job Properties window in DataStage
Designer.
Notes:
At compile time the OSH is generated. It is not until runtime that the partition parallelism is
implemented. This is done by a series of start-up processes that occur whenever a parallel
job is run.
Since the parallelism is not implemented until runtime, the same compiled job can be run
with different degrees of parallelism, on different occasions. This is a major benefit of the
way DataStage implements partition parallelism.
The configuration file used to run the job determines the degree of parallelism, and the
resources (processors, disk, memory) used to run it. From this, and the OSH generated at
compile time, the Engine startup processes produce the Score, which specifies which
operators run on which processor nodes, and what resources they use when they do.
Notes:
After the Score is produced, data processing begins. There is some overhead as operators
are distributed to the various nodes. Processing ends when the last row of data is
processed by the job, unless the job aborts.
As a job runs, messages are written to the job log. The lower graphic shows the last few
messages of a job that ran to completion without errors.
Players
The actual processes associated with operators (stages)
Sends stderr, stdout to section leaders
Establishes connections to other players for data flow
Cleans up upon completion
Default Communication:
SMP: Shared Memory
Cluster/GRID: Shared Memory (within hardware node) and TCP (across hardware nodes)
Figure labels: Processing Node; SL (section leader); P (player)
Notes:
The graphic displayed summarizes the start-up process that occurs in generating and
implementing the Score. One processor node is designated the conductor. This is a node
on the computer system where DataStage is installed. The conductor composes
(generates) the Score based on the OSH and configuration file. It then forks off section
leader processes to each processor node specified in the configuration file.
Each section leader process then generates the OSH operator player processes that will
run on that node, and sets up the communication between those processes. A player
process is an operator (stage) running on a node.
The player processes, which are running in parallel on each node, then perform the data
processing the job is designed to do.
Figure labels: Stderr Channel/Pipe; APT_Communicator
Notes:
Every player process has to be able to communicate with every other player that could
potentially receive some of its output data or provide some of its input data. This is because
data can potentially move from one player process on one node to another player process
on a separate node (possibly on a separate computer).
There are separate communication channels (pathways) for control, messages, errors, and
data. Note that the data channel does not go through the section leader or conductor, as
this would limit scalability. Data flows directly from upstream operators to downstream
operators.
The graphic depicts the communication process. Two player processes are shown on each
node: generator, copy. The dotted lines represent the flow of data. So, for example, data
can flow from generator,0 to copy,0 on the same node or from generator,0 to copy,2 on
another node.
Communication also occurs between the conductor and section leaders, and between
section leaders and player processes. These are indicated by the solid lines.
Notes:
The Score is an in-memory text file that can be viewed in the job log. The Score identifies the
degree of parallelism for each operator and the node or nodes that are assigned to each
operator, for it to run on.
It is important to note that the Engine may insert additional operators into the Score
(partitioners, sorts) beyond what was generated in the OSH. These include buffer
operators to prevent deadlocks and sort operators that are inserted because certain
operators require sorted input data.
Score
Notes:
You can view the Score in the job log, if the $APT_DUMP_SCORE environment variable
has been turned on. Best practice is to have this variable turned on in both development
and production systems. The Score is a major debugging tool for DataStage developers.
And the Score is a major trouble-shooting tool for production teams.
The message in the log does not contain the word Score. Identify the message by looking
for the heading main program: This step has N datasets.
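Because the Score message is identified by its heading rather than by the word Score, it
can be located by searching log text for that heading. The sketch below assumes the job
log messages are available as a list of strings. The sample log lines and the function
name are hypothetical, and only the heading text comes from the notes above.

```python
import re

# Heading that introduces the Score message in the job log.
SCORE_HEADING = re.compile(r"main program: This step has (\d+) datasets?")


def find_score_messages(log_messages):
    """Return (message index, number of data sets) for each Score message."""
    hits = []
    for index, text in enumerate(log_messages):
        match = SCORE_HEADING.search(text)
        if match:
            hits.append((index, int(match.group(1))))
    return hits


# Hypothetical log excerpt used only to exercise the search.
sample_log = [
    "Starting Job exampleJob.",
    "Environment variable settings: ...",
    "main program: This step has 2 datasets: ...",
    "Finished Job exampleJob.",
]

score_hits = find_score_messages(sample_log)
```

If no message matches, one likely cause is that $APT_DUMP_SCORE was not turned on for the
job run.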
Notes:
The Score yields a lot of useful information, including the number of operators (stages) and
the number of input and output data sets.
The Score also lists the number of player processes. In this example there are nine player
processes: one Row Generator process running on a single node, four peek processes
running on all four nodes for the first Peek stage, and four peek processes running on all
four nodes for the second Peek stage.
An example Score is displayed in the top right corner for the job shown in the lower-left
graphic. Notice that the two Peek stages/operators each run, in parallel, on four processing
nodes. The Row Generator stage, which runs sequentially, runs on only a single node
(node1).
Notes:
Processes consume resources, both CPU and memory. The more processes, the greater the
impact on resources. You can determine the total number of processes a job will generate
from the Score. There is one process generated for the Conductor node. There is one
section leader process generated for each node. Each player process running on a node is
a separate process.
The Score does not include the runtime startup and overhead processes, since their
number is constant across all jobs.
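The process count described in these notes can be sketched as simple arithmetic. The function name below is hypothetical. The node and player counts would be read from the Score, and the constant startup and overhead processes are left out.

```python
def total_processes(node_count, player_count):
    """Estimate the processes a job generates, per the rules in the notes.

    One conductor process, one section leader per node, and one process
    per player. Startup and overhead processes are not counted.
    """
    conductor = 1
    section_leaders = node_count
    return conductor + section_leaders + player_count


# The earlier example: four nodes and nine player processes.
print(total_processes(node_count=4, player_count=9))
```

For the earlier Row Generator and Peek example (four nodes, nine players), this gives 1 + 4 + 9 = 14 processes.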
Configuration file
Notes:
When a job runs, it runs using a configuration file. The number of nodes in the configuration
file determines the degree of parallelism of the job.
The configuration file tells the parallel Engine how to exploit the underlying computer
system or systems. What processor nodes should it use? What disk resources?
The $APT_CONFIG_FILE environment variable that is in effect for the job at the time the
job runs determines the configuration file that is used by the job. A default configuration
file is specified for the project. The job, however, may override this default by including
the environment variable as one of its job parameters.
Notes:
The number of nodes specified in a configuration file does not have to match the number of
physical CPUs in your system or systems. There can be more nodes or fewer. The
nodes specified in the configuration file are logical. For example, you can use a 4-node
configuration file when running a job on a computer with a single processor. The job will
still run in parallel streams, but it will not run in true physical parallelism. It will be the kind of
parallelism exhibited by a computer with a single processor running several applications at
one time.
True physical parallelism does not occur unless there are physical CPUs backing it up.
There is no need to map the nodes in the configuration file to physical processors, if
they exist. The mapping occurs automatically, and you have no control over it.
Notes:
This graphic shows a typical configuration file. The file defines four nodes: node1, node2,
and so on. The names given to the nodes are arbitrary. The fastname, on the other hand, is
not arbitrary. It must match the network name of the computer on which the node exists.
Pools can be applied to nodes and other resources. Individual jobs or stages in a job can
be constrained to use a certain pool of nodes or resources. In this way you can direct the
job or stages in the job to use certain nodes or resources, and not others.
There are several different types of resources. A disk resource is used for storing data sets.
A scratchdisk resource is used by DataStage for temporary work space, for example, by
the sort operator.
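Since the graphic is not reproduced here, the following Python sketch renders a small configuration file with the elements named in these notes: node name, fastname, pools, resource disk, and resource scratchdisk. The host name and directory paths are hypothetical placeholders, and the helper names are invented for illustration.

```python
def render_node(name, fastname, disks, scratch_disks, pools=("",)):
    """Render one node entry of a parallel configuration file as text."""
    pool_list = " ".join(f'"{pool}"' for pool in pools)
    lines = [
        f'    node "{name}"',
        "    {",
        f'        fastname "{fastname}"',
        f"        pools {pool_list}",
    ]
    for path in disks:
        lines.append(f'        resource disk "{path}" {{pools ""}}')
    for path in scratch_disks:
        lines.append(f'        resource scratchdisk "{path}" {{pools ""}}')
    lines.append("    }")
    return "\n".join(lines)


def render_config(nodes):
    """Wrap the rendered node entries in the outer braces of the file."""
    body = "\n".join(render_node(**node) for node in nodes)
    return "{\n" + body + "\n}\n"


# Hypothetical host name and paths, for illustration only.
config_text = render_config([
    {"name": "node1", "fastname": "myhost",
     "disks": ["/data/disk0"], "scratch_disks": ["/data/scratch0"]},
    {"name": "node2", "fastname": "myhost",
     "disks": ["/data/disk1"], "scratch_disks": ["/data/scratch1"]},
])
print(config_text)
```

Both nodes share one fastname because they are logical nodes on the same computer. Only the node names and, optionally, the resource directories differ.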
Notes:
There are many factors that affect what the optimal configuration file would look like and
how many nodes it would have. The optimal degree of parallelism depends on the application.
CPU-intensive applications and I/O-intensive applications vary in terms of what is optimal.
For production jobs that will be run repeatedly, you should test the job with different
configuration files. Start with a given number of nodes, then add nodes as long as
performance continues to improve. You should also experiment with reducing the number
of nodes. For some jobs, this may actually improve performance.
Node pools
Notes:
Node pools in the configuration file can be used to separate processing nodes into different
categories based on their characteristics. These characteristics can include resources such
as memory or disk space, or access to specific applications. This enables the job to use the
most efficient processing nodes on which to run its operators (stages).
By default, DataStage uses all the nodes defined in the default node pool. The default node
pool is identified by the syntax of empty double quotes. In a typical configuration file, all
nodes will be in the default pool. In some cases, nodes with special resources will exist
outside of the default pool, as part of a special pool. This would be for nodes that are only
to be used by a job in special circumstances.
Notes:
This is an example of a configuration file with defined node pools. In this example, nodes
n2, n3, and n4 all belong to the node pool named app1. Node n1 does not belong to this
pool.
All the nodes belong to the default node pool (identified by ""). All operators can be
assigned to nodes in the default pool.
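The pool membership in this example can be modeled with a small lookup. This is only a sketch of the idea, not how the Engine represents pools. The dictionary and function names are hypothetical.

```python
# Pool membership from the example: every node is in the default pool (""),
# and n2, n3, and n4 are also in the app1 pool.
NODE_POOLS = {
    "n1": [""],
    "n2": ["", "app1"],
    "n3": ["", "app1"],
    "n4": ["", "app1"],
}


def nodes_in_pool(node_pools, pool_name=""):
    """Return the nodes that belong to the named pool, in listed order."""
    return [node for node, pools in node_pools.items() if pool_name in pools]


print(nodes_in_pool(NODE_POOLS, "app1"))
print(nodes_in_pool(NODE_POOLS))
```

A stage constrained to app1 would be limited to n2, n3, and n4, while an unconstrained stage can use all four nodes through the default pool.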
Disk pools
Disk pools indicate the directories of the file systems available to the node
  Defined as options for resource disk and resource scratchdisk
Disks and scratch disks may be grouped into pools
  Disk pools reserve storage for a particular use
  Example: holding very large data sets, sorting
Syntax
  resource disk "disk_name" {pools "disk_pool"}
  resource scratchdisk "s_disk_name" {pools "s_pool"}
Pools defined by disk and scratch disk are not combined
  Two pools having the same name and belonging to both resource disk and resource scratchdisk are defined as two separate disk pools
Each node on which a stage runs must have at least one disk in the default disk pool
Notes:
Disk pools identify the file directories available to a node. Each node must have at least
one disk directory it can use. Disk pools can be used to reserve storage for a particular use.
For example, a particular disk directory might be reserved for jobs that will be creating very
large data sets.
Since a job's operators always need access to some disk space, each node in the
configuration file must have at least one disk resource in the default pool.
Notes:
Sorting in DataStage jobs requires both memory resources and disk resources. Disk
resources are needed when there is not enough memory to perform the sort in memory. In
that case, some sorting operations must be done using disk resources.
In the configuration file, you need to specify scratch disk resources for sorting operations.
The sort keyword is used to identify the scratch disk to be used first.
The usage of disk resources can be prioritized. If multiple disk resources are listed, the
order from top to bottom determines their priority. You can prioritize certain disk resources
for sorting purposes by adding the resource to the sort pool.
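The prioritization described in these notes can be sketched as a simple ordering rule: scratch disks in the named pool come first, and the remaining disks follow in the order listed. This is a simplified model of the notes only. It does not cover any further fallback rules the Engine may apply, and the function name and paths are hypothetical.

```python
def scratch_priority(scratch_disks, pool_name):
    """Order scratch disks for an operation that prefers a named pool.

    scratch_disks is a list of (path, pools) pairs in the order they are
    listed in the configuration file.
    """
    in_pool = [path for path, pools in scratch_disks if pool_name in pools]
    others = [path for path, pools in scratch_disks if pool_name not in pools]
    return in_pool + others


# Hypothetical scratch disk entries, listed top to bottom.
disks = [
    ("/scratch0", [""]),
    ("/scratch1", ["", "sort"]),
]
print(scratch_priority(disks, "sort"))
```

Without the sort pool, /scratch0 would be used first because it is listed first. Adding /scratch1 to the sort pool moves it ahead for sorting.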
Notes:
As with sorting, buffering takes place in memory if there is sufficient memory to
perform the buffering tasks. If there is not enough memory, disk resources will be used. The
buffer pool can be used to prioritize the scratch disk resources in the configuration file that
are to be used for buffering.
Notes:
In this example, the buffer pool is used to identify /scratch0 as the priority directory for
buffering operations when they spill over to disk. Since /scratch0 is listed before
/scratch1, it would be used first, even without being in the buffer pool.
Notes:
The environment variable $APT_BUFFER_MAXIMUM_MEMORY determines how much
memory is available for buffering. Some jobs may require more for good performance. In
this case, you can use properties in the job to increase the memory available for specific
operations.
When memory is exhausted, disk space is used for buffering. There is a defined order in
which disk space is used until it is exhausted. Scratch space in the buffer pool is used up
first, after memory is exhausted.
Notes:
It may seem that using a configuration file with the maximum number of nodes relative to
the number of available physical CPUs will yield the best performance. But this is not
necessarily true. Each node increases the amount of overhead as it adds additional
processes. And you need to keep in mind the other activity on the system.
The best way to determine the optimal number of nodes is through testing. Run the job
several times on the same set of test data using a variety of configuration files, with
different numbers of nodes and different resource allocations. Compare the results to
determine the optimal configuration.
Notes:
A default configuration file, named default.apt, is created when Information Server is
installed. Depending on the version of Information Server, this configuration file may have
only one node. It uses subdirectories of the Information Server install directory for its
specified disk resources. At a minimum, you should create a configuration file that specifies
other disk resources. You will probably also want to use a configuration file with multiple
nodes.
Notes:
This slide offers some guidelines for sizing the optimal number of nodes in a configuration
file. As mentioned earlier, testing your jobs with several different configuration files is
recommended.
And remember, configuration files that work well for one job may not work well for other
jobs, depending on the type of job and whether, for example, it is highly I/O dependent.
Notes:
DataStage jobs are not limited to running on a single system with its limited number of
CPUs. DataStage can be configured to run jobs on multiple systems networked together.
The fastnames identify the names of the different systems. In this example, there are two
different fastnames (machine1 and machine2). This indicates that node1 and node2 are
on different computers.
Notes:
Spreading resource disks for nodes across different directories decreases latency and
increases throughput. Notice in this example that the resource disks for node1 and node2
are different. This ensures that node1 disk operations will not contend with node2 disk
operations.
Notes:
Even with respect to a single node, resource disk usage can be spread across different
disks to avoid contention. If multiple resource disks are specified, data sets will be written
alternately to each one, in the order in which the resources are listed.
In this example, node1 has two resource disk entries. The first entry refers to disk1 and
the second to a directory on disk2. The first data set will be created on disk1. The second
will be created on disk2. The third will be created on disk1, as the process starts over
again.
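The alternating placement described above amounts to cycling through the resource disk list. The sketch below models that rule only. The function name is hypothetical, and disk1 and disk2 stand in for the two resource disk entries of node1.

```python
def assign_data_sets(resource_disks, data_set_count):
    """Assign data sets to resource disks in turn, in the order listed."""
    return [
        resource_disks[index % len(resource_disks)]
        for index in range(data_set_count)
    ]


# Two resource disk entries for node1, three data sets created in sequence.
print(assign_data_sets(["disk1", "disk2"], 3))
```

The third data set lands on disk1 again because the cycle starts over after the last listed resource disk.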
Notes:
There are times when a single node configuration file is appropriate and can yield the best
performance. This may be true when you are running a batch of DataStage jobs in a job
sequence, and all the jobs process a small amount of data. The overhead of the additional
nodes will outweigh the benefits of the additional nodes, which are not really needed because
of the small amount of data.
Real-time DataStage jobs process data in small message units and usually get their best
performance using single-node configuration files.
Named node pool
Notes:
Click Tools > Configurations in Designer to create a new configuration file or edit an existing
one. The easiest way to add a node is to copy the first node and paste in copies for the
other nodes. All you are required to change is the name of the node. You may also, as
noted earlier, want to change the resource disks.
Notes:
The $APT_CONFIG_FILE environment variable specifies the default configuration file to
be used by any job running in the project. Not all jobs have to run with that configuration
file. You can add $APT_CONFIG_FILE as a job parameter, so that the configuration file for
the job can be specified at runtime.
This graphic shows the Parameters tab of the Job Properties window. Click Add
Environment Variable to add any environment variable, including $APT_CONFIG_FILE,
as a job parameter. The values specified at runtime override the default values specified in
DataStage Administrator.
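The override rule in these notes can be sketched as a lookup with a fallback: a runtime value of $APT_CONFIG_FILE supplied as a job parameter wins, otherwise the project default applies. The function name and the file paths below are hypothetical.

```python
def effective_config_file(project_default, job_parameters):
    """Return the configuration file a job would run under.

    A runtime $APT_CONFIG_FILE job parameter overrides the project default.
    """
    return job_parameters.get("$APT_CONFIG_FILE", project_default)


# Hypothetical paths, for illustration only.
default_file = "/configs/default.apt"
print(effective_config_file(default_file, {}))
print(effective_config_file(default_file, {"$APT_CONFIG_FILE": "/configs/4node.apt"}))
```

A job that does not define the parameter simply runs with the project default.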
Notes:
Commands for administering DataStage, for controlling and monitoring DataStage jobs, and
for importing and exporting DataStage objects and projects can be executed from
the command line on the Engine server system. These commands fall into four groups.
The dsjob command can be used to control DataStage jobs. Jobs can be run from the
command line, and the job log messages generated by the job can be viewed.
The dsadmin command can be used to configure DataStage projects and to retrieve
information about the DataStage environment.
The DSXImportService command can be used to import and export DataStage dsx (import)
files. This command runs on both the DataStage server and DataStage client
systems.
The SyncProject command can be used when DataStage project directories get out of
sync with the Repository. This command runs on both the DataStage server and
DataStage client systems.
dsjob command
DataStage user credentials: -domain domainName -user userName -password password -server engineName
Running a job: -run projectName jobName
  Options include:
    -mode [ NORMAL or RESET ]
    -param parameterName=value
-stop
  Use to stop a running job
List projects: dsjob -lprojects
List jobs: dsjob -ljobs projectName
Access job log files: dsjob -logsum projectName jobName
Generate a job report: dsjob -report projectName jobName
Notes:
When using the dsjob command, DataStage user credentials need to be specified in all
cases.
Use the -run parameter to run a job. The -run parameter is followed by the name of the
project and the name of a job to run.
You can use the -lprojects parameter to list the projects on the Engine.
You can use the -ljobs parameter to list the jobs in a project.
You can use the -logsum parameter to display the job log messages for a job. The
-logsum parameter is followed by the name of the project and the name of a job.
Notes:
This graphic shows the syntax of the dsjob command. At the bottom of the graphic is the
list of command parameters that can be used in the dsjob command. All these options are
preceded by a dash.
Notes:
In this example, the dsjob -lprojects command is executed. Before you run the command,
change to the /DSEngine directory, and then initialize the DataStage environment by
running the dsenv script. Then enter the command. The command is located in the
/DSEngine/bin directory.
In the graphic, the dsjob keyword is followed by the authentication credentials. In this
example student/student is used to log into the server (edserver.ibm.com).
Project
Notes:
In this example, the dsjob -run command is executed. Before you run the command,
change to the /DSEngine directory, and then initialize the DataStage environment by
running the dsenv script. Then enter the command. The command is located in the
/DSEngine/bin directory.
In the graphic, the dsjob keyword is followed by the authentication credentials. In this
example student/student is used to log into the server (edserver.ibm.com). The -run
parameter is followed by the -param option, which is used to pass a value to the
NumRows job parameter, defined in the job. This is followed by the name of the project
and job.
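The order of the pieces in this command can be sketched by assembling the argument list in Python. The sketch only builds the list and does not run anything. The domain, project name, job name, and parameter value below are hypothetical placeholders. The server name, the student credentials, and the NumRows parameter name come from the example.

```python
def dsjob_run_args(domain, user, password, server, project, job,
                   params=None, mode="NORMAL"):
    """Build the argument list for a dsjob -run command.

    Order follows the notes: credentials, -run with its options,
    then the project name and the job name.
    """
    args = [
        "dsjob",
        "-domain", domain,
        "-user", user,
        "-password", password,
        "-server", server,
        "-run",
        "-mode", mode,
    ]
    for name, value in (params or {}).items():
        args += ["-param", f"{name}={value}"]
    args += [project, job]
    return args


# Hypothetical domain, project, job, and parameter value.
command = dsjob_run_args(
    domain="edserver.ibm.com:9080",
    user="student",
    password="student",
    server="edserver.ibm.com",
    project="MyProject",
    job="MyJob",
    params={"NumRows": 10},
)
print(" ".join(command))
```

Such a list could be handed to a process launcher from a shell session in which the dsenv script has already been run, as described above.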
Project Job
Notes:
In this example, the dsjob -logsum command is executed. Before you run the command,
change to the /DSEngine directory, and then initialize the DataStage environment by
running the dsenv script. Then enter the command. The command is located in the
/DSEngine/bin directory.
In the graphic, the dsjob keyword is followed by the authentication credentials. In this
example student/student is used to log into the server (edserver.ibm.com). The -logsum
parameter is followed by the name of the project and job.
-report
Project Job
Notes:
In this example, the dsjob -report command is executed. Before you run the command,
change to the /DSEngine directory, and then initialize the DataStage environment by
running the dsenv script. Then enter the command. The command is located in the
/DSEngine/bin directory.
The -report parameter returns a report of the last job run.
In the graphic, the dsjob keyword is followed by the authentication credentials. In this
example student/student is used to log into the server (edserver.ibm.com). The -report
parameter is followed by the name of the project and job.
dsadmin command
Create a project: dsadmin -createproject projectName
Set the value of an environment variable: dsadmin -envset variableName -value Value projectName
List projects: dsadmin -listprojects
List environment variables: dsadmin -listenv projectName
Notes:
You can use the dsadmin command to execute various DataStage administrative
functions: create a project, set an environment variable, list projects, list environment
variables.
Notes:
This graphic shows the syntax of the dsadmin command. At the bottom of the graphic is
the list of command parameters that can be used in the dsadmin command. All these
options are preceded by a dash.
-listprojects
-listenv
Environment variable settings
Notes:
In this example, the dsadmin -listprojects and the dsadmin -listenv commands are
executed. Before you run these commands, change to the /DSEngine directory, and then
initialize the DataStage environment by running the dsenv script. Then enter the
command. The command is located in the /DSEngine/bin directory.
The -listprojects parameter returns a list of projects.
The -listenv parameter returns a list of environment variables and their current settings.
In the graphic, the dsadmin keyword is followed by the authentication credentials. In this
example student/student is used to log into the server (edserver.ibm.com). The -listenv
parameter is followed by the name of the project.
List contents
Listings
Notes:
This command is located in the /ASBNode/bin directory, on both the Engine server and
client systems. In this example, the DSXImportService keyword is followed by the -List
parameter. Then the type of import file is specified by the -DSXFile parameter. This
distinguishes the import file as a dsx type rather than an xml type. Then the path to the
import file is specified.
Notice that the output lists the type of DataStage object (parameter set, job, etc.) followed
by a list of the objects of that type contained in the input file.
Import file
Results
Notes:
This command is located in the /ASBNode/bin directory, on both the Engine server and
client systems. In this example, the DSXImportService keyword is followed by parameters
for specifying the domain host, and the user ID and password used to log into the host.
This is followed by the name of the project the file is to be imported into. The -DSXFile
parameter distinguishes the import file as a dsx type rather than an xml type. Then the
path to the import file is specified.
Notice that the output lists the type of DataStage object (parameter set, job, etc.) followed
by the name of the object imported.
Checkpoint
1. What determines the degree of parallelism that a job runs
under?
2. What message in the job log lists the nodes that a stage
(operator) runs on?
3. What two types of parallelism are supported in DataStage
parallel jobs?
4. When you click the Compile button for a DataStage parallel
job, what type of script gets generated?
5. What determines the configuration file a job runs under?
Notes:
Write your answers here:
Exercises Unit 06
In this lab exercise, you will:
Edit a configuration file
Run a DataStage job from the GUI using the non-default configuration file
Examine the OSH and Score
Run a job from the command line
Administer the Engine from the command line
Use the DSXImportService command to list the contents of a DataStage import (dsx) file
Use the DSXImportService command to import a DataStage import (dsx) file
Unit summary
Having completed this unit, you should be able to:
Describe components in the Engine architecture
Describe DataStage job compile and run time processes
Create and modify parallel job configuration files
Use the Engine command line interface
Unit 7. Engine Tier Configuration
Unit objectives
After completing this unit, you should be able to:
Configure DataStage projects
Configure Engine environment variables
Manage data sets
Configure the Engine to gather and process operational metadata
Use the Multiple-Job Compile utility to compile batches of DataStage jobs
Notes:
Primary project configuration is done by a DataStage administrator in the DataStage
Administrator client. The DataStage Administrator client contains a number of tabs where
these tasks are performed.
On the General tab, you can configure Runtime Column Propagation (RCP) settings,
default operational metadata handling, and the default workload management (WLM)
queue.
On the Permissions tab, you can specify DataStage user permissions.
On the Parallel tab, you can specify OSH visibility and format defaults.
On the Sequence tab, you can specify job sequence default settings.
On the Logs tab, you can specify job log default settings.
Administrator tabs
General tab
Enable job administration in Director
RCP settings
Access to environment variables
Generate operational metadata
Workload management default queue
Permissions: Specify user roles
Tracing: Enable server side tracing
Schedule: Specify user ID for scheduled jobs
Only enabled on Windows
Mainframe: Defaults for mainframe jobs
Tunables: Defaults for Server jobs
Parallel: Defaults for Parallel jobs
Sequence: Defaults for Job Sequences
Remote: Used for job deployment on a USS system
Logs: Logging defaults
Notes:
On the General tab, you can configure Runtime Column Propagation (RCP) settings,
default operational metadata handling, environment variable settings, and the default
workload management (WLM) queue.
On the Permissions tab, you can specify DataStage user permissions.
On the Parallel tab, you can specify OSH visibility and format defaults.
On the Sequence tab, you can specify job sequence default settings.
On the Logs tab, you can specify job log default settings.
In addition, there are several tabs for special purpose configuration. The Schedule tab is
used by the DataStage job scheduler. It is only enabled on Windows platforms. The
Mainframe tab is only enabled if support for DataStage mainframe jobs has been installed.
The Tunables tab specifies defaults for DataStage server jobs. The Remote tab specifies
defaults for job deployment on a USS system.
Operational metadata
Edit environment variables
Workload Management
Notes:
This graphic shows the Administrator client tabs. The tabs described previously are at the
top. The General tab is selected and displayed.
Click the Environment button to edit environment variables.
If Workload Management is enabled (not enabled in this example), the default Workload
Management (WLM) queue is specified in the Queue box. Workload Management is
discussed in a later unit.
Notes:
When RCP is turned on, columns of data can flow through stages in a DataStage job
without being explicitly defined in the stage. Although this can be used to create DataStage
jobs that process data in more flexible ways, it can also lead to unpredictable results
if not handled carefully.
For this reason, if RCP is to be enabled, it is recommended that you not turn it on by
default. That way, job developers can turn it on, but it will not be turned on without their
explicit decision to do so.
Notes:
RCP can be turned on at any level: project, job, stage. Settings at a lower level override
settings at a higher, more global, level. Therefore, even if RCP is not turned on by default, it
can be turned on at the job level or, even more specifically, at the individual stage level
within a DataStage job.
Figure callouts: Check to enable RCP to be used; Check to make RCP the default for new jobs
Notes:
In this example, RCP has been enabled, but the Enable Runtime Column Propagation
for new links box has been left unchecked. This means that when a new DataStage parallel
job is created, it will not automatically have RCP turned on. Developers can, if they choose,
turn it on for the job or for individual stages of the job.
If the Enable Runtime Column Propagation for Parallel Jobs is not checked, then
developers will not be able to use RCP in any of the jobs they develop.
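The interaction of the project, job, and stage settings described in these notes can be sketched in a few lines of Python. This is only a toy model of the rules as stated here, not DataStage code; the function name and its parameters are hypothetical.

```python
from typing import Optional


def effective_rcp(project_enabled: bool,
                  project_default_for_new_links: bool,
                  job_setting: Optional[bool] = None,
                  stage_setting: Optional[bool] = None) -> bool:
    """Toy model of the RCP rules in the notes (hypothetical, not a DataStage API)."""
    if not project_enabled:
        # If RCP is not enabled for the project, no job in it can use RCP.
        return False
    # The most specific explicit setting wins: stage first, then job.
    for setting in (stage_setting, job_setting):
        if setting is not None:
            return setting
    # Nothing set explicitly: fall back to the project default.
    return project_default_for_new_links


# RCP enabled for the project but not on by default; a developer opts in for one stage.
print(effective_rcp(True, False, stage_setting=True))
# Same project, no explicit job or stage setting.
print(effective_rcp(True, False))
# RCP not enabled for the project: a job-level request has no effect.
print(effective_rcp(False, True, job_setting=True))
```

The sketch treats "not set" (None) differently from "explicitly off" (False), which is what allows a lower level to override a higher one in either direction.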
Notes:
Another important task of the DataStage administrator is to specify DataStage user
permissions. For any IS user ID given the IS DataStage User role, the DataStage
administrator can specify a DataStage project role. There are several different types of
roles that can be assigned.
The DataStage Administrator, DataStage Production Manager, and DataStage Developer
roles give developers full access to all areas of a DataStage project. DataStage
Developers do not, however, have access to protected projects. A protected project is a
read-only project. Objects imported into the project can be neither edited nor deleted.
The DataStage Operator and Super Operator roles are more limited. Operators can only
log into DataStage Director and run DataStage jobs. They cannot log into DataStage
Designer and view or edit DataStage jobs. Super operators can log into Designer and view
jobs, but cannot modify jobs.
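As a rough summary, the role descriptions above can be written as a lookup table. The capability names below are hypothetical labels chosen for this sketch, and two entries are assumptions rather than statements from the notes: that Super Operators can run jobs as Operators can, and that Administrators and Production Managers can work in protected projects.

```python
# Capability labels are illustrative only; they are not DataStage identifiers.
FULL_ACCESS = {"run_jobs", "view_jobs_in_designer", "edit_jobs"}

ROLE_CAPABILITIES = {
    # Assumption: these two roles can work in protected projects.
    "DataStage Administrator": FULL_ACCESS | {"access_protected_projects"},
    "DataStage Production Manager": FULL_ACCESS | {"access_protected_projects"},
    # Developers have full access, but not to protected projects.
    "DataStage Developer": FULL_ACCESS,
    # Assumption: Super Operators can run jobs in addition to viewing them.
    "DataStage Super Operator": {"run_jobs", "view_jobs_in_designer"},
    # Operators can only log into Director and run jobs.
    "DataStage Operator": {"run_jobs"},
}


def can(role: str, capability: str) -> bool:
    """Return True if the toy table grants the capability to the role."""
    return capability in ROLE_CAPABILITIES[role]


print(can("DataStage Operator", "view_jobs_in_designer"))
print(can("DataStage Super Operator", "edit_jobs"))
print(can("DataStage Developer", "access_protected_projects"))
```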
Figure callouts: Permissions tab; Added user; DataStage administrators; Drop-down list of project roles; Add new user
Notes:
DataStage Administrators, created in the Information Server Web Console, show up
automatically in the user list. DataStage users, created in the Information Server Web
Console, can be added to the user list. Then a role can be selected from the User Role list
for the user.
To add a user and assign a role to the user, click the Add User or Group button and
browse for a user to add. Then select the user's role from the User Role list.
Data Sets
Data sets
Binary data file
Preserves partitioning
Component data set files are written to each partition
Suffixed by .ds
Referred to by a header file
Managed by:
Data Set Management utility from GUI (Designer, Director)
orchadmin command from the command line
Represents persistent parallel data
Notes:
Data sets represent persistent data maintained in the Engine framework internal format.
The key feature of data sets, which distinguishes them from, for example, sequential files, is
that they are partitioned. This makes them very useful as temporary staging files between
multiple jobs. They yield much better performance than sequential files because the data is
not collected, but remains partitioned.
Data sets are created and accessed using the Data Set stage in parallel jobs. Once
created, they are managed using the Data Set Management utility, accessible in DataStage
Designer and DataStage Director, and using the orchadmin command at the command
line on the engine server.
Data sets
Key to good performance for DataStage applications in a set of linked
jobs (possibly in a job sequence)
No import / export conversions are needed
No repartitioning needed
Written to and read from in DataStage jobs using Data Set stages
Implemented with two types of components:
Descriptor file:
contains metadata, data location, but NOT the data itself
Data component files
contain the data
multiple files, one per partition (node)
Notes:
As mentioned previously, the key feature of data sets, which distinguishes them from, for
example, sequential files, is that they are partitioned. This makes them very useful as
temporary staging files between multiple jobs. They yield much better performance than
sequential files because the data is not collected, and remains partitioned.
They support this structure through two components: Data component files for each
partition and a descriptor file containing references to the data component files.
The descriptor file does not itself contain any actual data. It just contains pointers to
component files containing the actual data. For this reason you need to be careful when
attempting to delete a data set. If you delete the descriptor file, without also deleting the
component data files, you have deleted only the smallest portion of the data set.
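The descriptor-plus-data-files structure, and why deleting only the descriptor is a problem, can be illustrated with a small simulation. This is a toy model only: the JSON descriptor, the file names, and the helper functions are hypothetical and do not reflect the Engine's real binary format.

```python
import json
import tempfile
from pathlib import Path


def write_toy_data_set(descriptor: Path, partition_dirs, rows):
    """Write rows round-robin into one data file per partition, plus a descriptor."""
    data_files = []
    for i, directory in enumerate(partition_dirs):
        data_file = Path(directory) / f"{descriptor.stem}.part{i}"
        data_file.write_text("\n".join(rows[i::len(partition_dirs)]))
        data_files.append(str(data_file))
    # The descriptor holds only pointers to the data files, not the data itself.
    descriptor.write_text(json.dumps({"data_files": data_files}))
    return data_files


def delete_toy_data_set(descriptor: Path):
    """Delete every data file the descriptor points to, then the descriptor."""
    for name in json.loads(descriptor.read_text())["data_files"]:
        Path(name).unlink()
    descriptor.unlink()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        nodes = [root / "node1", root / "node2"]
        for node in nodes:
            node.mkdir()
        rows = ["a", "b", "c", "d"]

        # Naive delete: remove only the descriptor; the data files are left behind.
        first = root / "first.ds"
        files = write_toy_data_set(first, nodes, rows)
        first.unlink()
        orphans = sum(Path(f).exists() for f in files)

        # Full delete: follow the descriptor's pointers before removing it.
        second = root / "second.ds"
        files = write_toy_data_set(second, nodes, rows)
        delete_toy_data_set(second)
        remaining = sum(Path(f).exists() for f in files)
        return orphans, remaining


print(demo())
```

In the simulation, the naive delete leaves one orphaned data file per partition, which is the situation the notes warn about.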
Notes:
This graphic shows an example of a DataStage parallel job with a Data Set stage. The
Data Set stage has been opened to reveal its properties. The file path specified is to the
Testdata.ds data set file. Data sets must be created with the .ds extension. The path
shown specifies where the descriptor file component of the data set will be created. The
data component files will be created in folders specified in the configuration file.
Figure callouts: Display schema; Display data
Notes:
The Data Set Management utility window is available from both Designer and Director. In
Designer, click Tools>Data Set Management to open this window. Use the icons at the
top to display the selected data set's schema, which corresponds to a table definition, and
its data, by partition.
In addition to viewing the data and format of the data set, you can use the Data Set
Management tool to copy and delete data sets. When used, these functions will
copy/delete all components of the data set, including its descriptor file and its component
data files.
Notes:
This graphic shows examples of displaying the data within a data set and displaying its
schema. The schema describes the format of the data within the file, that is, its columns
and their data types.
Notes:
Although the internal format of data sets is subject to change, it should be upward
compatible. That is, jobs built in future releases of DataStage should be able to read data
sets created using earlier versions. Nevertheless, data sets are not recommended for
long-term or archival storage, since they cannot be read outside of DataStage.
A data set is linked to the configuration file used to create it. That is, the number of nodes in
the configuration file determines the number of component data files, and the names of the
nodes and the paths to the data component files are referenced in the descriptor file. This
means that a job using a different configuration file than the one that was used to create
the data set may not be able to read the data in it.
Notes:
The orchadmin utility is run on the DataStage Server system. It provides a command-line
interface to data set administration tasks.
Before you run the orchadmin utility you need to initialize the DataStage environment
using dsenv. In addition, the $APT_CONFIG_FILE variable needs to be set to the path of
the configuration file used to create the data set. This can be done by adding a line to the
dsenv file, as shown in the graphic. (The dsenv file, and how to edit it, is discussed in
more detail in a later unit.)
The orchadmin script is located in the /PXEngine/bin directory. It is a very powerful
command with more functionality than the Data Set Management utility in DataStage
Designer. You can use the orchadmin -help command to get documentation on its
parameters.
As an example, the following command lists all the partitioning information, data files, and
schema of a data set named datafile.ds:
orchadmin II datafile.ds
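The sequence described above (initialize the environment from dsenv, make sure $APT_CONFIG_FILE is set, then call orchadmin) can be sketched as a helper that assembles a single shell command line. All paths in the example are hypothetical placeholders, and the sketch exports APT_CONFIG_FILE in the shell rather than adding it to dsenv, which is an assumption made here for illustration.

```python
import shlex


def orchadmin_command_line(dsenv_path, config_file, orchadmin_path, args):
    """Build a shell command line that sources dsenv, sets the config file, runs orchadmin."""
    steps = [
        # Source dsenv so the DataStage environment is initialized in this shell.
        ". " + shlex.quote(dsenv_path),
        # Point the parallel engine at the configuration file used to create the data set.
        "export APT_CONFIG_FILE=" + shlex.quote(config_file),
        # Run orchadmin with the requested arguments, each safely quoted.
        " ".join(shlex.quote(part) for part in [orchadmin_path, *args]),
    ]
    return " && ".join(steps)


# Hypothetical install locations; substitute the paths of your own engine tier.
print(orchadmin_command_line(
    "/opt/example/DSEngine/dsenv",
    "/opt/example/Configurations/default.apt",
    "/opt/example/PXEngine/bin/orchadmin",
    ["-help"],
))
```

Building the string separately from running it makes it easy to review the command before passing it to a shell on the engine server.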
Notes:
This graphic shows an example of using the orchadmin command with the II parameter.
First the environment is initialized using the dsenv command. Then the orchadmin
command is run.
To determine the number of records in a data set, you can also use dsrecords.
Figure callouts: Number of file partitions; Number of records in the file partition
Notes:
This graphic shows an example data set report generated by the orchadmin II command.
The information includes the number of file partitions, the number of records in each file
partition, and the paths to the data component files of the data set.
Environment Variables
Notes:
There are three places where environment variable values can be specified. Those
specified in the dsenv file apply to all DataStage projects. Those set in Administrator apply
to a specific project. Those set in the job apply just to the job.
$DSHOME is a variable defined in the dsenv file that specifies the DataStage home
directory. By default, this is /InformationServer/Server/DSEngine.
Notes:
The dsenv file specifies the DataStage environment. It is read by the DataStage daemon
at Engine startup. Environment variable settings in the dsenv file apply globally to all
projects.
The Engine inherits the environment variable settings of the user who starts the Engine and
the environment variable settings in dsenv at the time the Engine is started.
Notes:
This lists some of the main environment variables that need to be set in the dsenv file in
order for DataStage to run.
The DataStage Engine consists of two separate engines: the parallel engine and the server
engine. /DSEngine is the home of the server engine. /PXEngine is the home of the parallel
engine.
Figure callouts: Variable setting; User Defined variables
Notes:
Environment variables defined in Administrator apply to a specific project. They override
any settings in the dsenv file.
The User Defined section can be used to create and set variables that do not exist as part
of the standard system. This might include variables required for data resources or custom
stages.
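The three places where environment variables can be set (dsenv, project, job) form a layered lookup. The sketch below models that layering with Python's ChainMap. The variable values are made up, and the rule that a job-level value overrides a project-level one is stated here as an assumption; the notes only say explicitly that project settings override dsenv.

```python
from collections import ChainMap


def effective_environment(dsenv, project, job):
    """Merge the three levels; earlier maps in the chain take precedence."""
    # Lookup order: job first, then project, then the global dsenv values.
    return dict(ChainMap(job, project, dsenv))


# Hypothetical values for illustration only.
dsenv_vars = {
    "DSHOME": "/opt/example/DSEngine",
    "APT_CONFIG_FILE": "/opt/example/Configurations/default.apt",
}
project_vars = {"APT_CONFIG_FILE": "/opt/example/Configurations/4node.apt"}
job_vars = {}

env = effective_environment(dsenv_vars, project_vars, job_vars)
print(env["APT_CONFIG_FILE"])  # the project value hides the dsenv value
print(env["DSHOME"])           # only set in dsenv, so it passes through
```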
DSParams file
Stores project level environment variables for each DataStage project
Gets entries from Administrator
Should not be edited
Can be copied between projects to deploy the settings you have configured
Notes:
The DSParams file is a DataStage system file used by DataStage to keep track of
environment variable settings.
In general, the DSParams file should not be directly edited; appropriate entries are
somewhat complex, and if you make a mistake you can possibly disable DataStage.
However, you can copy this file and then replace it when backing up, deleting, and
restoring a project.
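Copying the file before deleting or restoring a project could be scripted along the following lines. This is a minimal sketch under stated assumptions: it assumes the DSParams file sits directly in the project directory, and the function name and directory arguments are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def backup_dsparams(project_dir, backup_dir):
    """Copy a project's DSParams file into a backup directory and return the copy's path."""
    source = Path(project_dir) / "DSParams"
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "DSParams"
    # copy2 also preserves timestamps and permission bits where the platform allows it.
    shutil.copy2(source, target)
    return target


# Self-contained demonstration using throwaway directories.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "demo_project"
    project.mkdir()
    (project / "DSParams").write_text("placeholder contents\n")
    copy = backup_dsparams(project, Path(tmp) / "backups")
    print(copy.name, copy.read_text() == (project / "DSParams").read_text())
```

The sketch only copies the file. It never edits the contents, in line with the advice that DSParams should not be modified by hand.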
Operational Metadata
Notes:
Operational metadata describes events and processes that occur and objects that are
affected when a DataStage job is run.
Operational metadata must be generated before it can be captured. To generate
operational metadata for a DataStage job, run the job with the Generate Operational
Metadata box checked.
Use the Run Import utility to capture the generated metadata. Capturing the metadata
refers here to loading the metadata into the Information Server Repository where it can be
viewed and analyzed using Information Manager products and tools, such as Metadata
Workbench.
Figure callout: Project default
Notes:
You can specify that operational metadata is generated by default by selecting the
Generate operational metadata box in Administrator, as shown here.
Notes:
When operational metadata is generated, XML files are created that contain the
operational metadata for the job runs. By default, these XML files are saved to the folder
/IBM/InformationServer/ASBNode/conf/etc/XmlFiles on the drive where you installed
Information Server.
To load the operational metadata in the Information Server Repository, so that it can be
viewed and analyzed, you run the Run Import utility. The Run Import utility imports the
contents of all XML files in the XmlFiles folder into the Repository, and then deletes the files
(or moves them to a folder of your choice).
To study the operational metadata that you imported, you can create a report on the
operational metadata in the Reporting tab of IBM Information Server Web console.
When you no longer need the operational metadata, you can delete it from the Repository.
Figure callouts: IS admin user; Password is encrypted when the file is saved; Repository host name
Notes:
Before you can execute the Run Import utility to load the generated operational metadata
into the Repository, the utility must first be configured. The runimport.cfg file is used to
configure the utility. The essential properties that need to be configured are highlighted in
this graphic.
The configuration file is located by default in the /InformationServer/ASBNode/conf
directory.
You must specify the user ID and password the utility is to use to access the Information
Server Repository. In this example, isadmin is used. You must also specify the name of the
Repository host system (in this example, EDSERVER.IBM.COM) and the port number
used to connect to it (by default, 9080).
Figure callouts: Directory with configuration file; Directory with generated XML files
Notes:
This example shows an XML file that was generated when the desRowGenDataSet
DataStage job was run. Each run of a DataStage job produces an XML file. After the XML
file is generated, you can run the Run Import utility to load this operational metadata
into the Repository.
Figure callout: Run Import utility
After the run, you can check whether the /XmlFiles directory is empty
The XML files containing the operational metadata are deleted after they
are imported into the Repository
Notes:
The Run Import utility is by default located in the /IBM/InformationServer/ASBNode/bin
directory. First change to the directory containing the utility, and then run the utility,
as shown. Review the messages output from the utility. In this example, the message tells
us that one XML file was successfully loaded into the Repository.
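Checking whether the XmlFiles directory is empty after an import can be automated with a short helper. This is a sketch only; the function name is hypothetical, the directory path is supplied by the caller, and the `.xml` file extension is an assumption.

```python
import tempfile
from pathlib import Path


def pending_xml_files(xml_dir):
    """Return the names of XML files still waiting in the directory, sorted."""
    return sorted(p.name for p in Path(xml_dir).glob("*.xml"))


# Self-contained demonstration with a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "run_0001.xml").write_text("<run/>")
    (Path(tmp) / "notes.txt").write_text("not operational metadata")
    leftover = pending_xml_files(tmp)
    print(leftover if leftover else "directory has no pending XML files")
```

An empty result after the utility runs is consistent with the files having been imported and then deleted or moved.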
Notes:
Reports can be created on job runs after operational metadata has been collected. The
reports contain a variety of information, including design information, start and end times,
job duration, and the parameter values the job ran under.
This graphic shows an example of one such report.
In addition, reports and analyses can be generated within Metadata Workbench. These
analyses can show the flow of data through a series of jobs and data resources.
Notes:
A large amount of operational metadata can accumulate in the Repository.
To delete operational metadata from the Repository, do the following. In a text editor, open
the PurgeJobRuns.sh file. This file is in the opt/IBM/InformationServer/ASBNode/bin
directory. At the end of the text in the file, type the appropriate command to delete
operational metadata for one or more job runs:
To delete operational metadata for a single job run, type the -activityID command
followed by the activity ID of the run in quotation marks, for example -activityID
"multilink 2006-06-19 00:00:03". You can specify only one activity ID.
To delete operational metadata for all jobs that ran in a range of dates, type the
-beginDate command, followed by the beginning date of the range, in the format
YYYY-MM-DD, followed by the -endDate command, followed by the last date in the
range, for example -beginDate 2006-06-07 -endDate 2006-06-20. This command
deletes operational metadata for jobs that ran on the beginning date, ending date, and
all days in the range.
Just before the end of the text in the file, change the values for -user and -password to the
credentials for a user who has the Operational Metadata Administrator role.
From the command line, run the file. The operational metadata for the specified run or runs
will be deleted from the Repository.
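The two forms of the delete command described above can be generated and validated with a small helper before they are typed into PurgeJobRuns.sh. This sketch only builds the argument text using the option names given in the notes; the function itself is hypothetical, and it does not edit or run the script.

```python
from datetime import date


def purge_arguments(activity_id=None, begin=None, end=None):
    """Build either the single-run or the date-range argument text."""
    if activity_id is not None:
        if begin is not None or end is not None:
            raise ValueError("use either an activity ID or a date range, not both")
        # The activity ID goes in quotation marks; only one ID can be given.
        return f'-activityID "{activity_id}"'
    if begin is None or end is None:
        raise ValueError("a date range needs both a begin and an end date")
    if end < begin:
        raise ValueError("end date is before begin date")
    # date.isoformat() yields the required YYYY-MM-DD format.
    return f"-beginDate {begin.isoformat()} -endDate {end.isoformat()}"


print(purge_arguments(activity_id="multilink 2006-06-19 00:00:03"))
print(purge_arguments(begin=date(2006, 6, 7), end=date(2006, 6, 20)))
```

Using `date` objects instead of raw strings means a malformed date is rejected before it reaches the script.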
Notes:
If you move DataStage jobs from one system to another, it is recommended that you
recompile the jobs to make sure that they will run on the new system. This can be very
time-consuming if you open and compile one job at a time in Designer. Fortunately, there is a
utility you can use to compile batches of DataStage jobs at one time.
To open the utility, in DataStage Designer, click Tools>Multiple Job Compile.
Notes:
When you open the Multiple Compile utility, the Selection Criteria window is displayed.
Select the types of jobs you want to compile. By default, all types of jobs are selected.
By default, only uncompiled jobs are selected for compile. If you are moving jobs to a new
system, it is a good idea to force a recompile of all jobs, so you should change this default.
Notes:
On the Selection Override window you can add or remove specific jobs from the compile
process. The selected jobs are displayed in the Selected items panel. Use the Add> and
<Remove buttons to add or remove jobs from the compile queue.
Figure callouts: Queued jobs; Start compile; Generate report
Notes:
On the Compile Process window you see the jobs queued for compile. Click the Start
Compile button to begin processing the queue.
A report is generated when the compile process is complete, identifying which jobs
compiled successfully, and which jobs failed to compile.
Checkpoint
1. What do you need to do to configure a project to collect
operational metadata?
2. What tool can you use to view the data in a data set on a
partition-by-partition basis?
3. What is RCP (Runtime Column Propagation)?
4. What is a DataStage "protected project"?
Notes:
Write your answers here:
Exercises Unit 07
In this lab exercise, you will:
Configure a DataStage project
View a data set using the Data Set Management tool
Manage data sets from the command line
Configure the Engine for operational metadata collection
Generate operational metadata
View an operational job run report
Use the Multiple-Job Compile tool
Unit summary
Having completed this unit, you should be able to:
Configure DataStage projects
Configure Engine environment variables
Manage data sets
Configure the Engine to gather and process operational
metadata
Use the Multiple-Job Compile utility to compile batches of
DataStage jobs
Unit 8. Engine Tier Database Connectivity
Unit objectives
After completing this unit, you should be able to:
Configure the Engine to connect to databases using direct API
connections
Configure the Engine to connect to databases using ODBC
drivers
Notes:
Connectivity to databases within a DataStage project and within Information Server
generally is established either through ODBC connectivity or DBMS-specific API
connectivity, configured in the Engine tier.
ODBC connectivity can be wired or non-wired. Connectivity that is wired does not require
database client software to establish the connection. The connection is wired directly to the
database. Non-wired connectivity requires database client software to be installed on the
Engine server system.
Notes:
The main difference between configuring ODBC connectivity and configuring database API
connectivity is in how it is done. API connectivity is set up using environment variables in
the project or in the global dsenv file. ODBC connectivity is set up in configuration files
stored in DataStage directories.
It is important to be aware that the connectivity established does not apply just to
DataStage, but to Information Server as a whole. Connections created in FastTrack and
Information Analyzer, for example, require that the connectivity has been established in
DataStage. DataStage acts as a client to the database for other Information Server
products.
Notes:
Information Server supports a wide range of different types of data resources. This graphic
lists some of the main types. Not only does Information Server support connectivity to
database systems, such as Oracle and DB2, but it also supports connectivity from
enterprise applications, such as PeopleSoft and SAP.
Mainframe resources, such as COBOL VSAM files, are supported. Support is provided for
many different types of files, including flat files, hierarchical files, and XML files.
Notes:
For reference, this graphic gives a detailed list of major supported data sources organized
by type.
Notes:
Connecting to a database using a database API requires client software for the database.
Information Server does not provide this client software.
Connecting to a database using ODBC requires ODBC drivers. Information Server installs
a set of ODBC drivers for many enterprise DBMSs. ODBC wired drivers connect directly to
the database server and do not require any additional client software. ODBC non-wired
drivers do require additional client software, because they use the client software to make
the connection.
Notes:
This table provides an overview of the DBMS software requirements for several major
databases. The first column lists the databases. The second column identifies the client
software needed to use direct database connectivity. The third column identifies whether
ODBC drivers are provided in the Information Server installation package for the database.
Notes:
The user ID running a DataStage job or other Information Server process must have
adequate permissions to access the file system. This includes access to data resource
client software and driver files.
Some customers, as a security measure, restrict access to the database file system. Be
aware that this can lead to permission issues that can cause jobs to fail.
Notes:
The primary environment variable requirement for API database connectivity is setting the
$LD_LIBRARY_PATH ($LIBPATH on some UNIX platforms) to the database library path.
In addition, there are often additional database-specific environment variables that need to
be set. Some are optional and some are necessary.
Unless the connectivity will only be used for specific DataStage projects, the required
environment variable settings should be set in the DataStage Engine dsenv file. This file
initializes the Engine environment. It applies to all DataStage projects and sets the Engine
environment for other Information Server products, such as FastTrack and Information
Analyzer.
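Adding a database client library directory to the library path variable amounts to prepending one entry to a colon-separated list. The sketch below shows that operation on a plain dictionary standing in for the environment. The directory names are hypothetical, and the helper does not modify dsenv or any project setting.

```python
def with_library_path(env, library_dir, variable="LD_LIBRARY_PATH"):
    """Return a copy of env with library_dir placed first in the path variable."""
    updated = dict(env)
    # Split the existing value, dropping empty entries left by stray colons.
    entries = [e for e in updated.get(variable, "").split(":") if e]
    if library_dir not in entries:
        entries.insert(0, library_dir)
    updated[variable] = ":".join(entries)
    return updated


# Hypothetical paths for illustration only.
base = {"LD_LIBRARY_PATH": "/opt/example/DSEngine/lib:/opt/example/PXEngine/lib"}
print(with_library_path(base, "/opt/example/dbclient/lib")["LD_LIBRARY_PATH"])

# On platforms that use LIBPATH instead, pass the variable name explicitly.
print(with_library_path({}, "/opt/example/dbclient/lib", variable="LIBPATH"))
```

The helper returns a new dictionary, so the original settings stay untouched and the directory is never listed twice.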
Database: Sybase | Home directory: SYBASE | Instance: n/a | NLS: n/a - defined by the OS locale | Other: ASDIR (for IQ); SYBASE_OCS (dir under $SYBASE for OCS)
Notes:
This table lists some of the environment variables that need to be set for some common
types of database systems. The first column lists the database. The remaining columns list
some of the different types of environment variables that need to be set. There are
environment variables for specifying the database home directory, the database instance
(where applicable), the NLS coding system, and miscellaneous variables specific to the
database.
Notes:
DataStage jobs that access a database must have the required database permissions for
issuing the SQL statement or command used to access the data. Typically, the user ID
used to access the database is specified in the DataStage job stage used to access the
database. The user ID and password can be parameterized, and passwords can be
encrypted.
LD_LIBRARY_PATH
Notes:
This graphic shows how to set the $LD_LIBRARY_PATH variable in DataStage
Administrator, for a specific project. In DataStage Administrator, open up the Environment
Variables window. The $LD_LIBRARY_PATH variable is located in the General folder.
Notes:
There are, similarly, other sets of environment variables specific to the type of database
system. For example, $APT_DB2INSTANCE_HOME and $APT_DBNAME are
environment variables specific to DB2. Generally, these variables are found in the
Operator Specific folder.
Notes:
The dsenv file is used to initialize the DataStage Engine environment. It is executed
automatically during the Engine startup. This establishes the environment for all DataStage
projects as well as other Information Server products and components that use the Engine.
This file can also be executed at the Engine server command line or terminal window to
initialize the session environment for running Engine commands. For example, you need to
execute dsenv before running the orchadmin command.
Editing the $LD_LIBRARY_PATH in the dsenv file makes these settings available to all
DataStage projects and to all Information Server products and components that use the
Engine settings. Connectors are used in several products (FastTrack, Information
Analyzer) to connect to data sources and to import metadata. These connectors may use
database library settings configured within dsenv.
dsenv file
Located in $DSHOME (/IBM/InformationServer/Server/DSEngine)
Initializes variables: $DSHOME, $APT_ORCHHOME, $ODBCINI,
$LD_LIBRARY_PATH, $APT_CONFIG_FILE
Edit it to add additional variables and database library settings
LD_LIBRARY_PATH
DB2 library
Parallel Engine library
Global environment variable setting
Notes:
The dsenv file is located in $DSHOME (/IBM/InformationServer/Server/DSEngine). Part
of its initialization involves setting various environment variables, some of which are shown
here. You can edit this file to add additional environment variable settings.
Be careful when editing this file. DataStage will not run if this file becomes corrupted.
The orchadmin command, which was used in an earlier unit to describe a data set,
requires that $LD_LIBRARY_PATH be set to the parallel engine library path and that the
$APT_CONFIG_FILE variable be set. Before running orchadmin, edit the dsenv file to
include these settings and initialize the command session by running the dsenv file.
Also highlighted in the graphic is the DB2 library path that has been added to
$LD_LIBRARY_PATH.
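The two prerequisites named above can be checked before orchadmin is run. The function below is a hypothetical preflight check, not part of the product; it only reports which of the two settings is missing from the current session.

```shell
# Hypothetical preflight check: verify the settings orchadmin depends on.
check_orch_env() {
    status=0
    if [ -z "${APT_CONFIG_FILE:-}" ]; then
        echo "APT_CONFIG_FILE is not set"
        status=1
    fi
    if [ -z "${LD_LIBRARY_PATH:-}" ]; then
        echo "LD_LIBRARY_PATH is not set"
        status=1
    fi
    return $status
}

# Typical use, after initializing the session from dsenv:
#   cd "$DSHOME" && . ./dsenv
#   check_orch_env && echo "ready to run orchadmin"
```

The check confirms only that the variables are set. It does not verify that the library path actually contains the parallel engine libraries.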
ODBC Setup
Notes:
ODBC drivers
DataDirect ODBC drivers for DataStage are installed as part of the
Information Server installation
Installed in the ODBCDrivers subdirectory
DataDirect documentation on the drivers is in the
IBM/InformationServer/Server/branded_odbc folder
odbcref.pdf documents all the drivers
Additional information is contained in the other PDFs in the folder
Notes:
DataDirect ODBC drivers for DataStage and QualityStage are installed as part of the
Information Server installation. The DataDirect documentation on the drivers is in the
IBM/InformationServer/Server/branded_odbc folder.
ODBC architecture
DataStage Server
Driver Manager
Notes:
This graphic describes the ODBC architecture. DataStage accesses the ODBC driver
through the ODBC driver manager. If the driver is non-wired, then the driver accesses the
database server through the client software. Otherwise, it accesses the database server
directly.
Notes:
Two files need to be configured to establish ODBC connections. The .odbc.ini file is
needed for connecting to the databases. The uvodbc.config contains entries for the
ODBC data source names, so that these are available in drop-down lists within DataStage
and Information Server products and components.
Both configuration files are located in the $DSHOME directory. uvodbc.config is copied to
each DataStage project directory (/InformationServer/Server/Projects/ProjectName)
when the engine is started, so that the settings will apply to all projects. You can also edit
the uvodbc.config files in the project directories.
LD_LIBRARY_PATH
setting
Export variable
DB2INSTANCE
setting
Export variable
Notes:
Environment variable settings can be specified in the dsenv file. This graphic shows
some examples of how to do this. The top graphic shows some environment variable
settings for Sybase and Informix databases. The bottom graphic shows some environment
variable settings for DB2.
.odbc.ini file
For wired drivers, gives information about connecting to the database
server
For non-wired drivers, gives information about connecting to the
database client
Environment variables required by the database client software
Database home directory
Database library directory
The PATH environment variable
Location of the file is specified by the ODBCINI environment variable
By default in dsenv file: ODBCINI=$DSHOME/.odbc.ini
Entry in dsenv
Notes:
For wired drivers, the .odbc.ini file gives information about connecting to the database server.
For non-wired drivers, it gives information about connecting to the database client.
The .odbc.ini file contains sample entries for most databases. First make a copy of the
entry and then modify it as necessary. Also add the new data source name to the list at the
top of the .odbc.ini file.
The location of the .odbc.ini file is specified in the dsenv file. The ODBCINI environment
variable specifies its location. In this example, the location is specified as $DSHOME, that
is, /InformationServer/Server/DSEngine.
Sample settings for connecting to the DB2 SAMPLE database on the DB2 server using the
DB2 wired ODBC driver
Notes:
To create this entry, copy and paste the sample entry in the .odbc.ini file headed [DB2
Wire Protocol]. Then modify the text as necessary. In this example, the name of the
database (SAMPLE), the logon ID and password (db2inst1/db2inst1), and the TCP port
number (50000) were specified.
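A rough sketch of what such an entry might look like is written to a scratch file below, so that nothing touches the real .odbc.ini file. The database name, logon ID, password, and port come from the example. The driver path, host name, and description are placeholders, and the key names follow typical DataDirect sample entries, so check them against the [DB2 Wire Protocol] sample in your own file.

```shell
# Write an example DSN entry to a scratch file for review.
# Assumptions: Driver, Description, and IpAddress values are placeholders;
# copy the real Driver= line from the [DB2 Wire Protocol] sample entry.
ENTRY_FILE=${TMPDIR:-/tmp}/odbc_sample_entry.ini
cat > "$ENTRY_FILE" <<'EOF'
[SAMPLE]
Driver=/path/to/branded_odbc/lib/DB2_WIRE_PROTOCOL_DRIVER.so
Description=DataDirect DB2 Wire Protocol (example)
Database=SAMPLE
IpAddress=db2host.example.com
TcpPort=50000
LogonID=db2inst1
Password=db2inst1
EOF
```

After review, the entry would be pasted into the .odbc.ini file by hand.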
Entry for SAMPLE data source
Notes:
At the top of the .odbc.ini file is a listing of ODBC data sources. This list shows up in
drop-down lists in DataStage and Information Server components. Add additional entries to
this list as you define new data sources in the .odbc.ini file.
In this example, the SAMPLE entry has been added.
uvodbc.config
Contains entries of each DSN to be accessed through Information
Server
There are multiple copies of the uvodbc.config file
One copy is in the $DSHOME directory
A copy can also exist in each project directory
(/InformationServer/Server/Projects)
The project uvodbc.config file, if it exists, takes precedence over the $DSHOME
copy
Entries have the form:
<Data source name>
Must match the name specified in the .odbc.ini file
DBMSTYPE = ODBC
Notes:
The uvodbc.config file contains entries for each DSN to be accessed through Information
Server. The data source name in the entry must match the name specified in the .odbc.ini
file. For example, recall that on a previous page a data source named [SAMPLE] was
created. The uvodbc.config file must contain a matching entry named <SAMPLE>.
The entry specifies the type of DBMS and the type of network connection used. An
example is provided on the next page.
Notes:
The graphic shows an example of a uvodbc.config file. It contains entries for two ODBC
data sources. One is for a Universe database used by DataStage. The other is for the
<SAMPLE> ODBC data source that was defined in the example .odbc.ini file shown
earlier.
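Because every ODBC entry in uvodbc.config must have a matching section in the .odbc.ini file, a small cross-check can catch mismatched names. The two functions below are hypothetical helpers. They assume the entry layout described above: a <name> line followed by a DBMSTYPE = ODBC line.

```shell
# List data source names in a uvodbc.config-style file whose DBMSTYPE is ODBC.
odbc_dsns() {
    awk '
        /^<.*>[ \t]*$/ {
            name = $0
            sub(/^</, "", name)
            sub(/>[ \t]*$/, "", name)
            next
        }
        /^[ \t]*DBMSTYPE[ \t]*=[ \t]*ODBC[ \t]*$/ && name != "" {
            print name
            name = ""
        }
    ' "$1"
}

# Print each ODBC DSN that has no matching [name] section in the .odbc.ini file.
missing_dsns() {
    uvcfg=$1
    odbcini=$2
    odbc_dsns "$uvcfg" | while IFS= read -r dsn; do
        grep -F -x -q "[$dsn]" "$odbcini" || echo "$dsn"
    done
}

# Example: missing_dsns "$DSHOME/uvodbc.config" "$DSHOME/.odbc.ini"
```

No output means that every ODBC data source named in uvodbc.config has a section of the same name in the .odbc.ini file.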
Notes:
There are a number of ways to test the ODBC connections after you have specified them.
On the server, you can use the dssh command. This command allows you to log into a
DataStage project and then connect to a data source. If you can connect, then you
probably configured things correctly.
Before you run the dssh command you must initialize the DataStage environment by
executing the dsenv file. After you execute the dssh command, the dssh prompt is
displayed. At the prompt you can enter the LOGTO and DS_CONNECT commands.
Set up DataStage environment
Run dssh
Retrieve list of data sources from uvodbc.config
See if you can connect to data source
Notes:
This graphic shows an example of running the dssh command. Before you can use it you
have to set up the DataStage environment by running the dsenv file. In the example, we
first changed to the $DSHOME directory and then executed the dsenv file. Then we
executed the dssh command. The dssh prompt (>) is displayed. At the prompt, we logged
into the DataStage project named DSProject. Then we ran the DS_CONNECT command
to connect to the SAMPLE database.
The SAMPLE database prompt is then displayed. This establishes that we have properly
configured the ODBC connection to SAMPLE.
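The session described above can be sketched as follows. The wrapper name start_dssh is hypothetical, and it assumes that DSHOME is already set in the shell and that dssh is in the bin directory under $DSHOME. DSProject and SAMPLE are the names from the example.

```shell
# Hypothetical wrapper around the manual steps shown in the graphic.
start_dssh() {
    if [ -z "${DSHOME:-}" ] || [ ! -f "$DSHOME/dsenv" ]; then
        echo "DSHOME is not set or dsenv was not found" >&2
        return 1
    fi
    cd "$DSHOME" || return 1
    . ./dsenv          # initialize the DataStage Engine environment
    ./bin/dssh         # opens the dssh prompt (>)
}

# At the dssh prompt, type the two commands from the example:
#   LOGTO DSProject
#   DS_CONNECT SAMPLE
```

If the data source prompt appears after DS_CONNECT, the .odbc.ini and uvodbc.config entries are consistent.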
Notes:
Non-wired drivers require the database client software to be installed. Test your client
software connection to the database server outside of Information Server. If the client
software cannot connect to the database server, then the non-wired driver that uses it will
not be able to connect.
Database Connectivity
Notes:
Notes:
This slide lists the main tasks for specifying DB2 environment connectivity. The user ID
used to connect must have access to the DB2 system tables.
The primary environment variables are listed and described. Use $LD_LIBRARY_PATH to
specify a path to the DB2 library. Use $APT_DB2INSTANCE_HOME to specify the path to
the DB2 instance home directory. Use $APT_DBNAME to optionally specify a default database.
DB2 library
DB2 instance home
Default DB2 database
Notes:
This graphic shows a DB2 configuration example. It shows example settings for the DB2
environment variables described on the previous page. Here, the variables are being
configured in DataStage Administrator for a specific project. These settings can also be
made in the dsenv file.
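For the dsenv alternative, the same three settings might look like the sketch below. The instance name db2inst1 and the database SAMPLE are borrowed from the earlier ODBC example, and the directory paths are placeholders for a typical UNIX installation. The default database variable is written as APT_DBNAME, the name used earlier in this unit.

```shell
# Example dsenv-style settings for DB2. Paths and the instance name are
# placeholders, not values taken from the course graphic.
DB2INSTANCE=db2inst1
APT_DB2INSTANCE_HOME=/home/db2inst1      # DB2 instance home directory
APT_DBNAME=SAMPLE                        # optional default database
# Put the DB2 library directory at the front of the existing library path.
LD_LIBRARY_PATH=$APT_DB2INSTANCE_HOME/sqllib/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export DB2INSTANCE APT_DB2INSTANCE_HOME APT_DBNAME LD_LIBRARY_PATH
```

Settings made in dsenv apply to all projects, whereas settings made in DataStage Administrator apply only to the selected project.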
Oracle configuration
Grant access to Oracle parallel server
Modify environment variable APT_ORACLE_NO_OPS
Create and set user-defined variable ORACLE_HOME
Create and set user-defined variable ORACLE_SID
Add ORACLE_HOME to PATH
Add the path to the Oracle library to LD_LIBRARY_PATH
Set privileges on certain system tables
See Information Server Planning, Installation, and Configuration
guide for details.
Notes:
This graphic lists the main considerations in configuring the Oracle environment variables.
The primary environment variables are listed and described. Consult the Information Server
documentation for details.
User-defined variables can be created in DataStage Administrator or in the dsenv file.
They are variables that do not natively exist in DataStage, but can be added for special
purposes. In DataStage Administrator, they are created in the User Defined folder in the
Environment Variables window.
Teradata configuration
Teradata tools and utilities installed on nodes that run parallel
jobs
Set environment variables in /etc/services
Add same environment variables to dsenv
Create a Teradata user
See Information Server Planning, Installation, and
Configuration Guide for details
Notes:
This graphic lists some of the main considerations in configuring the Teradata environment
variables, to give you an idea of what is involved. Consult the Information Server documentation
for details.
Checkpoint
1. What two DataStage files do you need to edit to configure
ODBC data source connections?
2. What is the difference between wired ODBC drivers and non-
wired ODBC drivers?
3. What environment variable is used to specify the database
library path?
4. What Information Server client is used to set this
environment variable?
Notes:
Write your answers here:
Exercises Unit 06
In this lab exercise, you will:
Enable a DataStage project to access
DB2
Globally enable access to DB2
Set up ODBC data source connections
Test ODBC connectivity using the dssh
command on the Server
Test ODBC connectivity using
DataStage Designer client import utility
Notes:
Unit summary
Having completed this unit, you should be able to:
Configure the Engine to connect to databases using direct API
connections
Configure the Engine to connect to databases using ODBC
drivers
Notes:
Copyright IBM Corp. 2007, 2012 Unit 9. Engine Tier Monitoring 9-1
Unit objectives
After completing this unit, you should be able to:
Monitor the DataStage job log
Use the DataStage and QualityStage Operations Console
Manage workload
Use the Performance Analyzer tool
Use the Resource Estimator tool
Notes:
Notes:
When DataStage jobs and job sequences run they generate messages that are written to a
job log and stored in the Information Server Repository. These messages include many
different types of information, including error messages, warnings, row processing
statistics, and general information.
There are several ways in which you can view the generated log messages, some in real
time. DataStage Director and DataStage Designer both contain tools for viewing messages
in real time.
Using the Operations Console, you can not only monitor the messages generated by the
job in real time, but you can also monitor its resource usage as it is running.
Log messages can also be retrieved from the command line using the dsjob command and
its various options.
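As one illustration of command-line retrieval, the hypothetical helper below wraps the dsjob -logsum option to list recent warning entries for a job. The helper name and the default of 20 entries are assumptions. It expects a session initialized with dsenv and dsjob available on the PATH.

```shell
# Hypothetical helper: show the most recent warning entries of a job log.
# Usage: recent_warnings <project> <job> [max entries]
recent_warnings() {
    project=$1
    job=$2
    max=${3:-20}
    dsjob -logsum -type WARNING -max "$max" "$project" "$job"
}

# Example, using the names from this unit:
#   recent_warnings DSProject seqJobs 10
```

For a job sequence, the same call would be repeated for each job that the sequence runs, since each job writes its own log.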
Notes:
DataStage runs both individual jobs and organized batches of jobs called job sequences.
Since a job sequence is also a job, it generates log messages just like other DataStage
jobs. But monitoring the messages from a job sequence is more complex, because in order
to fully understand what is going on, it is necessary to view the messages of the jobs
running in the sequence, as well as the messages from the sequence itself.
Handle exceptions
Notes:
This graphic displays an example of a job sequence. It contains many of the different types
of available stages, which are highlighted.
In this example, the sequence is running three different DataStage jobs: Job_1, Job_2,
and Job_3. A job sequence can also run other types of activities. In this example, there is a
stage that is executing a system command or running a script file (top right). There is also a
stage that is sending an email.
Monitoring this job sequence would therefore involve monitoring the messages from
Job_1, Job_2, and Job_3.
Running
Notes:
There are three views that can be selected in Director. This graphic shows the Status view,
in which the status of running jobs and job sequences is displayed. The status can be
Compiled, Finished, Running, and so on.
In this example, notice that the job sequence named seqJobs is running. This job
sequence runs three jobs named seqJob1, seqJob2, and seqJob3. In this example,
notice that seqJob2 is currently running. seqJob1 has already run, and seqJob3 is waiting to
run.
Waiting for seqJob2 to start
Summary report
Notes:
Click the Log View icon for a selected job or job sequence to display the job messages it
generates as it runs. In this example, we are looking at the messages generated by the job
sequence, rather than the individual jobs it is running.
Notice that many of the messages indicate when a particular job the sequence is running
starts, when it finishes, and its status when it finishes.
There is a summary message at the end that lists the activities that ran and their statuses.
Operations Console
Notes:
Operations Console
Monitor DataStage jobs that are running or have run
Information about the job, job activity, and resource usage
View jobs running on any engine system in the domain
Information is stored in the operations database
Operations Console client
Thin client, accessible from Internet Explorer and Firefox
URL: http://domain:port/ibm/iis/ds/console/login.html
Login with a DataStage user ID
Supported DataStage project roles include: DataStage Operator,
Super Operator, Developer, Administrator
Only information about projects the user ID has access to will be
displayed
DataStage Administrators can view information about all projects on
all engine systems
Notes:
With the Operations Console, you can monitor DataStage jobs and job sequences in real
time. In addition to viewing job messages, you can also get job status information, and
information about the system resources available while the job is running, including CPU
usage and free memory.
In the Operations Console, you are not limited to jobs running in a single project, as you are
with the DataStage clients. You can get information about jobs running on any engine
system in any project.
You access the Operations Console through a web browser. This web browser can be
running on the servers as well as the clients.
Enable collection
Notes:
The operational metadata displayed in the Operations Console is stored in tables in a
database. By default, it is part of the XMETA database, but it uses a different schema.
Operations Console monitoring is configured using the DSODBConfig.cfg file located in
the InformationServer/Server/DSODB folder. There are a number of configuration
options, including whether operational data collection takes place at all. These options are
documented in the configuration file.
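To inspect a setting without opening the file in an editor, a generic KEY=VALUE reader such as the sketch below can be used. The function cfg_get is hypothetical, and the property name DSODBON in the usage comment is an assumption; the configuration file itself documents the actual option names.

```shell
# Print the value of KEY from a KEY=VALUE properties file, skipping comment
# lines that start with '#'. If the key appears more than once, the last wins.
cfg_get() {
    key=$1
    file=$2
    awk -v k="$key" '
        /^[ \t]*#/          { next }     # skip comment lines
        index($0, "=") == 0 { next }     # skip lines without "="
        {
            name = substr($0, 1, index($0, "=") - 1)
            gsub(/[ \t]/, "", name)
            if (name == k) val = substr($0, index($0, "=") + 1)
        }
        END { if (val != "") print val }
    ' "$file"
}

# Example (property name assumed):
#   cfg_get DSODBON /path/to/InformationServer/Server/DSODB/DSODBConfig.cfg
```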
Start the services
Notes:
The Operations Console uses several services for collection, monitoring, and display. By
default, these services do not run automatically. To start or stop the services, you run the
DSAppWatcher.sh script. This script can be set up to run automatically when the
DataStage engine is started.
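A thin wrapper can make the start, stop, and status calls harder to mistype. In the sketch below, the function name, the script directory, and the -start, -stop, and -status options are all assumptions to verify against your installation before use.

```shell
# Hypothetical wrapper around DSAppWatcher.sh. Set DSODB_BIN to the directory
# that actually contains the script on your engine tier.
DSODB_BIN=${DSODB_BIN:-/opt/IBM/InformationServer/Server/DSODB/bin}

ops_services() {
    action=$1
    case "$action" in
        start|stop|status)
            "$DSODB_BIN/DSAppWatcher.sh" "-$action"
            ;;
        *)
            echo "usage: ops_services start|stop|status" >&2
            return 2
            ;;
    esac
}

# Example: ops_services status
```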
Notes:
The Operations Console opens to the Dashboard tab, which contains three sections of
information. The Job Activity section shows which jobs are currently running and their
statuses within a time range, for example, last 10 minutes.
The Operating System Resources section displays the CPU usage and free memory that
is currently available within a time range.
The Engine Status section displays the current status of engine services, including the
Operations Console services and WLM (Workload Management).
Dashboard GUI
Dashboard Job activity Engine status
Notes:
This graphic shows the Dashboard tab. The sections described on the previous page are
highlighted.
Notice the Refresh icon located in the top right corner of each section. The information
displayed is updated at a certain interval, which is configurable in the DSODBConfig.cfg
file. Click the Refresh button to manually refresh the display.
Notes:
There are several other tabs in addition to the Dashboard tab. You use the Projects tab to
display information about DataStage projects for a selected engine in the domain. You can
view the contents of the Repository window for each project, which displays the objects
the project contains. You can also get some statistical information about these objects, for
example, number of jobs in the project.
The environment variables and their current settings are also displayed.
You can get additional information about an object, for example a DataStage job, by
selecting the object. The information is then displayed in the right panel.
Projects GUI
Run the job Projects filter
Notes:
You can also run DataStage jobs from the Operations Console. In this example, the
seqJobs job sequence has been selected. In the bottom panel, the previous job runs are
listed. The top panel provides information about the selected job sequence, including
information about its last job run.
Click the View Job Design button at the top to view the job diagram from the Operations
Console. Click the Run button at the top to run the job from the Operations Console. You
will be prompted to specify the job's parameters.
Run
Parameters
Notes:
In this example, we will run the seqJobs job sequence and monitor it as it is running from
the Operations Console. After editing the job parameters as desired, click the Run button
to start the job. Next, move to the Dashboard tab to view its activity and resource usage. This is
shown on the next page.
List of jobs
CPU spike
Notes:
Notice that the activity spiked as the job sequence and the jobs it contains ran. The bar
graph at the bottom of the Job Activity panel indicates that all jobs within the current time
period have finished without errors or warnings. You can click on the Finished link for
details about the jobs that finished.
Notice that the CPU activity also spiked at the times the jobs were running. According to
the graph CPU usage went up to about 12%.
Although it is not visible in this graphic, you can also view the amount of free memory that
was available at the time the jobs ran. The graph depicts both free physical memory and
free virtual memory.
Log messages
Notes:
The top graphic lists the jobs that finished during the current time period. This graphic was
displayed by clicking the Finished link. Click the View Details link next to a job, for
example, seqJobs, to view details about the job run. The Run Details window for
seqJobs is shown in the bottom graphic. The window has several tabs. Shown here is the
Log Messages tab, which displays the job log messages that were generated when the job
ran. The Full Messages box has been checked to display the full set of messages.
The Performance tab displays information similar to what you see on the Dashboard tab,
including CPU usage and free memory.
Workload management
Enabled in the DSODBConfig.cfg file
Set WLMON=1
The maximum number of running jobs can be set
When the maximum number of running jobs is reached, jobs wait in queues until
slots are available
Queues are prioritized:
High priority queues: Jobs in this queue have the highest priority of getting the
next available slot
Medium priority queues
Low priority queues: Jobs in this queue have the lowest priority of getting the next
available slot
Special queues exist for Information Analyzer (IA) and Information Services
Director (ISD)
The priority of jobs running in these queues can be specified: Low, Medium, High
When jobs are run, a priority queue can be selected
The default queue is specified in DataStage Administrator
Notes:
Workload management (WLM) is also managed through the Operations Console.
Workload management is enabled in the DSODBConfig.cfg file. To enable it, set
WLMON=1.
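As a rough illustration, the check for this setting can be sketched in Python. The sketch assumes the configuration file consists of simple KEY=VALUE lines and that lines starting with # are comments; both are assumptions about the file format, and the helper names are hypothetical.

```python
def read_settings(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments (assumed format)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def wlm_enabled(text):
    """Return True only when the settings contain WLMON=1."""
    return read_settings(text).get("WLMON") == "1"


# Hypothetical file contents, for illustration only
sample = "# workload management switch\nWLMON=1\n"
print(wlm_enabled(sample))
```

To check a real installation, read the contents of the configuration file named above and pass the text to wlm_enabled.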
When WLM is turned on, the maximum number of running jobs can be set, and waiting jobs can
be prioritized. If too many jobs run at one time, system resources (CPU, memory) are
exhausted and none of the jobs run efficiently. Setting the maximum number of jobs low
enough prevents this situation.
The number of running jobs can also be constrained by CPU usage and memory usage. For
example, CPU usage can be constrained so that jobs start only when CPU usage is below
80%.
Jobs that cannot run because the maximum number has been reached wait in queues until
run slots become available. These queues can be prioritized. Jobs that are waiting in the
high priority queue have the greatest likelihood of getting the next available run slot.
When a job is run, the queue that it will wait in if necessary is selected.
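The admission behavior described in these notes can be sketched as follows. This is an illustrative model only, not the product's implementation: the limit values, function names, and data structures are assumptions chosen for the example.

```python
from collections import deque

MAX_RUNNING_JOBS = 2       # hypothetical job-count limit
CPU_LIMIT_PERCENT = 80.0   # jobs start only while CPU usage is below this value


def can_start(running_count, cpu_percent):
    """A job may start only if a run slot is free and CPU usage is under the limit."""
    return running_count < MAX_RUNNING_JOBS and cpu_percent < CPU_LIMIT_PERCENT


def submit(job, running, waiting, cpu_percent):
    """Start the job if allowed; otherwise place it in the waiting queue."""
    if can_start(len(running), cpu_percent):
        running.append(job)
        return "running"
    waiting.append(job)
    return "queued"


running, waiting = [], deque()
for name in ["jobA", "jobB", "jobC"]:
    print(name, submit(name, running, waiting, cpu_percent=35.0))
```

With a limit of two running jobs, the third job submitted waits in the queue until a slot becomes available.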
Queued jobs
Notes:
This graphic shows the Workload Management tab. In this example the maximum number
of running jobs has been set (artificially low) to 1. This means that only one job can run at a
time. Two jobs are waiting to run in a medium priority queue.
Notice in the graphic the list of available queues. Some of these are special-purpose
queues: one for Information Analyzer (IA) jobs, one for Information Services Director
(ISD) jobs, and one for Data Click jobs. The remaining three are general queues with
different priorities.
Notes:
You can use the Queue Management tab to specify the queue priorities. Different priority
rules can be used. In this example the queues are weighted according to the Priority
Weight rule. This rule bases priority on queue priority and time in the queue. This means
that if two jobs have been waiting for the same amount of time, one in a Low priority
queue and the other in a Medium priority queue, then the job in the Medium priority queue
will get the next available job slot.
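One way to picture a rule that combines queue priority with time in the queue is the sketch below. The scoring formula (weight multiplied by waiting time) and the weight values are assumptions made for illustration; the notes do not state the product's actual calculation.

```python
from dataclasses import dataclass

# Placeholder weights for illustration; not the product's values
QUEUE_WEIGHT = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class QueuedJob:
    name: str
    queue: str
    seconds_waiting: float


def score(job):
    """Assumed scoring: queue weight multiplied by time spent waiting."""
    return QUEUE_WEIGHT[job.queue] * job.seconds_waiting


def next_job(jobs):
    """Pick the waiting job with the highest score for the next free slot."""
    return max(jobs, key=score)


waiting = [QueuedJob("loadCustomers", "Low", 120.0),
           QueuedJob("loadOrders", "Medium", 120.0)]
print(next_job(waiting).name)
```

Under this assumed formula, two jobs with equal waiting time are ordered by queue priority, while a job in a lower priority queue can still win a slot after waiting long enough.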
Performance Analysis
Notes:
The DataStage Director client contains a performance monitoring tool. To run it, select a
job, for example seqJob2, and then click Tools>New Monitor. As the job runs, the monitor
will display row throughput (rows/sec) for each stage in each partition.
The Director Monitor has several limitations for monitoring job performance. One major
limitation concerns long-running jobs. Row throughput may vary significantly over the
course of the job run: it may be high in the beginning but slow down dramatically later.
The monitor keeps no record of these changes that could be reviewed after the job run.
Another limitation of the Director Monitor is that it does not measure the system resources
while the job is running.
Performance Analyzer
Visualization tool that provides insight into job runtime behavior
Offers several categories of visualizations:
Record throughput (rows/sec)
CPU utilization
Job timing
Job memory utilization
Physical machine utilization
Performance data to be visualized can be:
Filtered in selected ways, including
Hide startup processes
Hide license operators
Hide inserted operators
Isolated to selected stages (operators), partitions, and phases
Charts can be saved and printed
Notes:
Performance Analyzer is a visualization tool that provides insight into job runtime behavior.
In addition to record throughput, it measures CPU utilization, job timing, memory utilization,
and physical machine utilization. Several different types of graphs are available for viewing
these statistics.
Notes:
To measure the performance of a job, open the job in Designer. On the Execution tab of
the Job Properties window, select Record job performance data. This tells DataStage to
collect performance data when the job runs. (This option can also be selected on the
General tab of the Job Run Options window.)
When the job runs, the performance data is collected. This collection has little impact on
the overall performance of the job.
After the job runs, click the Performance Analysis icon. This opens the Performance
Analysis window for the job. The job can be run multiple times for comparison. The data
from each run is separately collected and stored.
Example job
Notes:
This shows an example job. It has three input Row Generator stages going to a Funnel
stage, then a Sort stage, then a Remove Duplicates stage, then to a Switch stage to write
the data out to two Data Set stages.
Notes:
This graphic shows the Job Timeline chart.
The Job Timeline chart breaks down the job run in terms of how long job processes take.
Here we see how long each player process takes. A player process is a process
associated with an operator (stage) running on a node (partition).
In this example we are viewing the operators running in partition 0. There are tabs at the
top of the window to toggle from one partition to another.
The timeline covers the total time the job runs. Here we see that some stages ran for the
duration of the job; others ran for a portion of the time. In particular, the three Row
Generator stages ran for just a portion of the job run.
Viewing by partition
Notice that the Row Generator stages are not displayed
Because they are running sequentially only in Partition 0
View by partition
Notes:
In this example, the second partition has been selected. Notice that the Row Generator
stages are not displayed. This is because the Row Generator stages run sequentially, and
therefore in only one partition. By contrast, Sort stage operators run in both partitions in
parallel.
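The relationship between stages, partitions, and player processes can be sketched as follows. The stage names and execution modes are illustrative assumptions (only the Row Generator and Sort behavior is taken from the notes), and the functions are hypothetical.

```python
def player_processes(stages, partition_count):
    """Return (stage, partition) pairs, one per player process.

    stages maps a stage name to 'sequential' or 'parallel'. A sequential
    stage runs in partition 0 only; a parallel stage runs in every partition.
    """
    players = []
    for name, mode in stages.items():
        partitions = [0] if mode == "sequential" else range(partition_count)
        for partition in partitions:
            players.append((name, partition))
    return players


def stages_in_partition(players, partition):
    """List the stages that have a player process in the given partition."""
    return [name for name, p in players if p == partition]


# Illustrative job: one sequential generator and two parallel stages (assumed modes)
stages = {"RowGen1": "sequential", "Sort": "parallel", "RemoveDuplicates": "parallel"}
players = player_processes(stages, partition_count=2)
print(stages_in_partition(players, 0))
print(stages_in_partition(players, 1))
```

The sequential stage appears only in the list for partition 0, which mirrors what the chart shows when the second partition is selected.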
Record throughput
Place the mouse cursor over a line at a particular point to
display the name of the stage and its throughput at that point
Rows per second
Run mouse over line to identify the stage represented
Notes:
Select the Record Throughput chart to view the record throughput (rows/sec) of each
operator (stage) in each partition. Individual lines represent individual operators. You can
run your mouse over a line to display the name of the stage and the throughput at that point
in time.
Notice that we can view how the throughput of a stage changes over the job run. Some
stages have a fairly constant throughput; others change dramatically over the course of the
job run.
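To make the idea of throughput over time concrete, the sketch below derives a rows/sec series from cumulative row counts sampled during a run. The sample values are invented, and this is not how Performance Analyzer stores or computes its data.

```python
def throughput_series(samples):
    """Convert (seconds_since_start, cumulative_rows) samples into rows/sec.

    Returns a list of (interval_end_time, rows_per_second), one entry per
    pair of consecutive samples.
    """
    series = []
    for (t0, r0), (t1, r1) in zip(samples, samples[1:]):
        if t1 > t0:  # skip zero-length intervals to avoid division by zero
            series.append((t1, (r1 - r0) / (t1 - t0)))
    return series


# Invented samples: throughput is high early in the run and drops later
samples = [(0, 0), (10, 50000), (20, 90000), (30, 95000)]
for end_time, rate in throughput_series(samples):
    print(f"t={end_time:>3}s  {rate:,.0f} rows/sec")
```

A series like this is what a single line on the Record Throughput chart represents for one operator in one partition.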
Sort stage
CPU usage
Notes:
There are different types of charts you can use to display the data.
This shows CPU usage on a pie chart. This shows the amount of CPU usage of each stage
as a percentage of the total CPU usage. Notice that in this example the Sort stage uses
more of the CPU than the other stages.
This kind of information is invaluable when attempting to improve the performance of a job
with a different design. Clearly removing unnecessary sorts will have a major impact on
performance.
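The percentages on such a pie chart are each stage's share of the total CPU usage. A minimal sketch of that arithmetic, using invented CPU figures and a hypothetical function name:

```python
def cpu_shares(cpu_seconds):
    """Return each stage's CPU usage as a percentage of the total."""
    total = sum(cpu_seconds.values())
    if total == 0:
        return {name: 0.0 for name in cpu_seconds}
    return {name: 100.0 * secs / total for name, secs in cpu_seconds.items()}


# Invented CPU seconds per stage, for illustration only
usage = {"Sort": 6.0, "Funnel": 1.0, "RemoveDuplicates": 2.0, "Switch": 1.0}
shares = cpu_shares(usage)
for name, percent in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:<18}{percent:5.1f}%")
```

Sorting the shares in descending order puts the most CPU-intensive stage first, which is usually the first candidate to examine when redesigning a job.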
Select stages in a partition to display
Select partitions to display
Select the stages to display
Notes:
In the Stages folder you can select just the stages whose throughput you want to display.
Here just the Remove Duplicates stage is displayed. Stage selection can be done for any
chart. By default all stages are displayed.
You can also use the Job Tree and Partitions tab to select the results to display. The Job
Tree tab allows you to select stages in partitions to display. The Partitions tab allows you
to select partitions to display.
Similarly, the Phases folder (not shown) allows you to select which phases of a process to
display or filter out: Initialization, RunLocally(), and Post processing.
Filters
By default, the activity of a number of processes and operators
is hidden
Allows you to focus on the comparable performance of the stages
Notes:
This graphic shows the Filters folder. By default all filters are enabled so that the activity of
a number of startup and overhead processes and operators is hidden.
The performance impact of these startup processes is less for longer running jobs and for
jobs processing large amounts of data. Comparisons of different job runs on different
amounts of data are more accurate if the impact of these processes is hidden.
Resource Estimator
Notes:
Use the Resource Estimation tool to estimate and predict resource utilization of parallel job
runs. The tool creates models to estimate the system resources for a job. There are two
types of models: Static and Dynamic. The former is based on a data sample generated
from the column definitions in the job design at compile time. The latter is based on a
sampling of the actual input data at run time.
Creating a model
Open a job in Designer
Open the Resource Estimation window
To create a model, click the Create Resource Model toolbar
button, then specify:
Name
Type of model: static or dynamic
For dynamic models, specify the data sampling method:
Automatic: Based on a set sample size according to stage type
Data range: Based on a specified number of records
You can also look at the actual resource usages for the input used
Called the actual model
Click Generate
Notes:
A resource estimation consists of a model of estimated resources. To create a model for a
job, first open the job in Designer. Then open the Resource Estimation window. You can
create either a static model or a dynamic model. After the model is generated, it will be
listed in the Models folder on the left panel of the window.
Notes:
The model contains several pieces of resource information. The model estimates both disk
space and scratch space. The static model estimates are based on worst-case scenarios.
For example, suppose the job writes rows of data out to a file. The size of the row that is
physically written may vary depending on the actual data written out in variable length
fields. The static model bases its estimates on the maximum possible size of the data. The
dynamic model, on the other hand, bases its estimates on a sample of the data the job
actually processes.
CPU utilization cannot be determined unless the job is run on a sample of data. So CPU
utilization is not estimated in the static model.
The static model bases its estimates of the number of output records on the best-case
scenario given the size of the input (number of input records). For example, suppose there
are 1000 input records. In an actual job run, some of these records may not make it to the
output file. A constraint in a Transformer might filter some of these rows out. The static
model assumes that every input row makes it through the job. A dynamic model would
base its results on what actually happens during a job run.
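The contrast between the two model types can be sketched numerically. The formulas below are simplified assumptions for illustration (maximum row size for the static case; sampled average row size and sampled pass rate for the dynamic case) and are not the tool's actual algorithm.

```python
def static_estimate_bytes(input_rows, max_row_bytes):
    """Static-style estimate: every input row is written at its maximum size."""
    return input_rows * max_row_bytes


def dynamic_estimate_bytes(input_rows, sampled_row_sizes, sampled_rows_in, sampled_rows_out):
    """Dynamic-style estimate: scale by what a sample run actually produced.

    sampled_row_sizes are the byte sizes of rows written during the sample run;
    sampled_rows_out / sampled_rows_in is the fraction of rows that reached the output.
    """
    average_row_bytes = sum(sampled_row_sizes) / len(sampled_row_sizes)
    pass_rate = sampled_rows_out / sampled_rows_in
    return input_rows * pass_rate * average_row_bytes


# Invented numbers: 1000 input rows, rows up to 200 bytes, 80% pass the constraint
print(static_estimate_bytes(1000, 200))
print(dynamic_estimate_bytes(1000, [100, 120, 80], sampled_rows_in=500, sampled_rows_out=400))
```

Because the static-style figure assumes maximum row sizes and no filtering, it serves as an upper bound, while the dynamic-style figure is only as good as the sample it is based on.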
Projections
Estimate based on a specified size of the input data sources
within the context of a given model
Projections are applied to all existing models (except the
actual model)
Creating a projection:
Click the Projection button in the Resource Estimation toolbar
Name
Specify the input size
Number of records
Megabytes
Use previous projection numbers
Notes:
Two questions often arise: How much disk space will be needed to run this job? How
much will be needed if the current number of input records is multiplied tenfold? Projections
can be used to help answer these questions.
A projection estimates resource usage based on a specified size of the input data sources
within the context of a given model. The variable you can change is the amount of input.
You can specify an input size based on number of records or megabytes of input data.
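As a back-of-the-envelope illustration of what a projection does, the sketch below scales a model's disk estimate to a new input size. Linear scaling is an assumption made here for simplicity; the tool's projections need not be linear, and the function name and figures are hypothetical.

```python
def project_disk_mb(model_disk_mb, model_input_records, projected_input_records):
    """Scale a model's disk estimate linearly to a projected input size (assumed linear)."""
    if model_input_records <= 0:
        raise ValueError("model_input_records must be positive")
    return model_disk_mb * projected_input_records / model_input_records


# Invented figures: the model used 50 MB of disk for 1,000 input records
print(project_disk_mb(50.0, 1000, 10000))
```

The only value that changes between the model and the projection is the amount of input, which matches the single variable a projection lets you vary.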
Models folder
Automatically generated static model
Notes:
This graphic shows the Resource Estimation window. In the Models folder is the static
model that was automatically generated for the job when the Resource Estimation
window was opened.
The Model Overview window lists the input data size the model is based on. The sampling
type is listed for the three input Row Generator stages. The sampling type is listed as Auto.
Each type of stage has a standard sampling method that is used. This type indicates that
the standard type for the stage was used.
Projected number of
input records
Notes:
The Input Projections folder contains the generated projections. Here the projection
projects the number of input records that will be processed by each input stage given its
type and property settings.
Total usage
Notes:
This graphic displays the Job Tree folder. The Job Tree folder lists all the components in
the job and their estimated resource usage.
In this example, the model projects that the Sort stage will consume roughly 175,000 MB of
scratch disk space. The model also projects that the target Data Set stages will each
consume a little over 100,000 MB of disk.
Notice also the reference to DataSet1 and DataSet2 in the stage list. These do not refer to
the target Data Set stages that the job is writing to. These are in-memory data sets that are
used internally by the job. Since they are in-memory, they do not consume any disk
resources.
Stages folder
Resource estimates by partition
Select stage
Throughput sizes based on data size or number of records
Notes:
In the Stages folder you can select particular stages for which to view the estimates. In
this example, the Sort stage has been selected. The top right panel lists its resource usage
(scratch disk usage) by partition. The lower right panel lists input and output throughput by
partition. In other words, this lists the amount of data the stage processes during input and
during output.
Charts folder
Notes:
In the Charts folder, you can select a particular chart that you want to view. Here the Disk
Requirements chart has been selected as an example.
Creating a model
Here we are creating a Dynamic model based on samples of
actual data
Auto lets the tool decide the sample
Uncheck to specify your own sample
Generate
Model type
Notes:
Click the Create Resource Model icon in the toolbar to create a new model, either static or
dynamic. In the Model Name folder, specify a name for the new model. Then select its type
(static, dynamic) in the Model Type box.
In this example, the Dynamic model type has been selected. By default, the sampling
method is Auto. Remove the check to manually specify a sampling range. In this example,
the sample input for the third Row Generator stage consists of the first 500 records.
Creating a projection
A projection allows you to estimate resource usage of stages
running in a partition based on specified input numbers
Projection name
Input units: MB or Num records
Amount of input
Notes:
A projection allows you to estimate resource usage based on a projected amount of input
data. To create a projection specify the name of the projection and the input unit type. You
can specify the input units as megabytes or number of records.
Checkpoint
1. What is the difference between a job sequence and an
ordinary DataStage job?
2. What command is used to start the Operations Console
services?
3. If Workload Management is turned on, what determines the
job's priority in taking the next available slot to run?
4. You can view the throughput (rows/sec) of a job on the
Designer canvas as it runs or in Director. What is the
advantage of monitoring the throughput of a job using the
Performance Analyzer tool?
Notes:
Write your answers here:
Exercises Unit 09
In this lab exercise, you will:
Monitor jobs in DataStage Director
Start the Operations Console
services
Monitor jobs using the DataStage
Operations Console
Explore Workload Manager
Use Performance Analyzer to
analyze the performance of a job
Estimate the resources of a job
Unit summary
Having completed this unit, you should be able to:
Monitor the DataStage job log
Use the DataStage and QualityStage Operations Console
Manage workload
Use the Performance Analyzer tool
Use the Resource Estimator tool
Unit 10. Metadata Asset Management
Unit objectives
After completing this unit, you should be able to:
Archive and package metadata assets using istool
Deploy and manage metadata assets using Information Server
Manager
Import metadata assets using Metadata Asset Manager
Browse metadata assets using Metadata Asset Manager
Manage duplicate metadata assets using Metadata Asset
Manager
Asset Interchange
Notes:
Asset interchange consists of the export of metadata from an Information Server repository
followed by the import of this exported metadata into the same or another repository. You
specify a set of related assets in the source repository to export to an archive file. For the
import you specify a set of related assets to import from an archive file.
The istool can be used to perform the interchange.
Notes:
There are many uses for asset interchange. Some major uses are listed here.
The uses can be divided into two categories. One type of use involves moving metadata
assets from one repository to a different repository. These include moving assets from a
test system to a production system or from a development system to a test system.
Another type of use involves moving metadata assets from a repository to a file system and
then later back into the same repository. This might be done to back up a set of assets for
later recovery, or it might be done for archiving or versioning.
Notes:
The istool utility is very powerful. It supports four basic commands: export, import, build
package, deploy package. The build package and deploy package functionality has
been captured into the Information Server Manager tool. This tool is discussed later in this
unit. Our focus in this topic is on the import and export functionality.
There are two common parameters in the istool command. You will always need to specify
authentication, that is, the services domain you are logging into and the user ID and
password you are using to do so. Secondly, you will always be specifying a path to the
archive file. The archive file is where the exported assets are or will be stored on the file
system, during an import or export.
Notes:
The istool command uses an archive format called ISX. An archive consists of a manifest
file, which describes the contents, and a set of files that contain the serialized assets.
The archive file is a compressed, non-proprietary file. Its contents can be viewed by
standard tools such as WinZip and the Java SDK.
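Because the archive can be opened with general-purpose tools, listing its entries can be sketched with Python's standard zipfile module. This assumes the archive is ZIP-compatible, which is suggested by the WinZip reference but should be confirmed on a real file; the entry names in the demo are invented stand-ins, not the actual ISX layout.

```python
import io
import zipfile


def list_archive_entries(archive):
    """Return the entry names of a ZIP-compatible archive (path or file-like object)."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


# Demo on an in-memory stand-in archive; entry names are hypothetical
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("manifest.example", "hypothetical manifest contents")
    zf.writestr("assets/job1.example", "hypothetical serialized asset")
buffer.seek(0)
print(list_archive_entries(buffer))
```

To inspect a real export, pass the path of the archive file to list_archive_entries instead of the in-memory buffer.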
Notes:
In this unit we will examine the istool import and export commands for DataStage. The
commands will be similar for other IS products. However, different product commands
support different parameters and options.
The -datastage keyword is used when importing and exporting DataStage assets. It is
followed by options and parameters specific to DataStage, surrounded by single quotes.
DataStage Designer supports a type of export/import using a proprietary dsx format. In
many cases, this type of import is sufficient, but it is only available for DataStage, and
istool has some additional options. One limitation is that shared table relationships are
lost in dsx imports. Table definitions, which describe the format of files and tables, can be
stored locally to DataStage or they can be shared, making them available to other
Information Server products. Shared table relationships are not preserved across dsx
imports and exports.
Notes:
In the istool export or import commands, you specify an "asset path" to identify the
assets to be exported or imported.
Different keywords are used to identify different types of assets. For example, the pjb
keyword identifies DataStage parallel jobs. The path can also include the asterisk (*) as a
wildcard character. So, for example, *.pjb would refer to all parallel jobs within the path
folder. The path identifies the DataStage server, the project hosted by the server, and a
folder within the project.
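The way a wildcard asset path selects assets can be sketched as follows. The matching logic here is an illustrative approximation using Python's fnmatch, not istool's own implementation; the job names and the second asset extension are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(asset_path, pattern):
    """Match an asset against a pattern of the form server/project/folder/name-pattern.

    The folder part must match exactly; the wildcard applies to the asset name only.
    """
    asset_folder, _, asset_name = asset_path.rpartition("/")
    pattern_folder, _, name_pattern = pattern.rpartition("/")
    return asset_folder == pattern_folder and fnmatchcase(asset_name, name_pattern)


def matching_assets(asset_paths, pattern):
    """Return the assets selected by the pattern, in their original order."""
    return [path for path in asset_paths if matches(path, pattern)]


# Hypothetical repository contents; '.xyz' stands in for some other asset type
assets = [
    "edserver.ibm.com/DSProject/ISAdminFiles/loadOrders.pjb",
    "edserver.ibm.com/DSProject/ISAdminFiles/loadCustomers.pjb",
    "edserver.ibm.com/DSProject/ISAdminFiles/other.xyz",
    "edserver.ibm.com/DSProject/Jobs/nightly.pjb",
]
print(matching_assets(assets, "edserver.ibm.com/DSProject/ISAdminFiles/*.pjb"))
```

Only the two parallel jobs in the named folder are selected; the asset of another type and the job in a different folder are left out.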
Notes:
The istool command can also be used to import and export security assets, including
users and groups and their authorization roles. The -security keyword is used in the istool
command to specify users and groups to import or export as part of the archive. Related
metadata such as credential mappings can also be included.
Notes:
In this example, the istool command is used to export parallel jobs in a DataStage project
folder named ISAdminFiles. The folder is in a project named DSProject, hosted by the
Engine system edserver.ibm.com. *.pjb identifies all parallel jobs in that project folder.
Here, the command is used to export to a file identified by the -archive parameter. The
asset path is specified in the string following the -datastage parameter.
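Since the command itself appears only in the graphic, the sketch below assembles a comparable export command as an argument list without running it. The -archive and -datastage parameters come from the notes; the -domain, -username, and -password flag names are written from recollection of istool and should be verified against your installation. The port, credentials, and archive path are placeholders.

```python
import shlex


def build_export_command(domain, username, password, archive, asset_path):
    """Assemble an istool export command line as a list of arguments (not executed)."""
    return [
        "istool", "export",
        "-domain", domain,        # services domain to log in to (flag name assumed)
        "-username", username,    # flag name assumed
        "-password", password,    # flag name assumed
        "-archive", archive,      # archive file that receives the exported assets
        "-datastage", f'"{asset_path}"',  # asset path, double-quoted inside the option string
    ]


command = build_export_command(
    domain="edserver.ibm.com:9080",        # placeholder host and port
    username="isadmin",                    # placeholder credentials
    password="secret",
    archive="/tmp/ISAdminFiles.isx",       # placeholder archive path
    asset_path="edserver.ibm.com/DSProject/ISAdminFiles/*.pjb",
)
print(shlex.join(command))
```

Building the command as a list keeps the quoting of the asset path explicit; shlex.join then adds the surrounding single quotes a shell needs for the -datastage value.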
istool import command
Notes:
In this example, the istool command is used to import an archive file into a DataStage
project. Key parts of the command are highlighted in the graphic.
Here, the command is used to import from a file identified by the -archive parameter. The
DataStage project to import into is specified by the string following the -datastage
parameter.
Notes:
In this example, the istool command is used to export security assets. Key parts of the
command are highlighted in the graphic.
Here, the command is used to export to a file identified by the -archive parameter. The
security assets are specified in the string following the -security parameter. In the
command, the -securityUser -userident option identifies the name of the user to be exported.
The related assets include the user's roles and credentials.
Notes:
The istool command can be used to build and deploy assets. However, for DataStage
assets, Information Server Manager provides a GUI tool for doing this. Using Information
Server Manager, you can create packages of assets in one repository (Development / Test)
that can be deployed on a different repository (Production).
You can also use Information Server Manager to import and export DataStage assets using
the isx format.
Deploying packages
Selecting the assets
Select the domain
To add a domain, right-click in the Repository window
Log into the domain with IS Administrator ID
Right-click over Packages and then click New>Package to open a
new package
Building the package
Select the assets for the package
Drag them to the Package window
Click Build in the Package window
Deploying the package
Click Deploy in the Package window
Notes:
There are two steps involved in deploying a package of DataStage assets: Build the
package, and then deploy the package.
To build the package, you select the assets from the Repository window. Within
DataStage Designer, you only see the assets in a single project. In Information Server
Manager, you can view assets from any projects within the domain.
When you create a build, the set of selected assets is saved and available for
deployment. You can create any number of builds as more assets become available.
Any build can be deployed in any project in any Engine server in the domain. You can also
back out of a deployment by deleting the objects in the project, and then deploying an
earlier build in its place.
Package
Drag assets to package panel
Notes:
To add a DataStage domain, right-click in the Repository window. Then log into the domain
with an IS Administrator user ID.
To specify the package, drag the DataStage assets from the Repository window to the
Package window. Notice that the package can include any and all types of DataStage
objects, including jobs, sequences, table definitions, parameter sets, and so on.
After you define the package, click the Build button to add the package to the list of builds.
Figure: Deploy window. Select the build, and then select the Engine project.
Notes:
To deploy a build, select the build in the list. Click the Deploy button, and then select the
Engine project in which to deploy the package. In this example, the package named
ISAdmin_Build2 is being deployed to a DataStage project named DSProject on the
EDSERVER.IBM.COM engine.
Incremental builds
When a package changes you can create new builds
Any build can be deployed
Can roll back to previous builds
Figure: A DataStage project with the latest build and an earlier build.
Notes:
You can modify an existing package at any time by adding and removing assets and then
saving it as a new build. You can then deploy the new build or, if needed, roll back to a
previous build.
Suppose, for example, that Build1 is working well in production. Some enhancements are
made to some of the jobs and a new build, Build2, is created. When Build2 goes into
production, some problems occur. While those problems are being fixed, you can roll back
production to Build1.
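The Build1/Build2 scenario above can be pictured with a small in-memory model. This is only an illustrative sketch: the Package and Project classes, their methods, and the asset names are hypothetical and are not part of any Information Server API.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical package: each build is a frozen snapshot of asset names."""
    name: str
    assets: set = field(default_factory=set)
    builds: list = field(default_factory=list)

    def build(self):
        """Save the current asset selection as a new build; return its 1-based number."""
        self.builds.append(frozenset(self.assets))
        return len(self.builds)


@dataclass
class Project:
    """Hypothetical target DataStage project."""
    name: str
    deployed: frozenset = frozenset()

    def deploy(self, package, build_number):
        """Replace the deployed objects with the contents of the chosen build."""
        self.deployed = package.builds[build_number - 1]


pkg = Package("ExamplePackage")
pkg.assets.update({"JobA", "JobB"})
build1 = pkg.build()            # Build1 goes into production
pkg.assets.add("JobC")
build2 = pkg.build()            # Build2 adds an enhancement

prod = Project("DSProject")
prod.deploy(pkg, build2)        # problems occur with Build2 ...
prod.deploy(pkg, build1)        # ... so production is rolled back to Build1
```

Because every build is kept as its own snapshot, rolling back is simply a matter of deploying an earlier build again.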
Export
Select
objects
Notes:
You can also use Information Server Manager to import and export DataStage assets.
Information Server Manager provides a graphical interface to the import and export
functionality of istool.
The export process is similar to creating a build. You select the assets for the package from
the Repository window. Then click Export to browse for a file location for the archive file.
Notes:
The Information Server Repository (XMETA) stores several different types of metadata,
including business metadata, technical metadata, and operational metadata. Some of the
metadata is produced by Information Server products, for example, DataStage
jobs, which are produced by DataStage. Other metadata is consumed by Information
Server products, such as descriptions of the files read by DataStage jobs.
Notes:
There is a metadata model, called the Common Model, that defines the metadata assets
that can be stored in the Information Server Repository and their relationships to other
metadata assets.
You can view the Common Model within Metadata Workbench, on the Advanced>Model
View tab. Here, the objects in the Common Model and its extensions are listed and
documented.
The Common Model consists of a core model of objects and a number of extensions to
define and capture objects not found in the Common Model. Some of these extensions are
specific to Information Server products such as DataStage (Transformation model) and
FastTrack (Mapping Specification model). Others, such as the Business Intelligence
model, apply to objects that can be imported into the Repository for consumption by
Information Server products.
External metadata
Common Model describes both metadata produced by IS
applications and external metadata consumed
Integrated with IS-produced metadata following the Common
Model format
Source of external metadata
Many types of external metadata can be imported into the IS
Repository using Metadata Asset Manager
Functionality within IS products
Hosts (systems that manage databases and other data resources) can
be imported into the IS Repository in FastTrack
Databases, database tables, schemas can be imported into the IS
Repository in FastTrack
Data files and structures can be imported into the IS Repository in
DataStage
Business categories and terms can be imported into the IS Repository in
Business Glossary
Notes:
The Common Model defines the metadata assets that are recognized by Information
Server. These include both metadata assets that are produced by Information Server
products and external metadata that is imported into the Information Server Repository
to be consumed by Information Server products.
There are many sources of this external metadata. Some of it can be imported into the
Repository using functionality within Information Server products. For
example, Hosts (systems that manage databases) and database objects can be imported
in FastTrack and Information Analyzer. Business categories and terms can be imported in
Business Glossary.
Metadata Asset Manager can also be used to import external metadata, and there are
types of metadata assets that can only be imported using Metadata Asset Manager.
Model View
Common Model
Notes:
This graphic shows the Advanced > Model View tab in Metadata Workbench. In the left
panel you see a list of the Common Model and its extension models. Expand the model
folder to display the metadata assets defined in the model. In this graphic, the Common
Model objects are listed in the left panel. Select an object to display its definition in the right
panel.
In this example, the Host asset has been selected. Its definition is displayed in the right
panel. This includes a description of the class, and a list of its properties and relationships.
Notes:
To give you an idea of what is in the model, here are a couple of examples of metadata
assets defined in the Common Model. These are examples of assets that are consumed,
not produced, by Information Server products.
A Host is a computer that hosts databases or files. A Database contains database tables. A
Data File is a collection of data organized into data structures of fields. In this respect, Data
Files are similar to database tables. Both of these assets are stored under Hosts, and
consumed by Information Server produced assets, such as DataStage jobs.
A BI Report contains information about physical and logical tables, among other objects.
Like database tables these objects can be consumed by Information Server assets, such
as DataStage jobs.
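The containment relationships just described (a Host holds Databases and Data Files; a Database holds tables; a Data File holds fields) can be sketched as plain data classes. This is a simplified illustration only: the class and attribute names, and the sample table and file names, are assumptions and do not reproduce the actual Common Model class definitions, which carry many more properties and relationships.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Database:
    """A database and the names of the tables it contains."""
    name: str
    tables: List[str] = field(default_factory=list)


@dataclass
class DataFile:
    """A data file and the names of the fields in its data structure."""
    name: str
    field_names: List[str] = field(default_factory=list)


@dataclass
class Host:
    """A computer that hosts databases or files."""
    name: str
    databases: List[Database] = field(default_factory=list)
    data_files: List[DataFile] = field(default_factory=list)

    def asset_names(self):
        """Names of all assets stored directly under this host."""
        return [db.name for db in self.databases] + [f.name for f in self.data_files]


host = Host("EDSERVER.IBM.COM")
host.databases.append(Database("SAMPLE", ["EMPLOYEE", "DEPARTMENT"]))   # hypothetical tables
host.data_files.append(DataFile("customers.txt", ["ID", "NAME"]))       # hypothetical file
```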
Notes:
InfoSphere Metadata Asset Manager (IMAM) is the primary Information Server product for
managing external metadata assets, that is, those consumed, but not produced, by
Information Server products. As with Metadata Workbench, you can browse and search
metadata assets in the Repository, but IMAM is limited to external metadata.
IMAM also has import/export capabilities with respect to external metadata assets. In this
respect, it complements Metadata Workbench, which does not have these capabilities.
Common
metadata roles
Figure 10-27. Logging into InfoSphere Metadata Asset Manager (IMAM) KM5021.0
Notes:
To log into Metadata Asset Manager (IMAM), open Internet Explorer and enter the IMAM
address: http://edserver.ibm.com:9080/ibm/imam/console. The user ID used to log into
IMAM must possess either the Common Metadata Administrator role, Common
Metadata User role, or the Common Metadata Importer role.
The Common Metadata User role allows the user to use the search and browse
functionality in IMAM.
The Common Metadata Importer role allows the user to create import areas and to import
metadata into the Repository.
The Common Metadata Administrator role enables the user to do anything in IMAM.
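The three roles can be summarized as a simple lookup. The sketch below records only what the notes above state; the action names are made up for illustration, and the product may grant each role more than is listed here.

```python
# Illustrative action names; these are not identifiers used by IMAM itself.
ALL_ACTIONS = {"search", "browse", "create_import_area", "import_metadata", "administer"}

ROLE_ACTIONS = {
    "Common Metadata User": {"search", "browse"},
    "Common Metadata Importer": {"create_import_area", "import_metadata"},
    "Common Metadata Administrator": set(ALL_ACTIONS),   # can do anything in IMAM
}


def can_log_in(roles):
    """A user needs at least one of the three Common Metadata roles to log into IMAM."""
    return any(role in ROLE_ACTIONS for role in roles)


def allowed_actions(roles):
    """Union of the actions granted by each of the user's roles."""
    allowed = set()
    for role in roles:
        allowed |= ROLE_ACTIONS.get(role, set())
    return allowed
```

For example, allowed_actions(["Common Metadata User", "Common Metadata Importer"]) would combine the search and browse actions with the import actions.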
Engine server
with installed
connectors
Figure 10-28. Metadata Interchange Servers KM5021.0
Notes:
Metadata Interchange Servers are defined on the Administration tab. In this graphic two
Servers are enabled. These Servers were configured when the Information Server Engine
clients were installed. In this example, EDCLIENT is the host name of the client system
and edserver.ibm.com is the name of the Information Server Engine system.
Metadata Interchange Servers are used to exchange metadata assets between the IS services
system and the engine client and server systems that have the bridges and connectors
installed. This enables, for example, BI metadata assets imported on a client system, using
bridges and connectors that exist only on that client system, to be saved into the Repository.
Notes:
Metadata assets are first imported into a staging area. To create a new import staging area,
click New Import Area on the Import tab. Specify a name for the import area, and then
select the metadata interchange server you are using to import the metadata. The
metadata assets, and the bridges and connectors available to import the assets, will vary
depending on the metadata interchange server. For example, DB2 and DB2 connectors
may be installed on one server but not the other. Some engine client systems may have BI
metadata available that is not available on other engine client systems.
After you select the metadata interchange server, select the connector or bridge you will
use to import the metadata assets. For example, select the CA ERwin4 Data Modeler
bridge to import logical data models and physical data models from a CA AllFusion ERwin
4 file.
Click Next to move to the Import Parameters page. Here, in the case of an ERwin file, you
would browse for the file on the metadata interchange server system. Select a parameter to
display documentation about it.
Import settings
Specify staging area requirements, either:
All imports
Imports where assets are merged
When the import contains duplicates
Imports with duplicates can be blocked
Staging area
requirements
Allow
duplicates?
Notes:
There are a number of settings that determine how imports will be handled. A Common
Metadata Administrator can change these settings. One setting determines the conditions
under which the user is required to view the metadata assets in the staging area before
they are imported to the repository. In this example, one of the conditions is if the metadata
assets may contain duplicates. This enables the user to examine the possible duplicates
before deciding whether to do the import.
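The import settings described above can be read as two small rules: when a staging-area review is required, and whether an import that contains duplicates is allowed at all. The following sketch expresses those rules under stated assumptions: the enum, the function names, and the idea that several conditions can be active at once are illustrative choices, not the IMAM implementation.

```python
from enum import Enum


class StagingRule(Enum):
    """Conditions under which assets must be reviewed in the staging area."""
    ALL_IMPORTS = "all imports"
    MERGED_ASSETS = "imports where assets are merged"
    DUPLICATES = "imports that contain duplicates"


def must_review_in_staging(rules, merges_assets, has_duplicates):
    """Return True if any active rule requires a staging-area review for this import."""
    if StagingRule.ALL_IMPORTS in rules:
        return True
    if StagingRule.MERGED_ASSETS in rules and merges_assets:
        return True
    if StagingRule.DUPLICATES in rules and has_duplicates:
        return True
    return False


def import_allowed(allow_duplicates, has_duplicates):
    """Imports with duplicates can be blocked by the 'Allow duplicates?' setting."""
    return allow_duplicates or not has_duplicates
```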
Bridge
Notes:
In the Import area name, specify a name for the new import area. Optionally, add a
description. Then select the metadata interchange server you will be using for the import.
Different sets of metadata assets are accessible to different metadata interchange servers.
Choose the server that has access to the metadata assets you want to import.
In this example, EDCLIENT is the name of the metadata interchange server. This is a
DataStage client system where the BI bridges have been installed, including the CA ERwin
bridge.
Import parameters
Select location of the import file
Specify path to import file
Configure other parameters as needed
Import file
location
Path to import
file
Notes:
In this example, the ERwin metadata assets are contained in an XML file located on the
EDCLIENT metadata interchange server system. The Metadata interchange server radio
button has been selected to indicate this, and a path to the file has been specified in the
File box.
There are a number of additional optional parameters that can be specified. Specify these
as needed.
Notes:
On this page you choose the type of import to perform. You can choose either an express
import or a managed import. An express import automatically imports the metadata
assets that have been loaded into the staging area into the Information Server Repository,
if all import settings requirements have been satisfied.
A managed import loads the assets into the staging area for you to preview, before you
decide to import the assets into the Repository. In this example, a managed import has
been selected.
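The difference between the two import types comes down to where the assets end up after the initial load. A minimal sketch of that decision follows; the function name is hypothetical, and the handling of an express import whose requirements are not met (assets stay in the staging area) is an assumption inferred from the description above.

```python
def import_outcome(import_type, settings_satisfied):
    """Return where the assets are after they have been loaded into the staging area.

    import_type: "express" or "managed"
    settings_satisfied: True if all import settings requirements have been met
    """
    if import_type not in ("express", "managed"):
        raise ValueError(f"unknown import type: {import_type!r}")
    if import_type == "express" and settings_satisfied:
        # Express import: shared to the Repository automatically.
        return "repository"
    # Managed import, or an express import that failed a requirement
    # (assumption): assets wait in the staging area for preview.
    return "staging area"
```

In both cases the assets pass through the staging area first; the import type only decides whether the move to the Repository is automatic.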
Notes:
After the metadata assets have been loaded into the staging area, you can perform an
analysis of the assets and preview them. Click the Analyze button to initiate the analysis.
The analysis generates a set of statistics about the assets, displayed in the lower left panel.
At the right panel, you can browse through the assets that have been loaded into the
staging area.
Click the Share to Repository button to import the assets into the Information Server
Repository. This button is not enabled until you perform the analysis and preview.
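The gating of the Share to Repository button can be modeled as a small state object. This is an illustrative sketch only; the ImportArea class and its methods are hypothetical and simply mirror the sequence described above (analyze, preview, then share).

```python
class ImportArea:
    """Hypothetical model of a staging area that tracks review progress."""

    def __init__(self, name):
        self.name = name
        self.analyzed = False
        self.previewed = False
        self.shared = False

    def analyze(self):
        """Corresponds to clicking Analyze: statistics about the assets are generated."""
        self.analyzed = True

    def preview(self):
        """Corresponds to browsing the staged assets."""
        self.previewed = True

    def can_share(self):
        """Sharing is enabled only after both analysis and preview."""
        return self.analyzed and self.previewed

    def share_to_repository(self):
        if not self.can_share():
            raise RuntimeError("analyze and preview the assets before sharing")
        self.shared = True


area = ImportArea("ExampleImportArea")
area.analyze()
area.preview()
area.share_to_repository()
```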
Notes:
In addition to importing BI metadata assets into the Repository, you can also browse the BI
metadata assets that are already in the Repository. Be aware that not all metadata assets
that are in the Repository can be viewed in IMAM. For example, DataStage jobs stored in
the Repository cannot be viewed from within IMAM. Only those types of assets that can be
imported using IMAM can be viewed in IMAM. To view all types of assets, use Metadata
Workbench.
The Browse Assets folder lists the types of metadata assets that can be viewed in IMAM.
These assets include BI metadata, data models of data resources, as well as physically
implemented data resources. With respect to the latter, for example, you can connect to a
database system and import metadata for its databases and database tables.
Notes:
In this example, we are browsing through a logical data model of assets that were
contained in the XML file that was imported earlier. This particular model contains a
number of different entities, for example, an Accounting Unit entity.
Information about the assets you select in the middle panel is displayed in the right panel.
Checkpoint
1. What commands can you invoke with istool?
2. What GUI tools can you use to import and export DataStage
objects?
3. In Metadata Asset Manager, what is a "metadata interchange
server"?
4. In Metadata Asset Manager, what is the difference between
an express import and a managed import?
Notes:
Write your answers here:
Exercises Unit 10
In this lab exercise, you will:
Export DataStage assets using istool
Import assets using istool
Export security assets using istool
Create, build, and deploy a package using Information Server Manager
Export assets using Information Server Manager
View the DataStage assets in an existing archive
Import metadata assets using Metadata Asset Manager (IMAM)
View metadata assets using Metadata Asset Manager (IMAM)
Manage duplicates
Notes:
Unit summary
Having completed this unit, you should be able to:
Archive and package metadata assets using istool
Deploy and manage metadata assets using Information Server
Manager
Import metadata assets using Metadata Asset Manager
Browse metadata assets using Metadata Asset Manager
Manage duplicate metadata assets using Metadata Asset
Manager
Notes:
Unit 11. Information Services Console Configuration
Unit objectives
After completing this unit, you should be able to:
Configure Information Analyzer
Configure Information Services Director
Notes:
Architecture
Figure: Information Server Console architecture. Components shown: Application Server,
InfoSphere DataStage, IS Console, Web Console, and DB2 (XMETA and IADB).
Notes:
The Information Server Console is the Information Analyzer and Information Services
Director front-end. The Information Server Web Console gives you access to security
controls for Information Server clients, including Information Analyzer and Information
Services Director.
Information Analyzer uses the DataStage Engine to run data analysis jobs, which is why the
DataStage Engine is also known as the Information Server Engine. The resulting analysis
data is loaded into the Information Analyzer database (IADB).
Information Services Director also uses the DataStage Engine as one of its service
providers.
Information Analyzer and Information Services Director also use XMETA to store their
objects.
Notes:
After Information Server, along with Information Analyzer, is installed, some additional
configuration is needed for Information Analyzer. This includes creating an ODBC data
source connection to IADB and configuring Information Analyzer users and groups.
You also need to set the configuration options for the Analysis Database (IADB) and the
Analysis Engine (DataStage).
.odbc.ini file
entry
Notes:
An earlier unit discussed how to create ODBC data source connections. The same
procedure described earlier is used to define an ODBC connection to the IADB database.
The graphic shows how the DB2 IADB database entry is specified in the .odbc.ini file. The
main properties to configure are the Database (IADB), the IpAddress (host name of
services tier system), the LogonID and Password properties for connecting to IADB, and
the TcpPort used to connect to DB2 (50000).
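To make the properties listed above concrete, the following sketch generates the text of such an entry with Python's configparser module. The property names (Database, IpAddress, LogonID, Password, TcpPort) come from the notes; the DSN name, the user ID, and the password are placeholders, and a real entry also needs driver-specific lines (such as the driver path) that are installation dependent and are deliberately left out here.

```python
import configparser
import io


def iadb_odbc_entry(host, logon_id, password, dsn="IADB", database="IADB", tcp_port=50000):
    """Return the text of an .odbc.ini section with the main IADB properties."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str          # keep property names case-sensitive
    parser[dsn] = {
        "Database": database,
        "IpAddress": host,            # host name of the services tier system
        "LogonID": logon_id,
        "Password": password,
        "TcpPort": str(tcp_port),     # port used to connect to DB2
    }
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


# Placeholder credentials for illustration only.
entry = iadb_odbc_entry("edserver.ibm.com", "iauser", "changeme")
print(entry)
```

Generating the entry from parameters, rather than editing it by hand, makes it easier to keep the DSN settings consistent across Engine systems.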
Notes:
Information Server user IDs with Information Analyzer authorization roles are created in the
Information Server Web Console, as discussed in a previous unit. This graphic shows the
applicable roles in the Web Console.
User ID with
DataStage
credentials
Check Settings
Notes:
The Analysis Settings tab contains several sub-tabs. This graphic shows the Analysis
Engine sub-tab. As mentioned earlier, Information Analyzer uses the DataStage parallel
Engine to perform its analyses. Here you specify DataStage credentials for the Engine.
That is, you specify the operating system user ID and password of a user on the Engine
system.
By default, when Information Analyzer is installed a DataStage project named
ANALYZERPROJECT is created. The DataStage jobs used by Information Analyzer are
created in this project.
After specifying the credentials, click the Validate Settings button to check the settings.
Check Settings
Notes:
The Analysis Settings tab contains several sub-tabs. This graphic shows the Analysis
Database sub-tab. Check the values in all the fields to ensure they reflect the actual values
of the system's configuration. In particular, pay attention to User Name, Password and
Analysis Connector DSN, since these values are the most likely to be changed during
installation. The User Name and Password boxes refer to the DB2 account created to log
into the IADB database.
Notes:
The IADB database contains tables used to store analysis results. It does not contain the
tables that contain the data to be analyzed. A connection to the source data tables must
also be configured in Information Analyzer.
If an ODBC connection to the source database is to be used, then this ODBC connection
must also be configured, following the same procedure as for IADB. This data source must
also be available to the ANALYZERPROJECT DataStage project, just as for IADB. That is,
an entry must be made in the uvodbc.config file for that project.
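As a rough illustration of that last step, the sketch below checks whether a DSN is already listed in the text of a uvodbc.config file and appends an entry if it is not. It assumes the commonly seen two-line layout for an ODBC entry (the DSN name in angle brackets followed by a DBMSTYPE = ODBC line); compare this with the existing entries in the project's own file before relying on it. The function names are hypothetical.

```python
import re


def uvodbc_entry(dsn):
    """Text of one ODBC data source entry (assumed two-line layout)."""
    return f"<{dsn}>\nDBMSTYPE = ODBC\n"


def listed_dsns(config_text):
    """Names of all data sources that appear as <name> header lines."""
    return re.findall(r"^<([^>]+)>[ \t]*$", config_text, flags=re.MULTILINE)


def ensure_dsn(config_text, dsn):
    """Return config_text with an entry for dsn, adding one only if it is missing."""
    if dsn in listed_dsns(config_text):
        return config_text
    if not config_text:
        return uvodbc_entry(dsn)
    return config_text.rstrip("\n") + "\n\n" + uvodbc_entry(dsn)


existing = "<IADB>\nDBMSTYPE = ODBC\n"
updated = ensure_dsn(existing, "SAMPLE")
```

The function works on text only; reading and writing the actual file in the ANALYZERPROJECT project directory is left to the administrator.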
Once the ODBC connection is created, a new data source connection within Information
Analyzer can be defined.
Table definitions also need to be imported into Information Analyzer before the data in
those tables can be analyzed.
Define source
Basic Tasks
Notes:
This graphic shows how to define a new data source (data store) in Information Analyzer.
Click Configuration>Sources in the Home pillar menu to open the Sources tab, shown in
the lower graphic. Select the host that owns the data source. In this graphic,
EDSERVER.IBM.COM is a host that is already defined in the Information Server
Repository. If the host of the data source is not listed, click New Host Computer to add it to
the Repository.
Click New Data Store to define the new source.
Define source
Basic Tasks
Figure callouts: Name of data store in the Repository; Connector name; Connector
information; Check connection.
Notes:
In this example, there is a DB2 database named SAMPLE. An ODBC connection to it has
been created. The ODBC connection is also named SAMPLE. Although this ODBC
connection has been created, it is not yet defined within the Information Server Repository.
The name of the data store is the name you want it to be known as in the Information
Server Repository. Best practice suggests that this name should match the physical name
of the database, but this is not required. For this reason, the data store is named SAMPLE
to match the name of the database.
We also need to specify how to connect to the data store. This is done in the middle panel.
The data connection (also called SAMPLE) is defined. It is an ODBC connector and its
connection string (DSN) is SAMPLE.
Metadata defining both the data store and the connector are now loaded into the
Repository. This information will be available to other Information Server products, such as
FastTrack.
Import
metadata
Notes:
Once a data store has been defined, table definitions for tables in it can be imported into
the Repository. This is required before the data in those tables can be analyzed.
To import the table definitions, from the Home pillar menu select Metadata Management,
and then select Import Metadata. Expand the levels of the data source until you reach the
level for import. Select the tables, and then click Import.
Creating a project
Basic Tasks
Project
type
New
project
Notes:
As with many Information Server products, an Information Analyzer project must be
created before any work can be done in Information Analyzer. Multiple
projects can be created, each accessible by different sets of users.
To create a new project, first click New Project from the My Home tab. Give the project a
name and select its type, that is, Information Analyzer. Recall that the Information Server
Console is an interface to two kinds of projects: Information Analyzer projects and
Information Services Director projects. Be sure you select the correct type.
Data
Source tab
Make imported
metadata available to
the project
Notes:
When you create a project, the Project Properties tab is opened with a number of
sub-tabs. On these sub-tabs you can configure the various properties of the project.
On the Data Sources tab you can select which data sources are available to the project. In
this example, the tables imported from the SAMPLE data store have been made available
to the project.
Users tab
Browse for users to add to project
Specify project roles for users
Notes:
On the Users tab you specify the users that have access to the project. These can include
any users that have been given Information Analyzer product roles in the Web Console.
Click on the Browse button to add and configure users for the project. In this example,
student has been added. In addition to adding users, you can specify their roles within the
project.
Different Information Analyzer users can have different roles within the project. The next
page defines these roles.
Notes:
Different roles have different authorizations. A user can be given multiple roles.
Notes:
As with Information Analyzer, Information Services Director (ISD) is accessed
through the Information Server Console, and work is done
in ISD projects.
Beyond configuring the project, the main task is to create ISD applications and to define the
information service connections for each.
ISD users
Click Browse to add users to the project
Select roles for the users
Notes:
The process of adding users to a project is the same as for Information Analyzer. For each
user, you can select one or more project roles. The Project Administrator role authorizes
the user to create and edit project properties and to create and delete applications.
The Designer role authorizes the user to add, delete, and edit services within an
application.
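Because a user can hold more than one project role, the user's effective authorizations are the union of what each role grants. A minimal sketch of the two ISD roles described above follows; the permission names are invented for illustration and are not identifiers used by the product.

```python
# Permissions per ISD project role, as described in the notes above.
PROJECT_ROLE_PERMISSIONS = {
    "Project Administrator": {
        "create_project_properties",
        "edit_project_properties",
        "create_application",
        "delete_application",
    },
    "Designer": {
        "add_service",
        "delete_service",
        "edit_service",
    },
}


def permissions_for(roles):
    """Union of the permissions granted by each of the user's project roles."""
    granted = set()
    for role in roles:
        granted |= PROJECT_ROLE_PERMISSIONS.get(role, set())
    return granted


def is_authorized(roles, permission):
    """True if any of the user's roles grants the requested permission."""
    return permission in permissions_for(roles)
```

A user with only the Designer role could edit services but could not delete the application that contains them.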
Notes:
An application can contain one or more services. Once an application has been created, an
ISD Designer can create, delete, and edit services within the application.
Notes:
Information services connections are used to connect to service providers. A service
provider implements the logic that the service offers to its consumers. A number of
different service providers can be used, including DB2, Federation Server, and DataStage.
DSServer is created during installation to connect to DataStage. Select the connection and
then click Open to edit the connection.
DataStage user ID
Notes:
The main requirement is to specify a DataStage user ID. This user ID requires
DataStage Administrator or developer authorization, and must have DataStage credentials.
A DataStage provider is a special type of DataStage job, one that has an ISD Input stage,
an ISD Output stage, or both. The ISD Input stage passes values from the service to the
DataStage job. The ISD Output stage returns output from the job to the service,
to be passed back to the service consumer.
Provider type
DB2 database
Notes:
When you configure a DB2 or Federation Server connection, you specify the type (DB2 or
Federation Server), the database host (edserver.ibm.com), and the database
(SAMPLE). This will enable, for example, DB2 SELECT statements within the SAMPLE
database to be used as service providers.
Checkpoint
1. What client do you log into to gain access to Information Analyzer?
2. What tasks do you need to do after IS installation to configure IA?
3. Name two types of Information Services Director service providers.
4. What makes a DataStage or QualityStage job the type of job that can be used as a service provider?
Notes:
Write your answers here:
Exercises Unit 11
In this lab exercise, you will:
Configure Information Analyzer settings
Configure an Information Analyzer data source
Import table definitions for source data tables
Create an Information Analyzer project
Configure an information services application
Unit summary
Having completed this unit, you should be able to:
Configure Information Analyzer
Configure Information Services Director
Unit 12. Installation and Deployment
Unit objectives
After completing this unit, you should be able to:
Install and deploy Information Server
Install fix packs and patches
Back up and restore Information Server
Describe the Engine High Availability option
Deployment models
One system for everything (only possible with Windows Server)
[Figure: Domain Server, Engine, DB Server, and Client on one Windows machine. Legend: Domain, Machine]
Notes:
When Information Server is installed, its tiers (Client, Repository, Services, Engine) can be
deployed in different configurations. This graphic shows one Information Server
deployment option.
All Information Server components are installed on one computer system. This is only
possible on a Windows platform, because the Client tier only runs on Windows.
Deployment models
Metadata Server, Repository, and Engine are on one system
[Figure: Domain Server, Engine, and DB Server on one machine; Windows Client on another. Legend: Domain, Machine]
Notes:
In this deployment option, all the tiers are installed on one machine except for the Client
tier, which is installed on a Windows system. The Server system can be either a UNIX or
Windows system.
Deployment models
Different machine for Engine. Same machine for Repository and Services (WAS)
[Figure: Engine on its own machine; Domain Server and DB Server on another machine; Windows Client separate. Legend: Machine]
Notes:
In this deployment option, the Engine is separated from the system containing the
Repository and Services tiers. The Client tier must be a Windows system. The system
containing the Repository and Services tiers can be either UNIX or Windows.
Shown in this graphic is one Engine on one computer system. Also possible are multiple
Engines on either a single computer system or on separate computer systems.
Deployment models
Multiple Engine machines. Same machine for Repository and Services (WAS)
[Figure: two Engine machines; Domain Server and DB Server on one machine; Windows Client separate. Legend: Domain, Machine]
Notes:
Within a single Information Server domain, there can be multiple Engines. Although this
graphic shows two different computer systems, these Engines can be either on
separate systems or on a single system.
Suite installer
Installs all the products as part of a single Suite installation
All the tiers (Client, Engine, Repository, Services) are available in the Suite installer
You select which tier or tiers you want to install on the system you are currently on
You can select a subset of the products to install
Supports graphical installer on all platforms
Supports silent installation on all platforms
Supports console based installation on all platforms
Notes:
All of the tiers (Client, Engine, Repository, Services) are available in the Suite installer. You
select which tier or tiers you want to install on the system you are currently on. For
example, if you are deploying to two systems, a Windows client system and a Linux server
system, you would run the installer on the Windows system to install the clients, and run
the installer on the Linux system to install the other tiers.
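The two-system example can be written down as a small planning check before any installer is run. The following is only a sketch: the host names are hypothetical, and the one rule it enforces is the one stated in this unit, that the Client tier runs only on Windows.

```python
# Tier names follow the course text; host names are hypothetical.
ALL_TIERS = {"Client", "Repository", "Services", "Engine"}

def check_plan(plan):
    """Return a list of problems found in a {host: (os, tiers)} plan."""
    problems = []
    installed = set()
    for host, (os_name, tiers) in plan.items():
        installed |= set(tiers)
        # The Client tier runs only on Windows.
        if "Client" in tiers and os_name != "Windows":
            problems.append(f"{host}: Client tier requires Windows, not {os_name}")
    # Every tier should be assigned to at least one host.
    for tier in sorted(ALL_TIERS - installed):
        problems.append(f"no host carries the {tier} tier")
    return problems

two_system_plan = {
    "winclient": ("Windows", ["Client"]),
    "linuxserver": ("Linux", ["Repository", "Services", "Engine"]),
}
print(check_plan(two_system_plan))  # an empty list means the plan is consistent
```

Each host in the plan corresponds to one run of the Suite installer, with the listed tiers selected on that host.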
Installation steps - 1
Acquire the Information Server installation package
Copy the package to the computer you are installing on
In this example, there is a Linux Server and a Windows Client
Run the install on the Server first
In a terminal window, move to the location of the uncompressed installation file (is-suite), then open the is-suite folder
Enter the command shown to start the installation script
Install URL
Notes:
This and subsequent pages go through the steps of the installation process. Begin by
copying the installation package to the computer you are installing on. In this example, the
Server is Linux and the Client is Windows. All tiers except the Client tier are installed on a
single Linux system.
Begin by running the setup command. Output from the command is a URL that you paste
into a web browser. The rest of the installation process is done in the browser.
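If the output of the setup command is captured to a file or variable, the URL can be picked out programmatically rather than copied by hand. This is a minimal sketch; the sample output below is hypothetical, and the real wording, host, and port may differ on your system.

```python
import re

def find_install_url(setup_output):
    """Return the first http(s) URL found in the setup command output, or None."""
    match = re.search(r"https?://\S+", setup_output)
    return match.group(0) if match else None

# Hypothetical sample output, used only to exercise the function.
sample_output = """
Starting the installation program ...
Open a web browser and enter this address:
https://localhost:8446/ISInstall
"""
print(find_install_url(sample_output))
```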
Installation steps - 2
Copy and paste the URL into a Web browser session
Mozilla on Linux GUI used in this example
Click the Login button.
The installation Getting Started window is displayed
Click Next to move to the Firewall Requirement window
Click Next to go to the Early Requirements Check window
Be sure your system passes all requirements
Click Next to go to the Installation Directory window
Click Next to go to the Installation Type Selection window
For this example, we click New installation, the default
Other selections are: Add products, Add tiers
Notes:
The installation wizard then guides you through a set of pages. The first several pages are
listed and described here.
Installation steps - 3
Click Next to go to the Tier Selection window
Select the tiers to be installed on the system
Here, we select all three (non-client) tiers: Metadata repository, Services, and Engine
Notes:
On the Tier Selection window you specify what tiers you want to install on the system you
are running the installation package on. Depending on your deployment option, this could
be one or more tiers. In this example, the Metadata Repository, Services, and Engine tiers
are installed on this one system. The Client tier is not available in this example because it
cannot be installed on a Linux system.
Installation steps - 4
Click Next to move to the Product Selection window
In this example, we have selected all products
Notes:
This graphic shows the Product Selection page where you select the products you want
to install on the current system.
As you can see in this graphic, components of individual products may be installed on
multiple tiers. For example, if you install Metadata Workbench, it has components that get
installed on the Engine tier and the Services tier.
Installation steps - 5
Click Next to move to the Software License Agreement window
Click Next to move to the DataStage Installation Options window
Choose the IBM InfoSphere DataStage option to develop parallel jobs and server jobs
Notes:
The graphic here shows the DataStage installation options. There are three types of jobs
that can be created in DataStage: parallel jobs, server jobs, and mainframe (MVS) jobs. In
this example, both server and parallel jobs can be developed, but not mainframe jobs.
Installation steps - 6
Click Next to move to the High Availability Server Cluster Configuration window
Select Server cluster configuration to deploy a cluster
Specify the virtual host name that will float to the current active server
Notes:
The High Availability options are discussed later in this unit.
Notes:
Given your tier selection, you now specify options for the WebSphere Application Server
(WAS), the database manager, and Information Server. These include user IDs and
passwords and port information.
Notes:
The next series of pages is used to configure the database manager, which by default is
DB2. You can either use an existing DB2 installation or have the installer install DB2. Other
existing databases, such as Oracle, are supported.
The Operations Console uses a set of database tables. By default these tables are
created in the XMETA Repository database. Optionally, you can specify a separate
database for these tables.
Installation steps - 10
Click Next to specify the ASB agent port number and logging agent port number
Notes:
On the Agent Ports Configuration window, you specify the ASB agent port number and
the logging agent port number.
Installation steps - 11
Click Next to specify the Information Analyzer database (iadb) and database owner (iauser)
Notes:
If Information Analyzer is installed, then a database that Information Analyzer uses will also
be installed. On this page, you specify the name of the database (iadb, by default) and the
database owner.
Notes:
The DataStage administrator user ID is by default dsadm. You can either create this user
ID, along with several other user IDs, on the operating system in advance of the
installation, or you can choose to have the installer create this ID.
Notes:
Listed here is a series of installer pages used to configure DataStage and QualityStage.
One option to pay attention to here is the globalization support option, since this option
cannot be configured after installation.
By default one DataStage project named dstage1 is installed. You can optionally choose to
install additional projects. It is, however, not necessary to create additional projects during
installation, since these can be created after installation, in DataStage Administrator.
Notes:
Before beginning the actual installation, the installation wizard runs a number of
tests to check whether the system requirements for installing Information Server
have been met.
If you get warnings, as shown above, open up the messages to see what specifically needs
to be done. You may get warnings about kernel parameter settings. Change these as
necessary. In Linux, you can make changes to kernel parameters by editing the
/etc/sysctl.conf file. Increase the values as suggested in the warning messages. Run
/sbin/sysctl -p to apply the changes.
If the requirements are satisfied, click Next to begin the installation.
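The kernel parameter adjustment described above can be prepared with a short script that compares current values with the minimums named in the warnings and prints the lines to place in /etc/sysctl.conf. This is a sketch under stated assumptions: the numbers are illustrative only (take the real ones from the installer's warning messages), and only single-valued parameters are handled.

```python
def sysctl_updates(current, required):
    """Return sysctl.conf lines for parameters that are below the required minimum."""
    lines = []
    for name in sorted(required):
        # A parameter missing from `current` is treated as 0 and therefore reported.
        if current.get(name, 0) < required[name]:
            lines.append(f"{name} = {required[name]}")
    return lines

# Illustrative numbers only; they are not IBM's documented minimums.
current = {"kernel.shmmax": 33554432, "kernel.msgmni": 1024, "kernel.shmmni": 4096}
required = {"kernel.shmmax": 536870912, "kernel.msgmni": 1024, "kernel.shmmni": 4096}

for line in sysctl_updates(current, required):
    print(line)
```

After the printed lines are added to /etc/sysctl.conf, running /sbin/sysctl -p applies them, as described in the notes.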
Client Installation
Notes:
The client installation is similar, but simpler.
Notes:
Run setup.exe in the installation folder to begin the installation. This loads the installation
URL into a web browser.
Click Next repeatedly to move through the installation windows. Eventually, you will reach
the Product Selection window, shown in the graphic. Select the clients for any products
you installed on the Server.
Notes:
You can optionally choose to register your client system as a Metadata Interchange Agent.
Recall that these agents are used to import business intelligence (BI) metadata into the
Repository in Metadata Asset Manager. In order to perform the registration, the installer
must connect to the services system as an Information Server administrator. On this page,
you specify the name of the host, the port used to communicate with it, and the user ID and
password of the Information Server administrator.
Notes:
As with the Server installation, just before the actual installation begins, the installation
package checks that the system requirements have been met. Fix any errors and
evaluate any warnings before continuing with the installation.
Version.xml file
Located in the /IBM/InformationServer directory on client and server systems
Documents the installation history, the products installed, and the status of the installation
Look for status=SUCCESS
Look for list of products installed and their versions
Notes:
After you complete the Information Server installation on the client and server, you should
check whether it installed correctly. There are a number of checks that you can do.
First examine the version.xml file on both the server and client systems. This file
documents the products that are installed and gives a status for each. Verify the list of
products installed and verify that they installed successfully.
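The status check can also be scripted. The sketch below does not assume the real element names in the file; it scans every element that carries a status attribute and reports those that are not SUCCESS. The sample XML is hypothetical and much simpler than an actual version.xml file.

```python
import xml.etree.ElementTree as ET

def failed_entries(xml_text):
    """Return (tag, attributes) for every element whose status is not SUCCESS."""
    root = ET.fromstring(xml_text)
    return [
        (elem.tag, dict(elem.attrib))
        for elem in root.iter()
        if "status" in elem.attrib and elem.attrib["status"].upper() != "SUCCESS"
    ]

# Hypothetical, simplified sample; the real file has its own element names.
sample = """
<Registry>
  <History>
    <Event description="New installation" status="SUCCESS"/>
    <Event description="Fix pack" status="FAILED"/>
  </History>
</Registry>
"""
for tag, attrs in failed_entries(sample):
    print(tag, attrs)
```

To check an actual installation, read the file's contents into a string and pass it to failed_entries; an empty result means every recorded status is SUCCESS.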
Notes:
This graphic shows an example of a server version.xml file. Notice that it states that
Information Server 9.1 has been installed and that its status is SUCCESS. Notice also that it
lists the products that were installed.
Notes:
This graphic shows an example of a client version.xml file. Notice that it states that
Information Server 9.1 has been installed and that its status is SUCCESS. Notice also that it
lists the products and components that were installed.
The lists of installed products can differ between the client and server. Some products,
such as Blueprint Director, only exist on the client. Similarly, some products or components,
such as IS Recovery, exist only on the server. (IS Recovery is discussed later in this unit.)
Client tests
Verify that you can ping the services Server
Confirms that there is connectivity between the client and server systems
Verify that the Information Server (IS) Web Console Login window appears
Test the Engine
In the IS Web Console, create a DataStage administrator user ID
Set up Engine credentials for the DataStage administrator
Verify that you can log into the DataStage test project (dstage1) in the DataStage Designer client
Notes:
On the client, first verify that you have connectivity with the server. Verify that you can ping
the server.
Next, open the Information Server Web Console. If the Login window does not come up,
then either Information Server is not running or you are not able to connect to it.
It is also important to test the Engine. In the Web Console, create a DataStage
administrator ID and set up Engine credentials for the ID. Then verify that you can log into
DataStage Designer. You might also create a simple DataStage parallel job with a
Transformer stage and see if it compiles. This will test whether the server system has the
correct C++ compiler installed and configured.
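A ping confirms only that the server machine is reachable. A TCP connection test additionally shows whether something is listening on the Web Console port. The following is a sketch: the demonstration connects to a temporary local listener so that it is self-contained, and in practice you would supply your own services host name and console port, neither of which is assumed here.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Self-contained demonstration against a temporary local listener.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0 lets the operating system pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
reachable = can_connect("127.0.0.1", demo_port)
listener.close()
print("listener reachable:", reachable)
```

A False result against the real services host points to the same two causes named above: Information Server is not running, or the client cannot reach it.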
Server tests
If the Client tests fail, it may be because Information Server is not up and running
To test whether the server is up, change to the WAS /InfoSphere/bin directory, then run the serverStatus.sh script
You may be required to enter your WAS administrator user ID and password
Notes:
If you cannot open the Web Console on the client, it may be that Information Server is not
up and running. To check this, run the serverStatus.sh script on the server. Verify that
server1 is started. If server1 is not started, check the WAS log files to determine what the
problem is. This was discussed in an earlier unit.
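If the output of serverStatus.sh is saved, the check for server1 can be automated. This sketch assumes the status lines resemble the typical WebSphere messages shown in the sample; the exact message text on your system may vary, so treat the pattern as an assumption to verify.

```python
import re

def server_states(status_output):
    """Map each application server name to True (started) or False (not started)."""
    states = {}
    pattern = r'Application Server "([^"]+)"\s+(.*)'
    for name, rest in re.findall(pattern, status_output):
        states[name] = rest.strip().startswith("is STARTED")
    return states

# Sample line modeled on a typical WebSphere status message; exact text may vary.
sample = '''
ADMU0508I: The Application Server "server1" is STARTED
'''
print(server_states(sample))
```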
Figure 12-33. Installing Information Server Fix Packs and Patches KM5021.0
Notes:
A fix pack is a cumulative set of updates for a particular release. You only need to install
the latest fix pack, as it includes previous fixes. Fix packs are available from IBM Fix
Central.
Notes:
Be sure to use the latest version of the Update Installer. Since the Update Installer changes
frequently, you should check each time you install a fix pack or patch.
A fix pack consists of two files. The Read Me file provides instructions for installing the
pack. The actual pack consists of an *.ispkg file.
Notes:
You can run the Installer in either graphical or command-line mode. You should be logged
in as root whenever you install a patch. Be sure to review the Read Me file accompanying
the patch before you perform the install.
Notes:
It is recommended that you shut down and restart Information Server before applying a fix
pack to ensure that no Information Server processes that could affect the installation are
running. Generally, fix packs are applied to all tiers and should be applied in the order
shown here. If there are exceptions, this will be noted in the Read Me file.
Notes:
After you install the Fix Pack, you should verify it. Start up each of the clients to verify they
work. Check in the Version.xml file that the pack was installed and that it has a Success
status.
Notes:
You can use the isrecovery tool to back up and restore Information Server. It is important to
note that the isrecovery tool does not back up the Information Server software. To restore
Information Server, it would be necessary to re-install Information Server and any fix packs
and patches that have been added before you attempt the restore operation. Additionally, it
is important to note that the isrecovery tool does not back up the Information Server clients.
As discussed earlier, Information Server tiers can be installed on multiple systems. When
backing up Information Server, it is necessary to back up all the tiers in the same
session. While the backup is taking place, there can be no active client connections and
Information Server must be placed in maintenance mode.
Notes:
Before you place Information Server in maintenance mode, you should close all user
sessions. You can use the SessionAdmin.sh command with the -kill-user-sessions option
to do this. After all sessions have been closed, you use the -set-maint-mode ON option to
place Information Server in maintenance mode. While Information Server is in
maintenance mode, non-administrative users will not be able to log into Information Server
clients.
Backup procedure
Run the SessionAdmin.sh command to stop all Information Server user sessions
Run the SessionAdmin.sh command to put Information Server in maintenance mode
Run isrecovery.sh to open the backup wizard
Follow the instructions in the wizard
Creates a response file
Contains Information Server system information needed for the backup
Documents what is to be backed up
Run isrecovery.sh -resp <responseFile>
Backup must be performed on all domain systems where software tiers are installed
Notes:
After Information Server is in maintenance mode, you can run isrecovery.sh to start the
backup process. Using the isrecovery.sh backup wizard, you first specify how you want to
perform the backup. This information is put into a response file. Afterwards, you can run
isrecovery.sh with the -resp option to initiate the backup.
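The order of the backup steps can be captured in a small helper that builds the command lines without running them. This is a sketch only: the option names follow this unit's description (with the session option spelled -kill-user-sessions), and authentication options and the directories that hold the scripts are omitted and would need to be supplied for a real run.

```python
def backup_commands(response_file):
    """Return the backup steps, in order, as argument lists (nothing is executed)."""
    return [
        ["SessionAdmin.sh", "-kill-user-sessions"],     # close all user sessions
        ["SessionAdmin.sh", "-set-maint-mode", "ON"],   # enter maintenance mode
        ["isrecovery.sh", "-resp", response_file],      # run the backup from the response file
    ]

# The response file path is the example name used in this unit.
for cmd in backup_commands("/Recovery/recovery_backup.xml"):
    print(" ".join(cmd))
```

Keeping the steps as data makes it easy to review the sequence before running it on each domain system where tiers are installed.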
Notes:
In the GUI, there are two sections: the Back Up section and the Restore section. Click
Get Started in the Back Up section to begin generating a response file for a backup.
Notes:
As you move through the backup wizard pages, you are prompted to specify different
backup options and to provide information necessary to perform the backup.
Two system folders are used by the IS Recovery tool. Both folders must be empty. The
archive directory is the location of the generated backup archive files. The work directory is
a directory used by the backup process.
Two databases can be backed up: the XMETA repository database and the Information
Analyzer database. You can choose whether to let the tool perform the backups or to
perform the backups manually. If you choose the latter, scripts will be
generated and put into the /Recovery/DatabaseSupport/Metadata folder.
Notes:
The IS Recovery tool backs up the set of crucial Information Server files. In addition,
you can have the tool back up other files you consider important. These might include
log files, QualityStage reference files, and sequential files used by DataStage jobs. The
additional files are listed in a text file. Each line of the text file provides a path to one of the
files. In the IS Recovery wizard, you specify the name and path to this text file.
The IS Recovery tool wizard generates a response file. It does not itself perform the
backup. After the response file is generated, you can exit the wizard and run the
isrecovery.sh -resp /Recovery/recovery_backup.xml command to perform the actual
backup.
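The text file of additional files is simple enough to generate with a script, which also gives a chance to catch paths that do not exist before the backup runs. The following is a sketch; the file names are hypothetical and temporary files stand in for real log or reference files.

```python
import os
import tempfile

def write_file_list(paths, list_file):
    """Write one path per line to list_file; return the paths that do not exist."""
    missing = [p for p in paths if not os.path.exists(p)]
    with open(list_file, "w", encoding="utf-8") as handle:
        for path in paths:
            handle.write(path + "\n")
    return missing

# Demonstration with temporary files standing in for real files.
work = tempfile.mkdtemp()
existing = os.path.join(work, "lookup.txt")
open(existing, "w").close()
absent = os.path.join(work, "not_there.log")
list_file = os.path.join(work, "additional_files.txt")

missing = write_file_list([existing, absent], list_file)
print("missing:", missing)
```

The resulting list file is the one whose name and path you give to the IS Recovery wizard.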
Notes:
The restore procedure works in a similar way. Click Get Started under Restore to
begin the recovery process. Just as for the backup, the IS Recovery tool wizard generates
a response file. It does not itself perform the restore. After the response file is generated,
you can exit the wizard and run the isrecovery.sh -resp
/Recovery/recovery_restore.xml command to perform the actual restore.
The wizard collects the information needed to perform the restore. Before you perform the
restore, the computers on which the recovery is performed and the installed Information
Server software must match the initial installation, plus any
fix packs and patches that were installed afterward.
Notes:
The restoration will configure Information Server as it was configured at the time of the
backup, and it will restore the objects in the XMETA and Information Analyzer repositories
as they were at the time of the backup. Additional files you listed for backup will also be restored.
After the response file is generated, you can exit the wizard and run the isrecovery.sh
resp /Recovery/recovery_restore.xml command to perform the actual restore.
Notes:
The growth of the Information Server repository databases (XMETA and the Information
Analyzer databases) needs to be monitored and planned for.
You should assume that XMETA will continue to grow over time, as more and more objects
are created and stored in it. These objects include objects produced by Information Server,
such as DataStage jobs, logging events data, and metadata, including operational
metadata and BI metadata imported into the Repository using Metadata Asset Manager.
Notes:
Information Analyzer generally uses a database separate from XMETA to store its analysis
results. By default, this database is named IADB. Initially, IADB is empty. Tables to
store the analysis results are created when an analysis is initiated.
It is difficult to predict the growth of the IADB database, since this depends on how
Information Analyzer is used and how much it is used. Regular monitoring of this database is
recommended to determine the growth pattern.
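One simple way to turn regular monitoring into a growth pattern is to record the database size on known dates and compute an average daily growth. The Python sketch below assumes you already have such size samples (how you obtain them depends on your database server); the function names and the straight-line projection are illustrative assumptions, not an IBM-provided method.

```python
from datetime import date


def average_daily_growth_gb(samples):
    """Average growth in GB per day between the earliest and latest sample.

    samples: list of (date, size_gb) tuples, at least two on different days.
    """
    ordered = sorted(samples)
    (first_day, first_size), (last_day, last_size) = ordered[0], ordered[-1]
    days = (last_day - first_day).days
    if days <= 0:
        raise ValueError("need samples from at least two different days")
    return (last_size - first_size) / days


def days_until_full(current_gb, capacity_gb, daily_growth_gb):
    """Days of headroom left at the current rate, or None if not growing."""
    if daily_growth_gb <= 0:
        return None
    return (capacity_gb - current_gb) / daily_growth_gb
```

A straight-line estimate like this is only a starting point, since analysis activity can be bursty.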
XMETA and IADB can be on the same database server instance but in different databases
  Typical configuration
  Default configuration
XMETA and IADB can be on two different database server instances, one using DB2, the other Oracle
  Supported configuration; some customers configure their deployment this way
XMETA and IADB are developed using two different application access designs
  XMETA is designed as an Object-Relational database
  IADB is designed as a 3NF Relational database
Notes:
XMETA and IADB can be located in the same database, with different schemas, but this is
not recommended for performance reasons. XMETA and IADB have different
characteristics in terms of sizing, change frequency, and performance.
There are two different design approaches used in table creation for XMETA and IADB.
XMETA is designed as an Object-Relational database. IADB is designed as a 3NF
relational database.
IADB sizing
Size of Information Analysis Database depends on source system analysis requirements
  Sampled vs. actual data
    Actual requires more storage
  Total size of all analyzed source data
  Retention policy for existing analysis results and baselines
Recommendation:
  Start with minimum of 300 GB
  Plan for four times the size of total source data
Detailed IADB sizing formula is available in Information Server Capacity Planning Overview
Notes:
The size of IADB depends on the source system analysis requirements. If samples of data
can be used instead of the actual data, then less storage will be needed. Another factor is
the retention policy for the analysis results. A longer term retention policy will obviously
require more storage than a shorter term retention policy.
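The slide's rule of thumb can be written as a one-line calculation. The Python sketch below reads the recommendation as "four times the total source data, but never less than 300 GB"; that combination is an interpretation of the two bullet points, and the function name is hypothetical. The detailed formula in the capacity planning document takes precedence.

```python
MINIMUM_IADB_GB = 300      # starting minimum from the slide
SOURCE_DATA_MULTIPLIER = 4  # "four times the size of total source data"


def estimate_iadb_size_gb(total_source_data_gb):
    """Rough IADB size estimate in GB (interpretation of the slide)."""
    if total_source_data_gb < 0:
        raise ValueError("source data size cannot be negative")
    return max(MINIMUM_IADB_GB, SOURCE_DATA_MULTIPLIER * total_source_data_gb)
```

For example, 50 GB of source data stays at the 300 GB minimum, while 200 GB of source data gives an estimate of 800 GB.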
Notes:
This unit focuses on Engine High Availability (HA) solutions. Information Server also has
HA solutions for the Services and Repository tiers.
HA uses redundancy to increase the availability of the Engine. HA ensures that if an
Engine system goes down, an alternative Engine system can take over. This eliminates
single points of failure. For the system as a whole to go down, multiple Engine systems
must fail at the same time.
Active-Passive topology
IS software is installed on a file system shared by multiple computers
HA software is used to cluster the computers
Active-Passive model
  The active Server hosts the IS Server instance
  The passive Server or Servers are started but not running IS
HA software on all Servers maintains a heartbeat
  Sent from the active Server to the passive Servers periodically
  Indicates to the passive Server that the active Server is still active
When the active Server fails (heartbeat ends), the HA software restarts IS on the passive Server (which then becomes the new active Server)
Notes:
Information Server software is installed on a file system shared by multiple computers. The
HA software is used to cluster the computers. At any given time, one of the computers is
active, that is, it hosts the running DataStage Server instance. The other computers in the
cluster are passive; they are running but not hosting the DataStage Server instance.
HA software on all the computers in the cluster maintains a heartbeat. The heartbeat
informs the passive computers that the active computer is still active. If the active computer
goes down, the heartbeat is not sent. A passive computer then restarts Information Server,
thereby becoming the new active computer.
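The heartbeat logic described above can be illustrated with a small Python model. This is a teaching sketch only: real HA cluster software handles many more cases, and the class name and timeout value are hypothetical. Times are passed in explicitly so the behavior is easy to follow.

```python
class PassiveMonitor:
    """Model of a passive computer watching for the active computer's heartbeat."""

    def __init__(self, timeout_seconds):
        # How long the passive side waits without a heartbeat (assumed value).
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = None

    def record_heartbeat(self, now):
        """Called each time a heartbeat arrives from the active computer."""
        self.last_heartbeat = now

    def should_take_over(self, now):
        """True when the heartbeat has been silent longer than the timeout."""
        if self.last_heartbeat is None:
            return False  # nothing received yet, so no basis for a decision
        return (now - self.last_heartbeat) > self.timeout_seconds
```

With a 10-second timeout and a last heartbeat at time 100, the monitor stays passive at time 105 and decides to take over at time 111.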
HA Active-Passive model
[Figure: an active Server and a passive Server connected by a heartbeat]
Notes:
This graphic illustrates an HA cluster. Notice that the active Server in this diagram is
running the Engine, Services, and Database software tiers. The passive Server is running
the HA management software, but the Information Server software is not running on it.
In this configuration, there are only two computers: one active and one passive. Adding
more passive computers increases the redundancy.
Installation configuration
Host name alias that will always refer to the active Server
  Alias moves between the active and passive systems
  Clients connect using the alias
IS services are unavailable during the period between the initial active Server failure and the time the new Server (formerly passive) is operational
  Client connections are broken and need to be reestablished
  Running DataStage jobs abort and need to be reset and restarted
Notes:
The active Server is referred to by a Host name alias. This alias is always used to refer to
the active Server. If the active Server goes down, the alias is moved to the passive
computer chosen to be the next active computer.
It is important to realize that when the active computer goes down, DataStage stops for a
time, until the new active computer restarts it. This means that any DataStage jobs that
were running at the time of the failure will have aborted. When the cluster comes back up,
they will need to be reset and restarted. The HA solution reduces downtime; it does not
completely eliminate it.
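The role of the host name alias can be modeled in a few lines of Python. This is a conceptual sketch, not how the HA software or DNS is implemented; the class names and host names are hypothetical. It shows why clients that connect through the alias reach the new active computer after failover, and why existing connections must be reestablished.

```python
class HostAlias:
    """An alias that always points at whichever computer is currently active."""

    def __init__(self, alias, active_host):
        self.alias = alias
        self.active_host = active_host

    def resolve(self):
        return self.active_host

    def move_to(self, new_active_host):
        # Performed by the HA software when the active computer fails.
        self.active_host = new_active_host


class ClientConnection:
    """A client session bound to the host the alias resolved to at connect time."""

    def __init__(self, host_alias):
        self.host_alias = host_alias
        self.connected_host = host_alias.resolve()

    def is_alive(self):
        # The session is broken once the alias points at a different computer.
        return self.connected_host == self.host_alias.resolve()

    def reconnect(self):
        self.connected_host = self.host_alias.resolve()
```

After `move_to` is called, `is_alive()` returns False until the client reconnects through the same alias.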
Engine HA
DataStage parallel Engine supports distributed job processing
  DataStage parallel jobs can run on multiple nodes
  Nodes can be associated with processors on different computers connected over a network (grid)
Resource manager software can be used to dynamically reassign the nodes used to run a job to those that are active
When jobs fail (because an active Server goes down)
  The resource manager creates a new configuration file to run the failed job only on nodes that are now active
IS supports grid implementations on Red Hat Enterprise Linux only using IBM LoadLeveler resource management software
Notes:
The DataStage parallel Engine supports distributed job processing. That is, DataStage jobs
can be running on multiple nodes associated with multiple physical computer systems.
If a job fails, resource manager software can be used to dynamically reassign the nodes
used to run the job to those that are associated with computers that are running. It does
this by dynamically creating a new configuration file.
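To make the idea of a dynamically created configuration file concrete, the following Python sketch filters a list of candidate nodes down to those on active computers and renders them in the general layout of a parallel configuration file. It is not the resource manager's actual logic; the function name, node names, host names, and disk paths are hypothetical, and you should compare the output layout with a configuration file from your own installation.

```python
def render_config(nodes, active_hosts):
    """Render a configuration file that uses only nodes on active hosts.

    nodes: list of dicts with keys 'name', 'host', 'disk', 'scratch'.
    active_hosts: set of host names that are currently running.
    """
    active = [n for n in nodes if n["host"] in active_hosts]
    if not active:
        raise ValueError("no active nodes available")
    lines = ["{"]
    for n in active:
        lines += [
            f'\tnode "{n["name"]}"',
            "\t{",
            f'\t\tfastname "{n["host"]}"',
            '\t\tpools ""',
            f'\t\tresource disk "{n["disk"]}" {{pools ""}}',
            f'\t\tresource scratchdisk "{n["scratch"]}" {{pools ""}}',
            "\t}",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"
```

If two nodes are defined on hostA and hostB and only hostA is active, the rendered file contains a single node entry for hostA.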
Checkpoint
1. Can more than one DataStage Server exist in the same
Information Server domain?
2. What HA solutions are available for Information Server?
3. What do you need to install a fix pack?
4. In HA, what is the purpose of the host name alias?
5. What is maintenance mode?
6. What command is used to backup (or restore) Information
Server?
Notes:
Write your answers here:
Exercise 12
In this lab exercise, you will:
  Put Information Server into maintenance mode
  Use IS Recovery to back up Information Server
  Use IS Recovery to restore Information Server
  Take Information Server out of maintenance mode
Unit summary
Having completed this unit, you should be able to:
Install and deploy Information Server
Install fix packs and patches
Back up and restore Information Server
Describe the Engine High Availability option
Unit objectives
After completing this unit, you should be able to:
View audit trace files on the server
View audit trace files on the client
Generate an ISA Lite Basic System summary report
Generate an ISA Lite PX Engine Configuration Test report
Audit tracing
Helps determine the action being performed at a point of failure
  When the action occurred
  User that initiated the action
Two areas of auditing:
  Server Audit Tracing
    Includes project creation and deletion
  Client Audit Tracing
    Includes Client login and logout, compilation, and so on
Notes:
If failures occur, there are several sources of information you can look at for clues. Audit
tracing helps determine the action being performed at a point of failure. There are two
areas of auditing: Server audit tracing and Client audit tracing. Each provides useful
information.
Notes:
Server audit tracing records when projects are created and deleted, and it provides
information about each such event. The information is contained in the
/InformationServer/Server/DSEngine/DSAuditTrace.log file.
After the file header, which is generated when the audit file is created, each event is
recorded. This file will continue to grow as new events are recorded. You can delete the file
at any time. If you do, the file will be recreated when the next audit event occurs.
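Because the audit file keeps growing and is recreated after deletion, you may prefer to move it aside rather than delete it outright. The Python sketch below is one possible approach and not an IBM-supplied procedure; the function name, archive directory, and timestamp suffix are hypothetical choices.

```python
import shutil
from pathlib import Path


def archive_audit_log(log_path, archive_dir, stamp):
    """Move the audit log into archive_dir under a stamped name.

    Returns the new path, or None if the log does not currently exist
    (for example, because no audit event has occurred since the last move).
    """
    log = Path(log_path)
    if not log.exists():
        return None
    target_dir = Path(archive_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{log.stem}_{stamp}{log.suffix}"
    shutil.move(str(log), str(target))
    return target
```

According to the notes above, the Engine creates a fresh DSAuditTrace.log when the next audit event occurs.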
Notes:
After the file heading, the file records both project creation and project deletion messages.
Samples of these are shown. A graphic example of the file is displayed on the next page.
The format of the audit messages is displayed here. There are several lines of messages
recorded for each event. The information displayed includes when the DataStage project
was created or deleted, what its name is, the name of the system hosting the project, and
error messages if applicable.
[Figure: sample audit trace file with two sets of project creation messages called out]
Notes:
This graphic shows part of a sample DSAuditTrace.log file. The first row is the heading. It
identifies the Engine and provides information about its system.
Following the header are project creation messages. Two sets of messages are
highlighted. The first provides information about the creation of the DataStage project
named ANALYZERPROJECT, which is a project created during Information Server
installation for use by Information Analyzer. The second set of highlighted messages
provides information about the creation of a project named DSProject.
Notes:
Client audit tracing covers the main actions the DataStage client performs, including login,
logout, import, export, and job compilation. The trace information goes into the existing
dstage_wrapper_trace.log files used by the DataStage clients.
To locate the directory containing the files, start at the Windows home directory of
the DataStage user. For example, if the user is student, on the Client image, in Windows
Explorer, open the Documents and Settings>student>ds_logs folder. The folder
contains a number of log files.
Notes:
Shown in this graphic is an example of one of the client trace files. This one is named
dstage_wrapper_trace_20.log. The user on this system in this example is student. The
path to this log file is C:\Documents and
Settings\student\ds_logs\dstage_wrapper_trace_20.log.
From the log file shown here, we can determine that several jobs were opened and
compiled and then closed.
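When the ds_logs folder contains many files, a short script can list the client trace files in numeric order. The Python sketch below assumes that all trace files follow the naming pattern of the example above (dstage_wrapper_trace_20.log); that pattern is inferred from the single example, and the function name is hypothetical.

```python
import re
from pathlib import Path

# Pattern inferred from the example file name dstage_wrapper_trace_20.log.
TRACE_NAME = re.compile(r"^dstage_wrapper_trace_(\d+)\.log$")


def find_trace_files(ds_logs_dir):
    """Return client trace files in ds_logs_dir, ordered by their number."""
    found = []
    for entry in Path(ds_logs_dir).iterdir():
        match = TRACE_NAME.match(entry.name)
        if match and entry.is_file():
            found.append((int(match.group(1)), entry))
    return [path for _, path in sorted(found)]
```

For the student user in this example, the directory to pass would be the ds_logs folder under that user's home directory.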
ISA Lite
Provides the ability to gather problem data and diagnose issues across the Information Server suite
Recommended method of gathering customer problem data
The ISA Lite tool will retrieve information from the DataStage Server audit trace file:
  <IS_HOME>/Server/DSEngine/DSAuditTrace.log
The ISA Lite tool will also retrieve information from any report archive files generated:
  <USER_HOME>\Application Data\IBM\Information Server\DataStage Client\<client-tag>\Error Reports\*.zip
The ISA Lite tool also incorporates the DataStage SyncProject tool to aid in determining and resolving DataStage project inconsistencies
Notes:
ISA Lite provides the ability to gather problem data and diagnose issues across the
Information Server suite. ISA Lite retrieves information from a variety of sources, including the
audit trace files.
ISA Lite can also be helpful during the installation and testing of Information Server. You
can use it to check whether your system has the prerequisites necessary for the
installation. You can use it to verify an installation after it has been performed.
ISA Lite is also used when submitting problems to the IBM Information Server Support
staff. The data generated from ISA Lite can be sent to IBM Support to aid them in
diagnosing and solving the problem.
Notes:
ISA Lite also has functionality for restoring corrupt DataStage projects. DataStage stores
project information in two repositories, XMETA and the DSEngine repository. Sometimes
these repositories can get out of sync. ISA Lite can be used to test the repositories and,
if necessary, to restore them.
0 Issues Found.
Overall Summary
---------------
2 Issues found.
Notes:
This graphic shows an example of a sync project report generated by ISA Lite. In this
example, several DataStage projects were examined by ISA Lite for problems.
Two issues were found in the DataStage project named dstage9. In the first case, the
XMETA repository contains a DataStage job named testjob, but the corresponding
DSEngine repository project is missing that job. In the second case, there is a disparity in
how a job property is named in the two repositories.
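The first kind of inconsistency, a job present in one repository but missing from the other, amounts to a set comparison. The Python sketch below illustrates only that idea; it is not how the SyncProject tool works internally, and the function name and the way the job lists are obtained are assumptions.

```python
def compare_job_lists(xmeta_jobs, dsengine_jobs):
    """Report jobs that exist in only one of the two repositories."""
    xmeta = set(xmeta_jobs)
    engine = set(dsengine_jobs)
    return {
        "missing_in_dsengine": sorted(xmeta - engine),
        "missing_in_xmeta": sorted(engine - xmeta),
    }
```

For the dstage9 example, a job list containing testjob on the XMETA side and not on the DSEngine side would put testjob under `missing_in_dsengine`.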
Notes:
ISA Lite is opened from the command line. On the Server, open a terminal. Execute the
command to change to the /IBM/InformationServer/ISALite directory, for example: cd
/opt/IBM/InformationServer/ISALite. Then run ISA Lite by executing the following
command: ./runISALite.sh.
You need root authority to use ISA Lite.
[Figure: ISA Lite opening window. Callouts: Select data collection option; Path to collection file; Start collecting data]
Notes:
The ISA Lite opening window lists problems it can collect information about. You first select
the type of problem. In this example, a Basic System Summary report will be generated.
Next you specify the file name for the collected data. The generated file is a
compressed .zip file.
When the tool runs, it will prompt you for additional information as needed, such as the
Information Server home directory. You will also have the option of transferring the
information to IBM Support.
Notes:
The ISA Lite results zip file contains a summary report file named SYSTEM-SUMMARY.html.
An example of this file is shown here. The report consists of a table of contents with links to
different sections of information.
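If you want to locate the summary report inside a collection file without unpacking everything, Python's standard zipfile module can search the archive. In the sketch below, only the SYSTEM-SUMMARY.html file name comes from these notes; the function name is hypothetical, and the folder layout inside the archive is not assumed.

```python
import zipfile


def find_summary_report(zip_path, report_name="SYSTEM-SUMMARY.html"):
    """Return the archive member whose file name matches report_name, or None."""
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Compare only the final path component, whatever folder it is in.
            if member.rsplit("/", 1)[-1] == report_name:
                return member
    return None
```

The returned member name can then be passed to `ZipFile.extract` to pull out just the report.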
Checkpoint
1. What information does the DSAuditTrace.log file contain?
2. What tool is the recommended method of gathering customer
problem data?
Notes:
Write your answers here:
Exercises Unit 13
In this lab exercise, you will:
  View audit trace files on the Server
  View audit trace files on the Client
  Generate an ISA Lite Basic System summary Report
  Generate an ISA Lite PX Engine Configuration Test Report
Notes:
Unit summary
Having completed this unit, you should be able to:
View audit trace files on the server
View audit trace files on the client
Generate an ISA Lite Basic System summary report
Generate an ISA Lite PX Engine Configuration Test report