SG 247845
IBM PowerHA
SystemMirror 7.1
for AIX
Learn how to plan for, install, and configure
PowerHA with the Cluster Aware AIX component
Dino Quintero
Shawn Bodily
Brandon Boles
Bernhard Buehler
Rajesh Jeyapaul
SangHee Park
Minh Pham
Matthew Radford
Gus Schlachter
Stefan Velica
Fabiano Zimmermann
ibm.com/redbooks
International Technical Support Organization
March 2011
SG24-7845-00
Note: Before using this information and the product it supports, read the information in “Notices” on
page ix.
This edition applies to IBM PowerHA SystemMirror Version 7.1 on IBM AIX Version 6.1 TL6 and AIX Version 7.1.
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .x
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
The team who wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Contents v
9.4.1 The rootvg system event. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
9.4.2 Testing the loss of the rootvg volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
9.4.3 Loss of rootvg: What PowerHA logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
9.5 Simulation of a crash in the node with an active resource group . . . . . . . . . . . . . . . . 289
9.6 Simulations of CPU starvation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
9.7 Simulation of a Group Services failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
9.8 Testing a Start After resource group dependency . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
9.8.1 Testing the standard configuration of a Start After resource group dependency 298
9.8.2 Testing application startup with Startup Monitoring configured. . . . . . . . . . . . . . 298
9.9 Testing dynamic node priority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
Chapter 11. Installing IBM Systems Director and the PowerHA SystemMirror plug-in . . . 325
11.1 Installing IBM Systems Director Version 6.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.1.1 Hardware requirements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.1.2 Installing IBM Systems Director on AIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
11.1.3 Configuring and activating IBM Systems Director. . . . . . . . . . . . . . . . . . . . . . . 328
11.2 Installing the SystemMirror plug-in . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
11.2.1 Installing the SystemMirror server plug-in. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
11.2.2 Installing the SystemMirror agent plug-in in the cluster nodes . . . . . . . . . . . . . 330
11.3 Installing the clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
11.3.1 Installing the common agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
11.3.2 Installing the PowerHA SystemMirror agent . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
Chapter 12. Creating and managing a cluster using IBM Systems Director . . . . . . . 333
12.1 Creating a cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
12.1.1 Creating a cluster with the SystemMirror plug-in wizard . . . . . . . . . . . . . . . . . . 334
12.1.2 Creating a cluster with the SystemMirror plug-in CLI . . . . . . . . . . . . . . . . . . . . 339
12.2 Performing cluster management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
12.2.1 Performing cluster management with the SystemMirror plug-in GUI wizard. . . 341
12.2.2 Performing cluster management with the SystemMirror plug-in CLI. . . . . . . . . 347
12.3 Creating a resource group with the SystemMirror plug-in GUI wizard . . . . . . . . . . . 349
Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator . . 419
14.1 Planning for TrueCopy/HUR management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
14.1.1 Software prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
14.1.2 Minimum connectivity requirements for TrueCopy/HUR . . . . . . . . . . . . . . . . . . 420
14.1.3 Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
14.2 Overview of TrueCopy/HUR management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
14.2.1 Installing the Hitachi CCI software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
14.2.2 Overview of the CCI instance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
14.2.3 Creating and editing the horcm.conf files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
14.3 Scenario description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
14.4 Configuring the TrueCopy/HUR resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
14.4.1 Assigning LUNs to the hosts (host groups). . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
14.4.2 Creating replicated pairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
14.4.3 Configuring an AIX disk and dev_group association. . . . . . . . . . . . . . . . . . . . . 443
14.4.4 Defining TrueCopy/HUR managed replicated resource to PowerHA . . . . . . . . 451
14.5 Failover testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
14.5.1 Graceful site failover for the Austin site. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
14.5.2 Rolling site failure of the Austin site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
14.5.3 Site re-integration for the Austin site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
14.5.4 Graceful site failover for the Miami site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
14.5.5 Rolling site failure of the Miami site. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
14.5.6 Site re-integration for the Miami site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
14.6 LVM administration of TrueCopy/HUR replicated pairs. . . . . . . . . . . . . . . . . . . . . . . 463
14.6.1 Adding LUN pairs to an existing volume group . . . . . . . . . . . . . . . . . . . . . . . . . 463
14.6.2 Adding a new logical volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
14.6.3 Increasing the size of an existing file system . . . . . . . . . . . . . . . . . . . . . . . . . . 468
14.6.4 Adding a LUN pair to a new volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
AIX®, DB2®, Domino®, DS4000®, DS6000™, DS8000®, Enterprise Storage Server®, FileNet®,
FlashCopy®, HACMP™, IBM®, Lotus®, Power Systems™, POWER®, POWER5™, POWER6®, POWER7™,
POWER7 Systems™, PowerHA™, PowerVM™, pureScale™, Redbooks®, Redbooks (logo)®, Redpaper™,
solidDB®, System i®, System p®, System Storage®, Tivoli®, TotalStorage®, WebSphere®, XIV®
Snapshot, NetApp, and the NetApp logo are trademarks or registered trademarks of NetApp, Inc. in the U.S.
and other countries.
Java, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other
countries, or both.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
IBM® PowerHA™ SystemMirror 7.1 for AIX® is a major product announcement for IBM in the
high availability space for IBM Power Systems™ Servers. This release now has a deeper
integration between the IBM high availability solution and IBM AIX. It features integration with
the IBM Systems Director, SAP Smart Assist and cache support, the IBM System Storage®
DS8000® Global Mirror support, and support for Hitachi storage.
This IBM Redbooks® publication contains information about the IBM PowerHA SystemMirror
7.1 release for AIX. This release includes fundamental changes, in particular departures from
how the product was managed in the past, which prompted this Redbooks publication.
This Redbooks publication highlights the latest features of PowerHA SystemMirror 7.1 and
explains how to plan for, install, and configure PowerHA with the Cluster Aware AIX
component. It also introduces you to PowerHA SystemMirror Smart Assist for DB2®. This
book guides you through migration scenarios and demonstrates how to monitor, test, and
troubleshoot PowerHA 7.1. In addition, it shows how to use IBM Systems Director for
PowerHA 7.1 and how to install the IBM Systems Director Server and PowerHA SystemMirror
plug-in. Finally, it explains how to perform disaster recovery using DS8700 Global Mirror and
Hitachi TrueCopy and Universal Replicator.
This publication targets all technical professionals (consultants, IT architects, support staff,
and IT specialists) who are responsible for delivering and implementing high availability
solutions for their enterprise.
Dino Quintero is a Project Leader and IT generalist with the ITSO in Poughkeepsie, NY. His
areas of expertise include enterprise continuous availability planning and implementation,
enterprise systems management, virtualization, and clustering solutions. He is currently an
Open Group Master Certified IT Specialist - Server Systems. Dino holds a Master of
Computing Information Systems degree and a Bachelor of Science degree in Computer
Science from Marist College.
Rajesh Jeyapaul is the technical lead for IBM Systems Director Power Server management.
His focus is on improving PowerHA SystemMirror, DB2 pureScale™, and the AIX Runtime
Expert plug-in for Systems Director. He has worked extensively with customers and
specialized in performance analysis in the IBM System p® and AIX environment. His
areas of expertise include IBM POWER® Virtualization, high availability, and system
management. He has coauthored DS8000 Performance Monitoring and Tuning, SG24-7146,
and Best Practices for DB2 on AIX 6.1 for POWER Systems, SG24-7821. Rajesh holds a
Master in Software Systems degree from the University of BITS, India, and a Master of
Business Administration (MBA) degree from the University of MKU, India.
SangHee Park is a Certified IT Specialist in IBM Korea. He is currently working for IBM
Global Technology Services in Maintenance and Technical Support. He has 5 years of
experience in Power Systems. His areas of expertise include AIX, PowerHA SystemMirror,
and PowerVM™ Virtualization. SangHee holds a bachelor's degree in aerospace and
mechanical engineering from Korea Aerospace University.
Minh Pham is currently a Development Support Specialist for PowerHA and HACMP in
Austin, Texas. She has worked for IBM for 10 years, including 6 years in System p
microprocessor development and 4 years in AIX development support. Her areas of expertise
include core and chip logic design for System p and AIX with PowerHA. Minh holds a
Bachelor of Science degree in Electrical Engineering from the University of Texas at Austin.
Matthew Radford is a Certified AIX Support Specialist in IBM UK. He is currently working for
IBM Global Technology Services in Maintenance and Technical Support. He has worked at
IBM for 13 years and is a member of the UKI Technical Council. His areas of expertise include
AIX and PowerHA. Matthew coauthored Personal Communications Version 4.3 for Windows
95, 98 and NT, SG24-4689. Matthew holds a Bachelor of Science degree in Information
Technology from the University of Glamorgan.
Gus Schlachter is a Development Support Specialist for PowerHA in Austin, TX. He has
worked with HACMP for over 15 years in support, development, and testing. Gus formerly
worked for CLAM/Availant and is an IBM-certified Instructor for HACMP.
Stefan Velica is an IT Specialist who is currently working for IBM Global Technology
Services in Romania. He has five years of experience in Power Systems. He is a Certified
Specialist for IBM System p Administration, HACMP for AIX, High-end and Entry/Midrange
DS Series, and Storage Networking Solutions. His areas of expertise include IBM System
Storage, PowerVM, AIX, and PowerHA. Stefan holds a bachelor's degree in electronics and
telecommunications engineering from the Politechnical Institute of Bucharest.
Bob Allison
Catherine Anderson
Chuck Coleman
Bill Martin
Darin Meyer
Keith O'Toole
Ashutosh Rai
Hitachi Data Systems
David Bennin
Ella Buslovich
Richard Conway
Octavian Lascu
ITSO, Poughkeepsie Center
Patrick Buah
Michael Coffey
Mark Gurevich
Felipe Knop
Paul Moyer
Skip Russell
Stephen Tovcimak
IBM Poughkeepsie
Eric Fried
Frank Garcia
Kam Lee
Gary Lowther
Deb McLemore
Ravi A. Shankar
Preface xiii
Stephen Tee
Tom Weaver
David Zysk
IBM Austin
Nick Fernholz
Steven Finnes
Susan Jasinski
Robert G. Kovacs
William E. (Bill) Miller
Rohit Krishna Prasad
Ted Sullivan
IBM USA
Philippe Hermes
IBM France
Manohar R Bodke
Jes Kiran
Anantoju Srinivas
IBM India
Claudio Marcantoni
IBM Italy
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
redbooks@us.ibm.com
For an introduction to high availability and IBM PowerHA SystemMirror 7.1, see the “IBM
PowerHA SystemMirror for AIX” page at:
http://www.ibm.com/systems/power/software/availability/aix/index.html
This section provides an overview of RSCT, its components, and the communication paths
between these components. Several helpful IBM manuals, white papers, and Redbooks
publications are available about RSCT. This section focuses on the components that affect
PowerHA SystemMirror.
To find the most current documentation for RSCT, see the RSCT library in the IBM Cluster
Information Center at:
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.rsct.doc%2Frsctbooks.html
For a more detailed description of the RSCT components, see the IBM Reliable Scalable
Cluster Technology: Administration Guide, SA22-7889, at the following web address:
http://publibfp.boulder.ibm.com/epubs/pdf/22788919.pdf
As shown in Figure 1-1 on page 3, RSCT 3.1 can operate without CAA in “non-CAA” mode.
You use the non-CAA mode if you use one of the following products:
PowerHA versions before PowerHA 7.1
A mixed cluster with PowerHA 7.1 and older PowerHA versions
Existing RSCT Peer Domains (RPD) that were created before RSCT 3.1 was installed
A new RPD, when you specify during creation that the system must not use or create a
CAA cluster
Figure 1-1 shows both modes in which RSCT 3.1 can be used (with or without CAA). The left
part shows the non-CAA mode, which is equal to the older RSCT versions. The right part
shows the CAA-based mode. The difference between these modes is that Topology Services
has been replaced with CAA.
Figure 1-1 RSCT 3.1 operating modes on AIX: non-CAA (with Topology Services) and CAA-based
RSCT 3.1 is available for both AIX 6.1 and AIX 7.1. To use CAA, for RSCT 3.1 on AIX 6.1, you
must have TL 6 or later installed.
CAA on AIX 6.1 TL 6: The use of CAA on AIX 6.1 TL 6 is enabled only for PowerHA 7.1.
The main communication goes from PowerHA to Group Services (grpsvcs), then to Topology
Services (topsvcs), and back to PowerHA. The communication path from PowerHA to RMC is
used for PowerHA Process Application Monitors. Another case where PowerHA uses RMC is
when a resource group is configured with the Dynamic Node Priority policy.
Example 1-1 lists the cluster processes on a running PowerHA 7.1 cluster.
Group Services subsystem name: Group Services now uses the subsystem name
cthags, which replaces grpsvcs. It is started by a new control script of the same
name and therefore runs under the new subsystem name cthags.
File sets: CAA is provided by the non-PowerHA file sets bos.cluster.rte, bos.ahafs, and
bos.cluster.solid. The file sets are on the AIX installation media or in TL6 of AIX 6.1.
More information: For more information about CAA, see Cluster Management,
SC23-6779, and the IBM AIX Version 7.1 Differences Guide, SG24-7910.
CAA provides a set of tools and APIs to enable clustering on the AIX operating system. CAA
does not provide the application monitoring and resource failover capabilities that PowerHA
provides. PowerHA uses the CAA capabilities. Other applications and software programs can
use the APIs and command-line interfaces (CLIs) that CAA provides to make their
applications and services “Cluster Aware” on the AIX operating system.
Figure 1-2 on page 4 illustrates how applications can use CAA. The following products and
parties can use CAA technology:
RSCT (3.1 and later)
PowerHA (7.1 and later)
VIOS (CAA support in a future release)
Third-party ISVs, service providers, and software products
The following sections explain the concepts of the CAA central repository, RSCT changes,
and how PowerHA 7.1 uses CAA.
In a three-node cluster configuration, the third node acts as a standby for the
other two nodes. The solid subsystem (solid and solidHAC) is not running,
and the file systems (/clrepos_private1 and /clrepos_private2) are not
mounted.
If a failure occurs on the primary or secondary nodes of the cluster, the third
node activates the solid subsystem. It mounts either the primary or secondary
file system, depending on the node that has failed. See 1.2.3, “The central
repository” on page 9, for information about file systems.
clconfd The clconfd subsystem runs on each node of the cluster. The clconfd daemon
wakes up every 10 minutes to synchronize any necessary cluster changes.
For information about the RSCT changes, see 1.1.2, “Architecture changes for RSCT 3.1” on
page 3.
CAA repository disk: The CAA repository disk is reserved for use by CAA only. Do not
attempt to change any of it. The information in this chapter is provided only to help you
understand the purpose of the new disk and file system structure.
Figure 1-6 shows an overview of the CAA repository disk and its structure.
If you installed and configured PowerHA 7.1, your cluster repository disk is displayed as
varied on (active) in lspv output as shown in Figure 1-7 on page 10. In this figure, the disk
label has changed to caa_private0 to remind you that this disk is for private use by CAA only.
Figure 1-7 on page 10 also shows a volume group, called caavg_private, which must always
be varied on (active) when CAA is running. CAA is activated when PowerHA 7.1 is installed.
If you have a configured cluster and find that caavg_private is not varied on (active), your
CAA cluster has a potential problem. See Chapter 10, “Troubleshooting PowerHA 7.1” on
page 305, for guidance about recovery in this situation.
chile:/ # lspv
hdisk1 000fe4114cf8d1ce None
caa_private0 000fe40163c54011 caavg_private active
hdisk3 000fe4114cf8d2ec None
hdisk4 000fe4114cf8d3a1 diskhb
hdisk5 000fe4114cf8d441 None
hdisk6 000fe4114cf8d4d5 None
hdisk7 000fe4114cf8d579 None
hdisk8 000fe4114cf8d608 ny_datavg
hdisk0 000fe40140a5516a rootvg active
Figure 1-7 lspv command showing the caa_private repository disk
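The caavg_private health check described above can be scripted. The sketch below embeds sample lspv lines modeled on Figure 1-7 so that it runs stand-alone; on a live node you would pipe lspv output directly into the function. The function name is ours, not a PowerHA command.

```shell
# Check whether the CAA repository volume group is varied on (active).
# On a live node:  lspv | check_caavg
check_caavg() {
    awk '$3 == "caavg_private" {
             found = 1
             if ($4 == "active") print "OK"; else print "NOT ACTIVE"
         }
         END { if (!found) print "MISSING" }'
}

# Sample lines modeled on the lspv output in Figure 1-7:
cat <<'EOF' | check_caavg
hdisk1        000fe4114cf8d1ce    None
caa_private0  000fe40163c54011    caavg_private   active
hdisk0        000fe40140a5516a    rootvg          active
EOF
# -> OK
```

A "NOT ACTIVE" or "MISSING" result corresponds to the problem case discussed above, where Chapter 10 guidance applies.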
You can view the structure of caavg_private from the standpoint of the Logical Volume
Manager (LVM), as shown in Figure 1-8. The lsvg command shows the structure of the
volume group.
This file system has a special reserved structure. CAA mounts some file systems for its own
use as shown in Figure 1-9 on page 11. The fslv00 file system contains the solidDB
database mounted as /clrepos_private1 because the node is the primary node of the
cluster. If you look at the output for the second node, you might have /clrepos_private2
mounted instead of /clrepos_private1. See 1.2.1, “CAA daemons” on page 8, for an
explanation of the solid subsystem.
Important: CAA creates the solidDB file systems on default logical volume names (fslv00,
fslv01). If existing file systems outside of CAA already use default lv names, ensure that
both nodes use the same names. For example, if node A has the names fslv00, fslv01,
and fslv02, node B must have the same names. Ideally, avoid default lv names on the
cluster nodes altogether so that fslv00 and fslv01 remain free for the solidDB.
Also, /aha, a special pseudo file system, is mounted in memory and used by
AHAFS. See “Autonomic Health Advisor File System” on page 11 for more information.
For more information about CAA, see Cluster Management, SC23-6779, at the following web
address:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.clusteraware/clusteraware_pdf.pdf
chile:/ # mount
node mounted mounted over vfs date options
-------- --------------- --------------- ------ ------------ ---------------
/dev/hd4 / jfs2 Sep 30 13:37 rw,log=/dev/hd8
/dev/hd2 /usr jfs2 Sep 30 13:37 rw,log=/dev/hd8
/dev/hd9var /var jfs2 Sep 30 13:37 rw,log=/dev/hd8
/dev/hd3 /tmp jfs2 Sep 30 13:37 rw,log=/dev/hd8
/dev/hd1 /home jfs2 Sep 30 13:38 rw,log=/dev/hd8
/dev/hd11admin /admin jfs2 Sep 30 13:38 rw,log=/dev/hd8
/proc /proc procfs Sep 30 13:38 rw
/dev/hd10opt /opt jfs2 Sep 30 13:38 rw,log=/dev/hd8
/dev/livedump /var/adm/ras/livedump jfs2 Sep 30 13:38 rw,log=/dev/hd8
/aha /aha ahafs Sep 30 13:46 rw
/dev/fslv00 /clrepos_private1 jfs2 Sep 30 13:52 rw,dio,log=INLINE
Figure 1-9 AHAFS file system mounted
The event information is retrieved from CAA, and any changes are communicated by using
AHAFS events. RSCT Group Services uses the AHAFS services to obtain events on the
Gossip protocol: The gossip protocol determines the node configuration and then
transmits the gossip packets over all available networking and storage communication
interfaces. If no storage communication interfaces are configured, only the traditional
networking interfaces are used. For more information, see “Cluster Aware concepts” at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.clusteraware/claware_concepts.htm
IP network interfaces
IBM PowerHA communicates over available IP interfaces by using a multicast address.
PowerHA uses all IP interfaces that are configured with an address and are in an UP state,
as long as they are reachable across the cluster.
Cluster communication requires the use of a multicast IP address. You can specify this
address when you create the cluster, or you can have one generated automatically when you
synchronize the initial cluster configuration.
An overlap of the multicast addresses might be generated by default in the case of two
clusters with interfaces in the same virtual LAN (VLAN). This occurs when their IP
addresses are similar to the following example:
x1.y.z.t
x2.y.z.t
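The overlap can be seen by deriving the default address by hand. The sketch assumes the generation rule in which the first octet of a base IP address is replaced with 228; treat this rule as illustrative and verify it against your release before relying on it.

```shell
# Hedged sketch: derive a default cluster multicast address from a node
# base IP address (assumption: first octet replaced with 228).
derive_mcast() {
    echo "$1" | awk -F. '{ printf "228.%s.%s.%s\n", $2, $3, $4 }'
}

# Two clusters whose node addresses differ only in the first octet
# (x1.y.z.t and x2.y.z.t) derive the same multicast address:
derive_mcast 10.20.30.40    # -> 228.20.30.40
derive_mcast 192.20.30.40   # -> 228.20.30.40
```

This is exactly the collision case described above, which is why you should specify the multicast address explicitly when two clusters share a VLAN.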
The netmon.cf configuration file is not required with CAA and PowerHA 7.1.
The range 224.0.0.0–224.0.0.255 is reserved for local purposes, such as administrative and
maintenance tasks; datagrams sent to these addresses are never forwarded by multicast
routers. Similarly, the range 239.0.0.0–239.255.255.255 is reserved for administrative
scoping. These special multicast groups are regularly published in the assigned numbers RFC.1
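A candidate address can be screened against the two reserved ranges named above. This is a sketch that checks only those two ranges, not a complete IANA multicast registry check; the function name is ours.

```shell
# Classify a candidate multicast address against the reserved ranges:
# 224.0.0.0-224.0.0.255 (local use, never forwarded by multicast routers)
# 239.0.0.0-239.255.255.255 (administrative scoping)
classify_mcast() {
    echo "$1" | awk -F. '
        $1 == 224 && $2 == 0 && $3 == 0 { print "reserved-local"; next }
        $1 == 239                       { print "admin-scoped"; next }
                                        { print "usable" }'
}

classify_mcast 224.0.0.5     # -> reserved-local
classify_mcast 239.1.1.1     # -> admin-scoped
classify_mcast 228.20.30.40  # -> usable
```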
If multicast traffic is present in the adjacent network, you must ask the network administrator
for multicast IP address allocation for your cluster. Also, ensure that the multicast traffic
generated by any of the cluster nodes is properly forwarded by the network infrastructure
toward the other cluster nodes. The Internet Group Management Protocol (IGMP) must be
enabled.
Interface states
Network interfaces can have any of the following common states. You can see the interface
state in the output of the lscluster -i command, as shown in Example 1-2 on page 16.
UP The interface is up and active.
STALE The interface configuration data is stale, which happens when
communication has been lost, but was previously up at some point.
DOWN SOURCE HARDWARE RECEIVE / SOURCE HARDWARE TRANSMIT
The interface is down because of a failure to receive or transmit, which
can happen in the event of a cabling problem.
DOWN SOURCE SOFTWARE
The interface is down in AIX software only.
Enabling SAN fiber communication: To enable SAN fiber communication for cluster
communication, you must configure the Target Mode Enable attribute for FC adapters. See
Example 4-4 on page 57 for details.
The Virtual SCSI (VSCSI) SAN heartbeat depends on VIOS 2.2.0.11-FP24 SP01.
Interface state
The SAN-based communication (SFWCOM) interface has one state available, the UP state.
The UP state indicates that the SFWCOM interface is active. You can see the interface state
in the output of the lscluster -i command as shown in Example 1-2 on page 16.
When the underlying hardware infrastructure is available, you can proceed with the PowerHA
cluster topology configuration. The heartbeat starts right after the first successful “Verify and
Synchronize” operation, when the CAA cluster is created and activated by PowerHA.
Interface states
The Central cluster repository-based communication (DPCOM) interface has the following
available states. You can see the interface state in the output of the lscluster -i command,
which is shown in Example 1-2.
UP AIX_CONTROLLED
Indicates that the interface is UP, but under AIX control. The user
cannot change the status of this interface.
UP RESTRICTED AIX_CONTROLLED
Indicates that the interface is UP and under AIX system control, but is
RESTRICTED from monitoring mode.
STALE The interface configuration data is stale. This state occurs when
communication is lost, but was up previously at some point.
When the system determines that the node has lost the normal network or storage interfaces,
the system activates (unrestricts) the cluster repository disk interface (dpcom) and begins
using it for communications. At this point, the interface state changes to UP AIX_CONTROLLED
(unrestricted, but still system controlled).
Point of contact
The output of the lscluster -m command shows a reference to a point of contact as shown in
Example 1-3 on page 19. The local node is displayed as N/A, and the remote node is
displayed as en0 UP. CAA monitors the state and points of contact between the nodes for both
communication interfaces.
A point of contact indicates that a node has received a packet from the other node over the
interface. The point-of-contact status UP indicates that the packet flow is continuing. The
point-of-contact monitor tracks the number of UP points of contact for each communication
interface on the node. If this count reaches zero, the node is marked as reachable through
the cluster repository disk only.
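The point-of-contact bookkeeping can be pictured as a simple count over per-interface entries. The input lines below are shaped like lscluster -m point-of-contact entries but are illustrative, not verbatim command output.

```shell
# Count how many points of contact to a remote node are currently UP.
# A count of zero means the node is reachable only through the cluster
# repository disk. (Input lines are illustrative, not real lscluster -m output.)
count_up_contacts() {
    awk '$2 == "UP" { n++ } END { print n + 0 }'
}

printf 'en0 UP\nen1 STALE\nsfwcom UP\n' | count_up_contacts   # -> 2
printf 'en0 STALE\n' | count_up_contacts                      # -> 0
```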
Heartbeat monitoring is performed by sending and receiving gossip packets across the
network with the multicast protocol. CAA uses heartbeat monitoring to determine
communication problems that need to be reflected in the cluster information.
The round-trip time value is shown in the output of the lscluster -i and lscluster -m
commands. The mean deviation in network rtt value is the mean deviation of that round-trip
time; both values are managed automatically by CAA. Unlike previous versions of PowerHA and
HACMP, no heartbeat tuning is necessary. See Example 1-2 on page 16 and Figure 1-11 for more
information.
Statistical projections are directly employed to compute node-down events. By using normal
network dropped-packet rates and the projected round-trip times with mean deviations, the
cluster can determine when a packet was lost or never sent. Each node monitors the time when
a response is due from the other nodes in the cluster. If a node finds that a response from
another node is overdue, a node-down protocol is initiated in the cluster to determine
whether the node is down or whether network isolation has occurred.
This algorithm is self-adjusting to load and network conditions, providing a highly reliable and
scalable cluster. Expected round-trip times and variances rise quickly when load conditions
cause delays. These delays cause the system to wait longer before declaring a node down,
which provides a high probability of valid state information. (Quantitative probabilities of
errors can be computed.) Conversely, expected round-trip times and variances fall quickly
when delays return to normal.
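A minimal sketch of this self-adjusting idea, using assumed smoothing constants in the style of TCP's round-trip estimator (this is not the actual CAA algorithm): the smoothed round-trip time and its mean deviation grow quickly when sample RTTs spike, lengthening the wait before a node is declared overdue, and shrink again as delays return to normal.

```shell
# Self-adjusting overdue timeout from RTT samples (illustrative only;
# the smoothing constants 1/8 and 1/4 and the sample RTTs are assumed).
awk 'BEGIN {
    srtt = 10; dev = 2                     # initial estimates in ms
    n = split("10 11 10 60 80 70 12 11 10", rtt, " ")
    for (i = 1; i <= n; i++) {
        err = rtt[i] - srtt
        srtt += err / 8                    # smoothed round-trip time
        aerr = (err < 0) ? -err : err
        dev += (aerr - dev) / 4            # mean deviation of the rtt
        # wait srtt + 4*dev before starting the node-down protocol
        printf "%d %.1f %.1f %.1f\n", rtt[i], srtt, dev, srtt + 4 * dev
    }
}' > /tmp/rtt_demo.txt
cat /tmp/rtt_demo.txt
```

Running the sketch shows the computed timeout rising during the burst of 60-80 ms samples and falling once the samples return to around 10 ms.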
A key feature of IBM Systems Director is a consistent user interface with a focus on driving
common management tasks. IBM Systems Director provides a unified view of the total IT
environment, including servers, storage, and network. With this view, users can perform tasks
with a single tool, IBM Systems Director.
To learn more about the advantages of IBM Systems Director, see the PowerHA 7.1
presentation by Peter Schenke at:
http://www-05.ibm.com/ch/events/systems/pdf/6_PowerHA_7_1_News.pdf
The accompanying figure shows this architecture: the Director Agent, which includes the
PowerHA agent, runs on AIX and is automatically installed on AIX 7.1 and AIX V6.1 TL06. It
communicates securely with the Director Server, the central point of control, which is
supported on AIX, Linux, and Windows and provides discovery of clusters and resources, an
agent manager, a web-based user interface, and a command-line interface.
IP address takeover via IP aliasing is now the only supported IPAT option. Repository disk
heartbeat, provided by the CAA repository disk, and SAN-based heartbeat, as described in the
following section, have replaced all point-to-point (non-IP) network types.
In PowerHA SystemMirror 7.1, the SMIT panel has the following key changes:
Separation of menus by function
Addition of the Custom Cluster Configuration menu
Removal of Extended Distance menus from the base product
Removal of unsupported dialogs or menus
Changes to some terminology
New dialog for specifying repository and cluster IP address
Many changes in topology and resource menus
Figure 2-1 The screens shown after running the smitty hacmp command
In PowerHA SystemMirror 7.1, the smitty sysmirror (or smit sysmirror) command provides
a new fast path to the PowerHA start menu in SMIT. The old fast path (smitty hacmp) is still
valid.
The “Initialization and Standard Configuration” path has been split into two paths: Cluster
Nodes and Networks and Cluster Applications and Resources. For more details about these
paths, see 2.3.4, “Cluster Standard Configuration menu” on page 29. Some features for the
Extended Configuration menu have moved to the Custom Cluster Configuration menu. For
more details about custom configuration, see 2.3.5, “Custom Cluster Configuration menu” on
page 30.
smitty sysmirror
Figure 2-2 PowerHA SMIT start panel
Although the SMIT path did not change, some of the wording has changed. For example, the
word “HACMP” was replaced with “Cluster Services.” The path with the new wording is smitty
hacmp → System Management (C-SPOC) → PowerHA SystemMirror Services, and then
you select either the “Start Cluster Services” or “Stop Cluster Services” menu.
Figure 2-3 The screens that are shown when running the smitty clstart command
This version has a more logical flow. The topology configuration and management part is in
the “Cluster Nodes and Networks” menu. The resources configuration and management part
is in the “Cluster Applications and Resources” menu.
Figure 2-4 shows some tasks and where they have moved to. The dotted line shows where
Smart Assist was relocated. The Two-Node Cluster Configuration Assistant no longer exists.
These policies are insufficient for supporting some complex applications. For example, the
FileNet application server must be started only after its associated database is started. It
does not need to be stopped if the database is brought down for some time and then started.
The Start After and Stop After dependencies use source and target resource group
terminology. The source resource group depends on the target resource group as shown in
Figure 2-8.
Figure 2-8 Start After resource group dependency: the source resource group (app_rg) starts
after the target resource group (db_rg)
Similarly, for Stop After dependency, the target resource group must be offline on any node in
the cluster before a source (dependent) resource group can be brought offline on a node.
By default, resource groups are acquired in parallel, without any dependency.
A resource group can serve as both a target and a source resource group, depending on
which end of a given dependency link it is placed. You can specify three levels of
dependencies for resource groups. You cannot specify circular dependencies between
resource groups.
A Start After dependency applies only at the time of resource group acquisition. During a
resource group release, these resource groups do not have any dependencies. A Start After
source resource group cannot be acquired on a node until its target resource group is fully
functional. If the target resource group does not become fully functional, the source resource
group goes into an OFFLINE DUE TO TARGET OFFLINE state. If you notice that a resource group
is in this state, you might need to troubleshoot which resources need to be brought online
manually to resolve the resource group dependency.
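To see which resource groups are blocked this way, check the resource group state listing. The clRGinfo command reports resource group states; the listing below is a fabricated sample in its style (exact column layout varies by PowerHA level), and the awk filter is our own illustration.

```shell
# Find source resource groups stuck waiting on an offline Start After
# target. The listing is a fabricated clRGinfo-style sample; on a
# cluster node you would pipe real "clRGinfo" output instead.
cat > /tmp/rginfo.txt <<'EOF'
Group Name     State                           Node
db_rg          OFFLINE                         node1
app_rg         OFFLINE DUE TO TARGET OFFLINE   node1
EOF
awk '/OFFLINE DUE TO TARGET OFFLINE/ { print $1 }' /tmp/rginfo.txt \
    > /tmp/blocked_rgs.txt
cat /tmp/blocked_rgs.txt
```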
When a resource group in a Start After target role falls over from one node to another, the
resource groups that depend on it are unaffected.
After the Start After source resource group is online, any operation (such as bring offline or
move resource group) on the target resource group does not affect the source resource
group. A manual resource group move or bring resource group online on the source resource
group is not allowed if the target resource group is offline.
A Stop After dependency applies only at the time of a resource group release. During
resource group acquisition, these resource groups have no dependency between them. A
Stop After source resource group cannot be released on a node until its target resource group
is offline.
When a resource group in a Stop After source role falls over from one node to another, its
related target resource group is released as a first step. Then the source (dependent)
resource group is released. Next, both resource groups are acquired in parallel, assuming
that no Start After or parent-child dependency exists between these resource groups.
A manual resource group move or bring resource group offline on the Stop After source
resource group is not allowed if the target resource group is online.
Summary: Start After and Stop After dependencies between a source and a target resource
group work as follows:
Source Start After target: The source is brought online after the target resource group.
Source Stop After target: The source is brought offline after the target resource group.
Figure 2-9 Comparing Start After, Stop After, and parent-child resource group (rg) dependencies
If you configure a Start After dependency between two resource groups in your cluster, the
applications in these resource groups are started in the configured sequence. To ensure that
this process goes smoothly, configure application monitors and use a Startup Monitoring
mode for the application included in the target resource group.
For a configuration example, see 5.1.6, “Configuring Start After and Stop After resource
group dependencies” on page 96.
A user-defined resource type is one that you can define for a customized resource that you
can add to a resource group. A user-defined resource type contains several attributes that
describe the properties of the instances of the resource type.
When you create a user-defined resource type, you must choose its processing order among the
existing resource types. If you choose the FIRST value, PowerHA SystemMirror processes the
user-defined resources at the beginning of the resource acquisition order. If you choose any
other value, for example, VOLUME_GROUP, the user-defined resources are acquired after the
volume groups are varied on, and released before the volume groups are varied off. The other
values correspond to existing resource types, which you can choose from a pick list in the
SMIT menu.
The accompanying figure shows the resulting acquisition order: DISK, then the user-defined
resource (with the VOLUME_GROUP value chosen), then FILE SYSTEM, SERVICE IP, and
APPLICATION.
The cluster manager queries the Resource Monitoring and Control (RMC) subsystem every
3 minutes to obtain the current value of these attributes on each node. Then the cluster
manager distributes them cluster-wide. For an architecture overview of PowerHA and RSCT,
see 1.1.3, “PowerHA and RSCT” on page 5.
The return code of a user-defined script is used to determine the destination node.
When you select one of these criteria, you must also provide values for the DNP script path
and DNP timeout attributes for the resource group. PowerHA runs the supplied script and
collects the return codes from all nodes. If you choose the cl_highest_udscript_rc policy,
the collected values are sorted, and the node that returned the highest value is selected as
the candidate node for fallover. Similarly, if you choose the cl_lowest_nonzero_udscript_rc
policy, the node that returned the lowest nonzero positive value is selected as the candidate
takeover node. If the return value of the script is zero or the same on all nodes, the
default node priority is used. PowerHA verifies the existence of the script and its execution
permissions during verification.
Time-out value: When you select a time-out value, ensure that it is within the time period
for running and completing a script. If you do not specify a time-out value, a default value
equal to the config_too_long time is specified.
For information about configuring the dynamic node priority, see 5.1.8, “Configuring the
dynamic node priority (adaptive failover)” on page 102.
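The selection rule of the two policies can be sketched as follows. The node names and return codes are invented for the example; this only illustrates the sorting described above and is not PowerHA code.

```shell
# Simulated DNP script return codes collected from three nodes
cat > /tmp/dnp_rc.txt <<'EOF'
node1 30
node2 75
node3 0
EOF
# cl_highest_udscript_rc: the node with the highest return code wins
sort -k2,2 -rn /tmp/dnp_rc.txt | head -1 | cut -d' ' -f1 > /tmp/dnp_high.txt
# cl_lowest_nonzero_udscript_rc: lowest nonzero positive value wins
awk '$2 > 0' /tmp/dnp_rc.txt | sort -k2,2 -n | head -1 | cut -d' ' -f1 \
    > /tmp/dnp_low.txt
echo "highest: $(cat /tmp/dnp_high.txt)  lowest nonzero: $(cat /tmp/dnp_low.txt)"
```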
To prevent users from running these commands directly from the command line, change the
default value from yes to no:
1. Locate the following line in the /etc/environment file:
CLUSTER_OVERRIDE=yes
2. Change the line to the following line:
CLUSTER_OVERRIDE=no
If the CLUSTER_OVERRIDE variable has the value no, you see an error message similar to the
one shown in Example 2-1.
In this case, use the equivalent C-SPOC CLI called cli_chfs. See the C-SPOC man page for
more details.
Deleting the CLUSTER_OVERRIDE variable: You also see the message shown in
Example 2-1 if you delete the CLUSTER_OVERRIDE variable in your /etc/environment file.
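The change can be scripted. The sketch below applies the substitution to a scratch copy so that it is safe to run anywhere; on a cluster node, the file to edit is /etc/environment itself (root access required).

```shell
# Demonstrate flipping CLUSTER_OVERRIDE from yes to no on a scratch
# copy of the file; on a real node, edit /etc/environment instead.
cat > /tmp/environment.demo <<'EOF'
CLUSTER_OVERRIDE=yes
EOF
sed 's/^CLUSTER_OVERRIDE=yes$/CLUSTER_OVERRIDE=no/' /tmp/environment.demo \
    > /tmp/environment.new
cat /tmp/environment.new
```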
The passive state allows only read access to a volume group special file and the first 4 KB of
a logical volume. Write access through standard LVM is not allowed. However, low-level
commands, such as dd, can bypass LVM and write directly to the disk.
The new CAA disk fencing feature prevents writes to the disk device from any other node,
eliminating the possibility that a lower-level operation, such as dd, succeeds. However, a
system that has access to that disk might not be a member of the CAA cluster. Therefore, it
is still important to zone the storage appropriately so that only cluster nodes have the
disks configured.
The PowerHA SystemMirror 7.1 announcement letter explains this fencing feature as a
storage framework that is embedded in the operating system to aid in storage device
management. As part of the framework, fencing disks or disk groups are supported. Fencing
shuts off write access to the shared disks from any entity on the node (irrespective of the
privileges associated with the entity trying to access the disk). Fencing is exploited by
PowerHA SystemMirror to implement strict controls in regard to shared disks and their access
solely from one of the nodes that share the disk. Fencing ensures that, when the workload
moves to another node for continuing operations, access to the disks on the departing node is
turned off for write operations.
When cluster services start on a node, the clstrmgrES Event Manager runs the following
sequence:
1) rg_move_acquire
   process_resources (NONE)
   process_resources (SERVICE_LABELS)
     acquire_service_addr
     acquire_aconn_service en0 net_ether_01
   process_resources (DISKS)
   process_resources (VGS)
   process_resources (LOGREDO)
   process_resources (FILESYSTEMS)
   process_resources (SYNC_VGS)
   process_resources (TELINIT)
   process_resources (NONE)
   < Event Summary >
2) rg_move_complete
   for each RG: process_resources (APPLICATIONS)
     start_server app01
   process_resources (ONLINE)
   process_resources (NONE)
   < Event Summary >
The following section explains what happens when a subsequent node joins the cluster.
Example 2-3 Debug file showing the process of another node joining the cluster
[TE_JOIN_NODE_DEP]
[TE_RG_MOVE_ACQUIRE]
[TE_JOIN_NODE_DEP_COMPLETE]
When another node starts cluster services, both cluster managers exchange event messages and
run the following sequence:
1) rg_move_release: run on the existing node only if a resource group falls back to the
   joining, higher-priority node; if no fallback occurs, rg_move_release does nothing.
2) rg_move_acquire: the same sequence as when the first node came up.
3) rg_move_complete: for each RG: process_resources (APPLICATIONS), start_server app02,
   process_resources (ONLINE), process_resources (NONE), < Event Summary >
Figure 2-12 Another node joining the cluster
The next section explains what happens when a node leaves the cluster voluntarily.
Node failure
The situation is slightly different if the node on the right fails suddenly. Because a failed
node is not in a position to run any events, the calls to process_resources listed under the
right node are not run, as shown in Figure 2-13.
The flow in Figure 2-13 follows the same release-and-acquire pattern (the process_resources
calls under the failed node do not run):
1) rg_move_release
   for each RG:
   process_resources (RELEASE)
   process_resources (APPLICATIONS)
     stop_server app02
   process_resources (FILESYSTEMS)
   process_resources (VGS)
2) rg_move_acquire
   process_resources (SERVICE_LABELS)
Example 2-4 shows details about the process flow from the clstrmgr.debug file.
cluster.log node2
Nov 23 06:24:21 AIX: EVENT START: rg_move_release node1 1
Nov 23 06:24:21 AIX: EVENT START: rg_move node1 1 RELEASE
Nov 23 06:24:21 AIX: EVENT START: stop_server appActrl
Nov 23 06:24:21 AIX: EVENT START: stop_server appBctrl
Nov 23 06:24:22 AIX: EVENT COMPLETED: stop_server appBctrl 0
Nov 23 06:24:24 AIX: EVENT COMPLETED: stop_server appActrl 0
Nov 23 06:24:27 AIX: EVENT START: release_service_addr
Nov 23 06:24:28 AIX: EVENT COMPLETED: release_service_addr 0
Nov 23 06:24:29 AIX: EVENT START: release_takeover_addr
Nov 23 06:24:30 AIX: EVENT COMPLETED: release_takeover_addr 0
Nov 23 06:24:30 AIX: EVENT COMPLETED: rg_move node1 1 RELEASE 0
Nov 23 06:24:30 AIX: EVENT COMPLETED: rg_move_release node1 1 0
Nov 23 06:24:32 AIX: EVENT START: rg_move_fence node1 1
Nov 23 06:24:32 AIX: EVENT COMPLETED: rg_move_fence node1 1 0
Nov 23 06:24:34 AIX: EVENT START: rg_move_fence node1 2
Nov 23 06:24:35 AIX: EVENT COMPLETED: rg_move_fence node1 2 0
Nov 23 06:24:35 AIX: EVENT START: rg_move_acquire node1 2
Nov 23 06:24:35 AIX: EVENT START: rg_move node1 2 ACQUIRE
Nov 23 06:24:35 AIX: EVENT COMPLETED: rg_move node1 2 ACQUIRE 0
Nov 23 06:24:35 AIX: EVENT COMPLETED: rg_move_acquire node1 2 0
Nov 23 06:24:41 AIX: EVENT START: rg_move_complete node1 2
Nov 23 06:24:41 AIX: EVENT COMPLETED: rg_move_complete node1 2 0
Nov 23 06:24:51 AIX: EVENT START: node_down_complete node2
Nov 23 06:24:52 AIX: EVENT COMPLETED: node_down_complete node2 0
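A quick way to scan such a log for events that started but never completed is to pair the EVENT START and EVENT COMPLETED lines. The awk pairing below is our own illustration (not a PowerHA tool); the sample follows the cluster.log format shown above, with one unfinished event added deliberately.

```shell
# Pair EVENT START/COMPLETED lines and report unfinished events.
# Sample excerpt in cluster.log format; rg_move_fence is deliberately
# left without a COMPLETED line for the demonstration.
cat > /tmp/cluster.log.sample <<'EOF'
Nov 23 06:24:21 AIX: EVENT START: rg_move_release node1 1
Nov 23 06:24:21 AIX: EVENT START: stop_server appActrl
Nov 23 06:24:24 AIX: EVENT COMPLETED: stop_server appActrl 0
Nov 23 06:24:30 AIX: EVENT COMPLETED: rg_move_release node1 1 0
Nov 23 06:24:32 AIX: EVENT START: rg_move_fence node1 1
EOF
awk '/EVENT START:/     { sub(/.*EVENT START: /, "");     open[$1]++ }
     /EVENT COMPLETED:/ { sub(/.*EVENT COMPLETED: /, ""); open[$1]-- }
     END { for (e in open) if (open[e] > 0) print e }' \
    /tmp/cluster.log.sample > /tmp/unfinished.txt
cat /tmp/unfinished.txt
```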
CAA cluster: PowerHA SystemMirror creates the CAA cluster automatically. You do not
manage the CAA configuration or state directly, but you can use the cluster commands to
view the CAA status.
Download and install the latest service packs for AIX and PowerHA from IBM Fix Central at:
http://www.ibm.com/support/fixcentral
The following file sets on the AIX base media are required:
rsct.basic.rte
rsct.compat.basic.hacmp
rsct.compat.clients.hacmp
The appropriate versions of RSCT for the supported AIX releases are also supplied with the
PowerHA installation media.
More information: For a list of the supported FC adapters, see “Setting up cluster storage
communication” in the AIX 7.1 Information Center at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.
clusteraware/claware_comm_setup.htm
See the readme files that are provided with the base PowerHA file sets and the latest service
pack. See also the PowerHA SystemMirror 7.1 for AIX Standard Edition Information Center
at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.doc/doc/base/
powerha.htm
The nodes of your cluster can be any system on which the installation of AIX 6.1 TL6 or
AIX 7.1 is supported, either as a full system partition or as a logical partition (LPAR).
Design methodologies can help eliminate network and disk single points of failure (SPOF) by
using redundant configurations. Use at least two network adapters connected to different
Ethernet switches in the same virtual LAN (VLAN). (PowerHA also supports the use of
EtherChannel.) Similarly, use dual-fabric storage area network (SAN) connections to the
storage subsystems with at least two Fibre Channel (FC) adapters and appropriate multipath
drivers. Also use Redundant Array of Independent Disks (RAID) technology to protect data
from any disk failure.
3.2.2 Requirements for the multicast IP address, SAN, and repository disk
Cluster communication requires the use of a multicast IP address. You can specify this
address when you create the cluster, or you can have one generated automatically. The
ranges 224.0.0.0–224.0.0.255 and 239.0.0.0–239.255.255.255 are reserved for
administrative and maintenance purposes. If multicast traffic is present in the adjacent
network, ask the network administrator to allocate a multicast IP address. Also,
ensure that the multicast traffic that is generated by each of the cluster nodes is properly
forwarded by the network infrastructure to every other cluster node.
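A simple sanity check against the reserved ranges can be scripted. The example addresses below are arbitrary; the function only classifies the address patterns described above.

```shell
# Classify a candidate multicast address against the reserved ranges
# 224.0.0.0-224.0.0.255 and 239.0.0.0-239.255.255.255 (sketch only;
# the example addresses are arbitrary).
check_mcast() {
    case "$1" in
        224.0.0.*|239.*)     echo "$1 reserved: request another allocation" ;;
        22[4-9].*|23[0-9].*) echo "$1 usable multicast address" ;;
        *)                   echo "$1 not a multicast address" ;;
    esac
}
{
    check_mcast 228.10.20.30
    check_mcast 239.1.1.1
    check_mcast 192.168.1.1
} > /tmp/mcast_check.txt
cat /tmp/mcast_check.txt
```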
If you use SAN-based heartbeat, you must have zoning set up to ensure connectivity between
the host FC adapters. You must also activate the target mode enabled (tme) parameter on the
involved FC adapters.
Hardware redundancy at the storage subsystem level is mandatory for the cluster repository
disk. Logical Volume Manager (LVM) mirroring of the repository disk is not supported.
CAA support: Currently, CAA supports only Fibre Channel and SAS disks for the repository
disk, as described in the “Cluster communication” topic in the AIX 7.1 Information Center
at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.
clusteraware/claware_comm_benifits.htm
TL6: AIX must be at a minimum version of AIX 6.1 TL6 (6.1.6.0) on all nodes before
migration. Use of AIX 6.1 TL6 SP2 or later is preferred.
Most migration scenarios require a two-part upgrade. First, you migrate AIX to the minimum
version of AIX 6.1 TL6 on all nodes. You must reboot each node after upgrading AIX. Second,
you migrate to PowerHA 7.1 by using the offline, rolling, or snapshot scenario as explained in
Chapter 7, “Migrating to PowerHA 7.1” on page 151.
Support for vSCSI: CAA repository disk support for virtual SCSI (vSCSI) is officially
introduced in AIX 6.1 TL6 SP2 and AIX 7.1 SP2. You can create a vSCSI disk
repository at AIX 6.1 TL6 base levels, but not at SP1. Alternatively, direct SAN
connection logical unit numbers (LUNs) or N_Port ID Virtualization (NPIV) LUNs are
supported with all versions.
3.5 Storage
This section provides details about storage planning considerations for high availability of
your cluster implementation.
For additional information about the shared disk, see the PowerHA SystemMirror Version 7.1
for AIX Standard Edition Concepts and Facilities Guide, SC23-6751. See also the PowerHA
SystemMirror Version 7.1 announcement information or the PowerHA SystemMirror Version
7.1 for AIX Standard Edition Planning Guide, SC23-6758-01, for a complete list of supported
devices.
The following disks are supported (through Multiple Path I/O (MPIO)) for the repository disk:
All FC disks that configure as MPIO
IBM DS8000, DS3000, DS4000®, DS5000, XIV®, ESS800, SAN Volume Controller (SVC)
EMC: Symmetrix, DMX, CLARiiON
HDS: 99XX, 96XX, OPEN series
IBM System Storage N series/NetApp®: All models of N series and all NetApp models
common to N series
VIOS vSCSI
All IBM serial-attached SCSI (SAS) disks that configure as MPIO
SAS storage
Support for third-party multipathing software: At the time of writing, some third-party
multipathing software was not supported.
The following FC and SAS adapters are supported for connection to the repository disk:
4 GB Single-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 1905; CCIN 1910)
4 GB Single-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 5758; CCIN 280D)
4 GB Single-Port Fibre Channel PCI-X Adapter (FC 5773; CCIN 5773)
4 GB Dual-Port Fibre Channel PCI-X Adapter (FC 5774; CCIN 5774)
4 Gb Dual-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 1910; CCIN 1910)
4 Gb Dual-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 5759; CCIN 5759)
8 Gb PCI Express Dual Port Fibre Channel Adapter (FC 5735; CCIN 577D)
8 Gb PCI Express Dual Port Fibre Channel Adapter 1Xe Blade (FC 2B3A; CCIN 2607)
3 Gb Dual-Port SAS Adapter PCI-X DDR External (FC 5900 and 5912; CCIN 572A)
More information: For the most current list of supported storage adapters for shared disks
other than the repository disk, contact your IBM representative. Also see the “IBM
PowerHA SystemMirror for AIX” web page at:
http://www.ibm.com/systems/power/software/availability/aix/index.html
The PowerHA software supports the following disk technologies as shared external disks in a
highly available cluster:
SCSI drives, including RAID subsystems
FC adapters and disk subsystems
Data path devices (VPATH): SDD 1.6.2.0, or later
Virtual SCSI (vSCSI) disks
Support for vSCSI: CAA repository disk support for vSCSI is officially introduced in
AIX 6.1 TL6 SP2 and AIX 7.1 SP2. You can create a vSCSI disk repository at AIX 6.1
TL6 base levels, but not at SP1. Alternatively, direct SAN connection LUNs or NPIV
LUNs are supported with all versions.
You can combine these technologies within a cluster. Before choosing a disk technology,
review the considerations for configuring each technology as described in the following
section.
AIX MPIO is an architecture that uses PCMs. The following PCMs are all supported:
SDDPCM
HDLM PCM
AIXPCM
SDDPCM only supports DS6000™, DS8000, SVC, and some models of DS4000. HDLM PCM
only supports Hitachi storage devices. AIXPCM supports all storage devices that System p
servers and VIOS support. AIXPCM supports storage devices from over 25 storage vendors.
Support for third-party multipath drivers: At the time of writing, other third-party
multipath drivers (such as EMC PowerPath, and Veritas) are not supported. This limitation
is planned to be resolved in a future release.
See the “Support Matrix for Subsystem Device Driver, Subsystem Device Driver Path Control
Module, and Subsystem Device Driver Device Specific Module” at:
http://www.ibm.com/support/docview.wss?rs=540&uid=ssg1S7001350
Also check whether the coexistence of different multipath drivers using different FC ports on
the same system is supported for mixed cases. For example, the cluster repository disk might
be on a storage subsystem or FC adapter other than the one used for the shared data disks.
3.6 Network
The networking requirements for PowerHA SystemMirror 7.1 differ from all previous versions.
This section focuses specifically on the differences of the following requirements:
Multicast address
Network interfaces
Subnetting requirements for IPAT via aliasing
Host name and node name
Other network considerations
– Single adapter networks
– Virtual Ethernet (VIOS)
For additional information, and details about common features between versions, see the
PowerHA for AIX Cookbook, SG24-7739.
In previous versions, the network Failure Detection Rate (FDR) policy was tunable, which is
no longer true in PowerHA SystemMirror 7.1.
If the networks are a single adapter configuration, both the base and service IP addresses
are allowed on the same subnet.
Virtual Ethernet
In previous versions, when using virtual Ethernet, users configured a special formatted
netmon.cf file to ping additional external interfaces or addresses by using specific outbound
interfaces. The netmon.cf configuration file no longer applies.
For the cluster SAN-based communication channel, two extra zones are created as shown in
Example 4-1. One zone includes the fcs0 ports of each server, and the other zone includes
the fcs1 ports of each server.
Fabric2:
zone: Syndey_fcs1__Perth_fcs1
10:00:00:00:c9:74:c1:6f
10:00:00:00:c9:77:20:d9
This dual zone setup provides redundancy for the SAN communication channel at the Cluster
Aware AIX (CAA) storage framework level. The dotted lines in Figure 4-2 represent the
initiator-to-initiator zones added on top of the conventional ones, connecting host ports to
storage ports.
The multipath driver being used is the AIX native MPIO. In Example 4-3, the mpio_get_config
command shows identical LUNs on both nodes, as expected.
X in fcsX: In the following steps, the X in fcsX represents the number of the FC adapters.
You must complete this procedure for each FC adapter that is involved in cluster
SAN-based communication.
1. Unconfigure fcsX:
rmdev -Rl fcsX
fcsX device busy: If the fcsX device is busy when you use the rmdev command, enter
the following commands:
chdev -P -l fcsX -a tme=yes
chdev -P -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail
Example 4-4 illustrates the procedure for port fcs0 on node sydney.
Depending on the functionality required for your environment, additional file sets might be
selected for installation.
PowerHA SystemMirror 7.1 for AIX Standard Edition includes the Smart Assists images. For
more details about the Smart Assists functionality and new features, see 2.2, “New features”
on page 24.
The PowerHA for IBM Systems Director agent file set comes with the base installation media.
To learn more about PowerHA SystemMirror for IBM Systems Director, see 5.3, “PowerHA
SystemMirror for IBM Systems Director” on page 133.
Installation from a CD is more appropriate for small environments. For remote nodes, use NFS
to export and mount the installation images to avoid repeated CD handling or image copy
operations.
The following section provides an example of how to use a NIM server to install the PowerHA
software.
In Example 4-6, the lslpp command lists the prerequisites that are already installed and the
ones that are missing in a single output.
Figure 4-3 shows selection of the appropriate lpp_source on the NIM server, aix6161, by
following the path smitty nim → Install and Update Software → Install Software. You
select all of the required file sets on the next panel.
Install Software
Update Installed Software to Latest Level (Update All)
Install Software Bundle
Update Software by Fix (APAR)
••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
• Select the LPP_SOURCE containing the install images •
• •
• Move cursor to desired item and press Enter. •
• •
• aix7100g resources lpp_source •
• aix7101 resources lpp_source •
• aix6161 resources lpp_source •
• ha71sp1 resources lpp_source •
• aix6060 resources lpp_source •
• aix6160-SP1-only resources lpp_source •
• •
• F1=Help F2=Refresh F3=Cancel •
• Esc+8=Image Esc+0=Exit Enter=Do •
F1• /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 4-3 Installing the prerequisites: Selecting lpp_source
Ty••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Pr• Software to Install •
• •
[T• Move cursor to desired item and press Esc+7. Use arrow keys to scroll. •
* • ONE OR MORE items can be selected. •
* • Press Enter AFTER making all selections. •
• •
• [MORE...2286] •
• + 6.1.6.1 POWER HA Business Resiliency solidDB •
• + 6.1.6.0 POWER HA Business Resiliency solidDB •
• •
• > bos.clvm ALL •
• + 6.1.6.0 Enhanced Concurrent Logical Volume Manager •
• •
• bos.compat ALL •
• + 6.1.6.0 AIX 3.2 Compatibility Commands •
• [MORE...4498] •
[M• •
• F1=Help F2=Refresh F3=Cancel •
F1• Esc+7=Select Esc+8=Image Esc+0=Exit •
Es• Enter=Do /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 4-4 Installing the prerequisites: Selecting the file sets
After installing from the NIM server, ensure that each node remains at the initial version of AIX
and RSCT, and check the software consistency, as shown in Example 4-7.
sydney:/ # lppchk -v
Example 4-8 lists the contents of the lpp_source. As mentioned previously, both the Smart
Assist file sets and PowerHA for IBM Systems Director agent file set come with the base
media.
nimres1:/ # ls /nimrepo/lpp_source/HA71
.toc
cluster.adt.es
cluster.doc.en_US.assist
cluster.doc.en_US.assist.db2.html.7.1.0.1.bff
cluster.doc.en_US.assist.oracle.html.7.1.0.1.bff
cluster.doc.en_US.assist.websphere.html.7.1.0.1.bff
cluster.doc.en_US.es
cluster.doc.en_US.es.html.7.1.0.1.bff
cluster.doc.en_US.glvm.html.7.1.0.1.bff
cluster.es.assist
cluster.es.assist.common.7.1.0.1.bff
cluster.es.assist.db2.7.1.0.1.bff
cluster.es.assist.domino.7.1.0.1.bff
cluster.es.assist.ihs.7.1.0.1.bff
cluster.es.assist.sap.7.1.0.1.bff
cluster.es.cfs
cluster.es.cfs.rte.7.1.0.1.bff
cluster.es.client
cluster.es.client.clcomd.7.1.0.1.bff
cluster.es.client.lib.7.1.0.1.bff
cluster.es.client.rte.7.1.0.1.bff
cluster.es.cspoc
cluster.es.director.agent
cluster.es.migcheck
cluster.es.nfs
cluster.es.server
cluster.es.server.diag.7.1.0.1.bff
cluster.es.server.events.7.1.0.1.bff
Example 4-9 shows the file sets that were selected for the test environment and installed from
the lpp_source that was prepared previously. Each node requires a PowerHA license.
Therefore, you must install the license file set.
Then verify the installed software as shown in Example 4-10. The lppchk command returning to
the prompt without messages confirms the consistency of the installed file sets.
To work around the problem shown in Figure 4-5, manually import the volume group on the
other node by using the following command:
importvg -L test_vg hdiskx
After the volume group is added to the other node, the synchronization and verification are
then completed.
You can perform most administration tasks with any of these options. The option that you
choose depends on which one you prefer and which one meets the requirements of your
environment.
Locating available options: If you are familiar with the SMIT paths from an earlier
version and need to locate a specific feature, use the “Can’t find what you are looking
for?” feature from the main SMIT menu to list and search the available options.
To enter the top-level menu, use the new fast path, smitty sysmirror. The fast path on earlier
versions, smitty hacmp, still works. From the main menu, the highlighted options shown in
Figure 5-1 are available to help with topology and resources configuration. Most of the tools
necessary to configure cluster components are under “Cluster Nodes and Networks” and
“Cluster Applications and Resources.” Some terminology has changed, and the interface
looks more simplified for easier navigation and management.
PowerHA SystemMirror
Because topology monitoring has been transferred to CAA, its management has been
simplified. Support for non-TCP/IP heartbeat has been transferred to CAA and is no longer a
separate configurable option. Instead of multiple menu options and dialogs for configuring
non-TCP/IP heartbeating devices, a single option is available plus a window (Figure 5-2) to
specify the CAA cluster repository disk and the multicast IP address.
The top resource menus keep only the commonly used options, and the less frequently used
menus are deeper in the hierarchy, under a new Custom Cluster Configuration menu. This
menu includes various customizable and advanced options, similar to the “Extended
Configuration” menu in earlier versions. See 2.3, “Changes to the SMIT panel” on page 25,
for a layout that compares equivalent menu screens in earlier versions with the new screens.
The Verify and Synchronize functions now have a simplified form in most of the typical menus,
while the earlier customizable version is available in more advanced contexts.
Application server versus application controller: Earlier versions used the term
application server to refer to the scripts that are used to start and stop applications under
SystemMirror control. In version 7.1, these scripts are referred to as application
controllers.
A System Events dialog is now available in addition to the user-defined events and pre- and
post-event commands for predefined events from earlier versions. For more information about
this dialog, see 9.4, “Testing the rootvg system event” on page 286.
SSA disks are no longer supported in AIX 6.1, and the RSCT role has been diminished.
Therefore, some related menu options have been removed. See Chapter 2, “Features of
PowerHA SystemMirror 7.1” on page 23, for more details about the new and obsolete
features.
For a topology configuration, SMIT provides two possible approaches that resemble the
previous Standard and Extended configuration paths: typical configuration and custom
configuration.
Typical configuration
The smitty sysmirror → Cluster Nodes and Networks → Initial Cluster Setup (Typical)
configuration path provides the means to configure the basic components of a cluster in a few
steps. Discovery and selection of configuration information is automated, and default values
are provided whenever possible. If you need to use specific values instead of the defaults,
use the custom configuration path instead.
Custom configuration
Custom cluster configuration options are not typically required or used by most customers.
However, they provide extended flexibility in configuration and management options. These
options are under the Custom Cluster Configuration option in the top-level panel. If you want
complete control over which components are added to the cluster, and create them piece by
piece, you can configure the cluster topology with the SMIT menus. Follow the path Custom
Cluster Configuration → Initial Cluster Setup (Custom). With this path, you can also set
your own node and network names, other than the default ones. Alternatively, you can choose
only specific network interfaces to support the clustered applications. (By default, all IP
configured interfaces are used.)
Resources configuration
The Cluster Applications and Resources menu in the top-level panel groups the commonly
used options for configuring resources, resource groups, and application controllers.
Other resource options that are not required in most typical configurations are under the
Custom Cluster Configuration menu. They provide dialogs and options to perform the
following tasks:
Configure a custom disk, volume group, and file system methods for cluster resources
Customize resource recovery and service IP label distribution policy
Customize an event
Most of the resources menus and dialogs are similar to their counterparts in earlier versions.
For more information, see the existing documentation about the previous releases listed in
“Related publications” on page 519.
By using this setup, we can present various aspects of a typical production implementation,
such as topology redundancy or more complex resource configuration. As an example, we
configure SAN-based heartbeating and introduce the new Start After and Stop After resource
group dependencies.
Prerequisite: Before reading this section, you must have configured all your networks and
storage devices as explained in 3.2, “Hardware requirements” on page 44.
The /etc/cluster/rhosts file must be populated with all cluster IP addresses before
using PowerHA SystemMirror. This process was done automatically in earlier versions, but is
now a required, manual process. The addresses that you enter in this file must include the
addresses that resolve to the host names of the cluster nodes. If you update this file, you must
refresh the clcomd subsystem with the refresh -s clcomd command.
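A sketch of populating the file follows. The two addresses are the base addresses used later in this chapter's test environment, and rhosts.example is a hypothetical staging file; copy it into place as root on a real node:

```shell
# Hypothetical staging file; the addresses below are examples from this chapter's
# test environment -- replace them with the addresses of your own cluster nodes.
cat > ./rhosts.example <<'EOF'
192.168.101.135
192.168.101.136
EOF

# On a real node, as root:
#   cp ./rhosts.example /etc/cluster/rhosts
#   refresh -s clcomd
wc -l < ./rhosts.example   # one entry per node
```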
Important: Previous releases used the clcomdES subsystem, which read information from
the /usr/es/sbin/cluster/etc/rhosts file. The clcomdES subsystem is no longer
used. Therefore, you must configure the clcomd subsystem as explained in this section.
Also, ensure that you have one unused shared disk available for the cluster repository.
Example 5-1 shows the lspv command output on the systems sydney and perth. The first
part shows the output from the node sydney, and the second part shows the output from
perth.
---------------------------------------------------------------------------
perth:/ # lspv
hdisk0 00c1f1707c6092fe rootvg active
hdisk1 00c1f170fd6b4d9d dbvg
hdisk2 00c1f170fd6b50a5 appvg
hdisk3 00c1f170fd6b5126 None
Node names: The sydney and perth node names do not imply any extended-distance
capability. They are used only as node names.
Defining a cluster
To define a cluster, follow these steps:
1. Use the smitty sysmirror or smitty hacmp fast path.
2. In the PowerHA SystemMirror menu (Figure 5-4), select the Cluster Nodes and
Networks option.
3. In the Cluster Nodes and Networks menu, select the Initial Cluster Setup (Typical)
option.
4. In the Initial Cluster Setup (Typical) menu (Figure 5-6), select the Setup a Cluster, Nodes
and Networks option.
5. From the Setup a Cluster, Nodes, and Networks panel (Figure 5-7 on page 72), complete
the following steps:
a. Specify the repository disk and the multicast IP address.
The cluster name is based on the host name of the system. You can use this default or
replace it with a name you want to use. In the test environment, the cluster is named
australia.
b. In the New Nodes field, define the IP label that you want to use to communicate to the
other systems. In this example, we plan to build a two-node cluster where the two
systems are named sydney and perth. If you want to create a cluster with more than
two nodes, you can specify more than one system by using the F4 key. The advantage
is that you do not get typographical errors, and you can verify that the /etc/hosts file
contains your network addresses.
The Currently Configured Node(s) field lists all the configured nodes or lists the host
name of the system you are working on if nothing is configured so far.
c. Press Enter.
[Entry Fields]
* Cluster Name [australia]
New Nodes (via selected communication paths) [perth] +
Currently Configured Node(s) sydney
Figure 5-7 Setup a Cluster, Nodes and Networks panel
The COMMAND STATUS panel (Figure 5-8) indicates that the cluster creation completed
successfully.
COMMAND STATUS
[TOP]
Cluster Name: australia_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: None
Cluster IP Address:
There are 2 node(s) and 1 network(s) defined
NODE perth:
Network net_ether_01
perth 192.168.101.136
NODE sydney:
Network net_ether_01
sydney 192.168.101.135
perth:
[MORE...93]
Figure 5-8 Cluster creation completed successfully
Reminder: After you change the /etc/cluster/rhosts file, enter the refresh -s
clcomd command.
When you look at the output in more detail, you might notice that the system adds your entries
to the cluster configuration and runs a discovery on the systems. The discovered shared
disks are also listed in the output.
Multicast address not specified: If you did not specify a multicast address, you
can see the one that AIX chose for you in the output of the cltopinfo command.
c. Press Enter.
[Entry Fields]
* Cluster Name australia
* Repository Disk [None] +
Cluster IP Address []
+--------------------------------------------------------------------------+
| Repository Disk |
| |
| Move cursor to desired item and press Enter. |
| |
| hdisk3 |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 5-10 Define Repository and Cluster IP Address panel
COMMAND STATUS
[TOP]
Cluster Name: australia
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk3
Cluster IP Address:
There are 2 node(s) and 1 network(s) defined
NODE perth:
Network net_ether_01
perth 192.168.101.136
NODE sydney:
Network net_ether_01
sydney 192.168.101.135
[BOTTOM]
Figure 5-11 COMMAND STATUS showing OK for adding a repository disk
This process only updates the information in the cluster configuration. If you use the lspv
command on any node in the cluster, each node still shows the same output as listed in
Example 5-1 on page 70. When the cluster is synchronized the first time, both the CAA
cluster and the repository disk are created.
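After that first synchronization, the repository disk is placed into the private caavg_private volume group. A quick check could grep for it in the lspv output; the sample line below is assumed output for illustration only:

```shell
# Sample post-synchronization lspv line (assumed values, for illustration);
# on a real node, run:  lspv | grep caavg_private
lspv_out='hdisk3          00c1f170fd6b5126        caavg_private   active'
n=$(printf '%s\n' "$lspv_out" | grep -c caavg_private)
echo "$n"   # a count of 1 confirms that the repository volume group exists
```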
Example 5-2 shows a configuration that uses host names in the FQDN format.
-------------------------------
NODE busan.itso.ibm.com
-------------------------------
127.0.0.1 loopback localhost # loopback (lo0) name/address
::1 loopback localhost # IPv6 loopback (lo0) name/address
192.168.101.143 seoul-b1.itso.ibm.com seoul-b1 # Base IP label 1
192.168.101.144 busan-b1.itso.ibm.com busan-b1 # Base IP label 1
192.168.201.143 seoul-b2.itso.ibm.com seoul-b2 # Base IP label 2
192.168.201.144 busan-b2.itso.ibm.com busan-b2 # Base IP label 2
10.168.101.43 seoul.itso.ibm.com seoul # Persistent IP
10.168.101.44 busan.itso.ibm.com busan # Persistent IP
10.168.101.143 poksap-db.itso.ibm.com poksap-db # Service IP label
10.168.101.144 poksap-en.itso.ibm.com poksap-en # Service IP label
10.168.101.145 poksap-er.itso.ibm.com poksap-er # Service IP label
seoul.itso.ibm.com:/ # cllsif
Adapter Type Network Net Type Attribute Node IP Address Hardware Address Interface
Name Global Name Netmask Alias for HB Prefix Length
busan-b1 boot net_ether_01 ether public busan 192.168.101.144 en0 255.255.255.0
24
busan-b2 boot net_ether_01 ether public busan 192.168.201.144 en2 255.255.255.0
24
poksap-er service net_ether_01 ether public busan 10.168.101.145
255.255.255.0 24
poksap-en service net_ether_01 ether public busan 10.168.101.144
255.255.255.0 24
poksap-db service net_ether_01 ether public busan 10.168.101.143
255.255.255.0 24
seoul-b1 boot net_ether_01 ether public seoul 192.168.101.143 en0 255.255.255.0
24
seoul-b2 boot net_ether_01 ether public seoul 192.168.201.143 en2 255.255.255.0
24
poksap-er service net_ether_01 ether public seoul 10.168.101.145
255.255.255.0 24
poksap-en service net_ether_01 ether public seoul 10.168.101.144
255.255.255.0 24
poksap-db service net_ether_01 ether public seoul 10.168.101.143
255.255.255.0 24
seoul.itso.ibm.com:/ # cllsnode
Node busan
Interfaces to network net_ether_01
Communication Interface: Name busan-b1, Attribute public, IP address 192.168.101.144
Communication Interface: Name busan-b2, Attribute public, IP address 192.168.201.144
Communication Interface: Name poksap-er, Attribute public, IP address 10.168.101.145
Communication Interface: Name poksap-en, Attribute public, IP address 10.168.101.144
Communication Interface: Name poksap-db, Attribute public, IP address 10.168.101.143
Node seoul
Interfaces to network net_ether_01
Communication Interface: Name seoul-b1, Attribute public, IP address 192.168.101.143
Communication Interface: Name seoul-b2, Attribute public, IP address 192.168.201.143
Communication Interface: Name poksap-er, Attribute public, IP address 10.168.101.145
Communication Interface: Name poksap-en, Attribute public, IP address 10.168.101.144
Communication Interface: Name poksap-db, Attribute public, IP address 10.168.101.143
# LPAR names
seoul.itso.ibm.com:/ # clcmd uname -n
-------------------------------
NODE seoul.itso.ibm.com
seoul.itso.ibm.com:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
sapdb ONLINE seoul
OFFLINE busan
# The output below shows that CAA always uses the host name for its node names
# The PowerHA node names are: seoul, busan
seoul.itso.ibm.com:/ # lscluster -c
Cluster query for cluster korea returns:
Cluster uuid: 02d20290-d578-11df-871d-a24e50543103
Number of nodes in cluster = 2
Cluster id for node busan.itso.ibm.com is 1
Primary IP address for node busan.itso.ibm.com is 10.168.101.44
Cluster id for node seoul.itso.ibm.com is 2
Primary IP address for node seoul.itso.ibm.com is 10.168.101.43
Number of disks in cluster = 2
for disk cldisk2 UUID = 428e30e8-657d-8053-d70e-c2f4b75999e2 cluster_major = 0 cluster_minor = 2
for disk cldisk1 UUID = fe1e9f03-005b-3191-a3ee-4834944fcdeb cluster_major = 0 cluster_minor = 1
Multicast address for cluster is 228.168.101.43
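From lscluster -c output like the above, the multicast address can be pulled out with a short pipeline. The sample line below is copied from the output format shown; on a live node, pipe lscluster -c itself instead:

```shell
# Sample line matching the lscluster -c output format shown above.
line='Multicast address for cluster is 228.168.101.43'
mcast=$(printf '%s\n' "$line" | awk '{print $NF}')   # last field is the address
echo "$mcast"
```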
As a preliminary step, add the base IP aliases to the /etc/cluster/rhosts file on each node and
refresh the CAA clcomd daemon. Example 5-3 illustrates this step on the node sydney.
Cluster
Nodes
Networks
Network Interfaces
Define Repository Disk and Cluster IP Address
Figure 5-12 Initial Cluster Setup (Custom) panel for a custom configuration
Add/Change/Show a Cluster
[Entry Fields]
* Cluster Name [australia]
Figure 5-13 Adding a cluster
Add a Node
[Entry Fields]
* Node Name [sydney]
Communication Path to Node [sydney] +
Figure 5-14 Add a Node panel
Add a Network
[Entry Fields]
* Network Name [ether01]
* Network Type ether
* Netmask(IPv4)/Prefix Length(IPv6) [255.255.252.0]
Figure 5-15 Add a Network panel
[Entry Fields]
* IP Label/Address [sydneyb2] +
* Network Type ether
* Network Name ether01
* Node Name [sydney] +
Network Interface []
Figure 5-16 Add a Network Interface panel
[Entry Fields]
* Cluster Name australia
* Repository Disk [hdisk1] +
Cluster IP Address []
Figure 5-17 Define Repository Disk and Cluster IP Address panel
Figure 5-18 shows an example where the Automatically correct errors found during
verification? option was changed from the default value of No to Yes.
[Entry Fields]
* Verify, Synchronize or Both [Both] +
* Include custom verification library checks [Yes] +
* Automatically correct errors found during [Yes] +
verification?
Example 5-5 shows a summary configuration of the CAA cluster created during the
synchronization phase.
For more details about the CAA cluster status, see the following section.
------------------------------
Example 5-7 shows detailed interface information provided by the lscluster -i command. It
shows information about the network interfaces and the other two logical interfaces that are
used for cluster communication:
sfwcom The node connection to the SAN-based communication channel.
dpcom The node connection to the repository disk.
Storage
Volume Groups
Logical Volumes
File Systems
Physical Volumes
Figure 5-19 C-SPOC storage panel
The Volume Groups option is the preferred method for creating a volume group, because it
is automatically configured on all of the selected nodes. Since the release of PowerHA 6.1,
most operations on volume groups, logical volumes, and file systems no longer require
these objects to be in a resource group. Smart menus check for configuration and state
problems and prevent invalid operations before they can be initiated.
Volume Groups
PVID: This step automatically creates physical volume IDs (PVIDs) for the unused (no
PVID) shared disks. A shared disk might have different names on selected nodes, but
the PVID is the same.
Logical Volumes
7. In the C-SPOC Storage panel (Figure 5-19 on page 86), define the logical volumes and
file systems by selecting the Logical Volumes and File Systems options. The
intermediate and final panels for these actions are similar to those panels in previous
releases.
You can list the file systems that you created by following the path C-SPOC Storage →
File Systems → List All File Systems by Volume Group. The COMMAND STATUS
panel (Figure 5-24) shows the list of file systems for this example.
COMMAND STATUS
Smart Assists: The “Make Applications Highly Available (Use Smart Assists)” function
leads to a menu of all installed Smart Assists. If you do not see the Smart Assist that you
need, verify that the corresponding Smart Assist file set is installed.
Resources
4. In the Application Controller Scripts panel (Figure 5-28), select the Add Application
Controller Scripts option.
[Entry Fields]
* Application Controller Name [dbac]
* Start Script [/HA71/db_start.sh]
* Stop Script [/HA71/db_stop.sh]
Application Monitor Name(s) +
Figure 5-29 Adding application controller scripts
The configuration of the applications is completed. The next step is to configure the service IP
addresses.
+--------------------------------------------------------------------------+
| Network Name |
| |
| Move cursor to desired item and press Enter. |
| |
| ether01 (192.168.100.0/22 192.168.200.0/22) |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 5-31 Network Name subpanel for the Add a Service IP Label/Address option
5. In the Add a Service IP Label/Address panel, which changes as shown in Figure 5-32, in
the IP Label/Address field, select the service address that you want to add.
You can use the Netmask(IPv4)/Prefix Length(IPv6) field to define the netmask. With IPv4,
you can leave this field empty. The Network Name field is prefilled.
[Entry Fields]
* IP Label/Address sydneys +
Netmask(IPv4)/Prefix Length(IPv6) []
* Network Name ether01
Figure 5-32 Details of the Add a Service IP Label/Address panel
You have now finished configuring the resources. In this example, you defined one service IP
address. If you need to add more service IP addresses, repeat the steps as indicated in this
section.
As explained in the following section, the next step is to configure the resource groups.
Resource Groups
4. In the Add a Resource Group panel (Figure 5-34), as in previous versions of PowerHA,
specify the resource group name, the participating nodes, and the policies.
[Entry Fields]
* Resource Group Name [dbrg]
* Participating Nodes (Default Node Priority) [sydney perth] +
Next, you synchronize the cluster nodes. If the Verify and Synchronize Cluster Configuration
task is successfully completed, you can start your cluster. However, you might first want to
see if the CAA cluster was successfully created by using the lscluster -c command.
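This final check can also be scripted from the command line. A sketch follows, assuming the clmgr sync action described later in this chapter; the DRYRUN variable is an illustrative guard so that the commands only print here:

```shell
DRYRUN=echo   # hypothetical guard; set DRYRUN= on a real node to execute
$DRYRUN clmgr sync cluster   # CLI equivalent of Verify and Synchronize (assumption)
$DRYRUN lscluster -c         # confirm that the CAA cluster was created
```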
5.1.6 Configuring Start After and Stop After resource group dependencies
In this section, you configure a Start After resource group dependency and similarly create a
Stop After resource group dependency. For more information about Start After and Stop After
resource group dependencies, see 2.5.1, “Start After and Stop After resource group
dependencies” on page 32.
To add a new dependency, in the Configure Start After Resource Group Dependency menu,
select the Add Start After Resource Group Dependency option. In this example, we
already configured the dbrg and apprg resource groups. The apprg resource group is defined
as the source (dependent) resource group as shown in Figure 5-37.
••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
• Select the Source Resource Group •
• •
• Move cursor to desired item and press Enter. •
• •
• apprg •
• dbrg •
• •
• F1=Help F2=Refresh F3=Cancel •
• Esc+8=Image Esc+0=Exit Enter=Do •
F1• /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 5-37 Selecting the source resource group of a Start After dependency
••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
• Select the Target Resource Group •
• •
• Move cursor to desired item and press Esc+7. •
• ONE OR MORE items can be selected. •
• Press Enter AFTER making all selections. •
• •
• dbrg •
• •
• F1=Help F2=Refresh F3=Cancel •
• Esc+7=Select Esc+8=Image Esc+0=Exit •
F1• Enter=Do /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 5-38 Selecting the target resource group of a Start After dependency
[Entry Fields]
* Monitor Name [dbam]
* Application Controller(s) to Monitor dbac +
* Monitor Mode [Both] +
* Monitor Method [/HA71/db_mon.sh]
Monitor Interval [30] #
Hung Monitor Signal [] #
* Stabilization Interval [120] #
* Restart Count [3] #
Restart Interval [] #
* Action on Application Failure [fallover] +
Notify Method [/HA71/db_stop.sh]
Cleanup Method [/HA71/db_start.sh]
Restart Method []
Figure 5-39 Adding the dbam custom application monitor
[Entry Fields]
* Monitor Name appam
Application Controller(s) to Monitor appac +
* Monitor Mode [Long-running monitori> +
* Monitor Method [/HA71/app_mon.sh]
Monitor Interval [30] #
Hung Monitor Signal [9] #
* Stabilization Interval [15] #
Restart Count [3] #
Restart Interval [594] #
* Action on Application Failure [fallover] +
Notify Method []
Cleanup Method [/HA71/app_stop.sh]
Restart Method [/HA71/app_start.sh]
Figure 5-40 Configuring the appam application monitor and appac application controller
For a series of tests performed on this configuration, see 9.8, “Testing a Start After resource
group dependency” on page 297.
[Entry Fields]
* Resource Type Name [my_resource_type]
* Processing order [] +
Verification Method []
Verification Type [Script] +
Start Method []
Stop Method []
+--------------------------------------------------------------------------+
¦ Processing order ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ FIRST ¦
¦ WPAR ¦
¦ VOLUME_GROUP ¦
¦ FILE_SYSTEM ¦
¦ SERVICEIP ¦
¦ TAPE ¦
¦ APPLICATION ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
F1¦ Esc+8=Image Esc+0=Exit Enter=Do ¦
Es¦ /=Find n=Find Next ¦
Es+--------------------------------------------------------------------------+
Figure 5-41 Adding a user-defined resource type
3. After you create your own resource, add it to the resource group. The new resource is
shown in the pick list. This information is stored in the HACMPresourcetype,
HACMPudres_def, and HACMPudresource cluster configuration files.
DNP script for the nodes: Ensure that all nodes have the DNP script and that the
script has executable mode. Otherwise, you receive an error message while running the
synchronization or verification process.
For a description of this test scenario, see 9.9, “Testing dynamic node priority” on
page 302.
Removing a cluster consists of deleting the PowerHA definition and deleting the CAA cluster
from AIX. Removing the CAA cluster is the last step of the Remove operation as shown in
Figure 5-43.
COMMAND STATUS
Normally, deleting the cluster with this method removes both the PowerHA SystemMirror and
the CAA cluster definitions from the system. If a problem is encountered while PowerHA is
trying to remove the CAA cluster, you might need to delete the CAA cluster manually. For
more information, see Chapter 10, “Troubleshooting PowerHA 7.1” on page 305.
After you remove the cluster, ensure that the caavg_private volume group is no longer
displayed as shown in Figure 5-44.
To see the possible values for the attributes, use the man clvt command.
For a list of actions, you can use the clmgr command with no arguments. See “The clmgr
command” on page 106 and Example 5-10 on page 106.
Most of the actions in the list provide aliases. Table 5-1 shows the current actions and their
abbreviations and aliases.
move mov, mv
recover rec
sync sy
verify ve
manage mg
For a list, you can use clmgr with no arguments. See “The clmgr query command” on
page 107 and Example 5-11 on page 107.
Most of the object classes in the list provide aliases. Table 5-2 on page 105 lists the current
object classes and their abbreviations and aliases.
cluster cl
site si
node no
interface in, if
network ne, nw
resource_group rg
service_ip se
persistent_ip pe, pi
For a list of the actions that are currently supported, see 5.2.1, “The clmgr action commands”
on page 104.
For a list, you can use the clmgr command with no arguments. See “The clmgr command” on
page 106 and Example 5-10 on page 106.
For a list of object classes that are currently supported, see 5.2.2, “The clmgr object classes”
on page 105.
For a list, use the clmgr command with no arguments. See “The clmgr query command” on
page 107 and Example 5-11 on page 107.
For most of these actions and object classes, abbreviations and aliases are available. These
commands are not case-sensitive. You can find more details about the actions and their
aliases in “The clmgr action commands” on page 104. For more information about object
classes, see “The clmgr object classes” on page 105.
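The alias handling can be pictured with a small helper function. This function is purely illustrative (it is not part of clmgr, which resolves aliases internally) and covers only a few of the aliases from Table 5-2:

```shell
# Illustrative only -- clmgr resolves these aliases internally.
expand_class() {
  case "$1" in
    cl)    echo cluster ;;
    no)    echo node ;;
    ne|nw) echo network ;;
    rg)    echo resource_group ;;
    *)     echo "$1" ;;       # already a full class name
  esac
}
expand_class rg
```

For example, expand_class maps rg to resource_group, which is why `clmgr q rg` and `clmgr query resource_group` are equivalent.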
Error messages: At the time of writing, the clmgr error messages referred to clvt. This
issue will be fixed in a future release so that it references clmgr.
clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>] \
[-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}] <ACTION> <CLASS> [<NAME>] \
[-h | <ATTR#1>=<VALUE#1> <ATTR#2>=<VALUE#2> <ATTR#n>=<VALUE#n> ...]
clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>] \
[-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}] -M "
<ACTION> <CLASS> [<NAME>] [<ATTR#1>=<VALUE#1> <ATTR#n>=<VALUE#n> ...]
.
.
."
ACTION={add|modify|delete|query|online|offline|...}
CLASS={cluster|site|node|network|resource_group|...}
As mentioned previously, most clmgr actions and object classes provide aliases. Another
helpful feature of the clmgr command is the ability to understand abbreviated commands. For
example, the previous command can be shortened as follows:
# clmgr q cl
For more details about the capability of the clmgr command, see 5.2.1, “The clmgr action
commands” on page 104, and 5.2.2, “The clmgr object classes” on page 105. See also the
man pages listed in Appendix D, “The clmgr man page” on page 501.
You can also use more complex search expressions. Example 5-14 shows how you can use a
simple regular expression. In addition, you can search on more than one field; only the
objects that match all provided searches are displayed.
The -a option
Some query commands produce rather long output. You can use the -a (attributes) option
to obtain shorter output that shows the information for a single value, as shown in
Example 5-15. You can also use this option to get information about several values, as shown
in Example 5-16.
Example 5-16 shows how to get information about the state and the location of a resource
group. The full output of the query command for the nfsrg resource group is shown in
Example 5-31 on page 123.
The -v option
The -v (verbose) option is helpful when used with the query action, as shown in
Example 5-18. This option is used almost exclusively by IBM Systems Director to scan the
cluster for information.
STATE="ONLINE"
CURRENT_NODE="berlin"
munich:/ #
If you do not use the -v option with the query action, you see an error message similar to the
one in Example 5-19.
Example 5-19 Error message when not using the -v option for query all resource groups
munich:/ # clmgr -a STATE,current query rg
munich:/ #
Example 5-20 The command to return a single value from the clmgr command
# clmgr -cSa state query rg rg1
ONLINE
#
Example 5-21 Help for adding resource group using the clmgr command
# clmgr add resource_group -h
# Available options for "clvt add resource_group":
<RESOURCE_GROUP_NAME>
NODES
PRIMARYNODES
SECONDARYNODES
FALLOVER
FALLBACK
STARTUP
FALLBACK_AT
SERVICE_LABEL
APPLICATIONS
VOLUME_GROUP
FORCED_VARYON
VG_AUTO_IMPORT
FILESYSTEM
FSCHECK_TOOL
RECOVERY_METHOD
FS_BEFORE_IPADDR
EXPORT_FILESYSTEM
EXPORT_FILESYSTEM_V4
MOUNT_FILESYSTEM
STABLE_STORAGE_PATH
WPAR_NAME
NFS_NETWORK
SHARED_TAPE_RESOURCES
DISK
AIX_FAST_CONNECT_SERVICES
COMMUNICATION_LINKS
WLM_PRIMARY
WLM_SECONDARY
MISC_DATA
CONCURRENT_VOLUME_GROUP
NODE_PRIORITY_POLICY
NODE_PRIORITY_POLICY_SCRIPT
NODE_PRIORITY_POLICY_TIMEOUT
SITE_POLICY
Items between the angle brackets (<>) are required information. However, this does not
mean that all of the other items are optional. Some items might not be marked as required
because of other dependencies. In Example 5-22 on page 112, only CLUSTER_NAME is listed as
required, but because of the new CAA dependency, the REPOSITORY (disk) is also required. For
more details about how to create a cluster by using the clmgr command, see “Configuring a
new cluster using the clmgr command” on page 113.
To configure a PowerHA cluster by using the clmgr command, follow these steps:
1. Configure the cluster:
# clmgr add cluster de_cluster NODES=munich,berlin REPOSITORY=hdisk4
For details, see “Configuring a new cluster using the clmgr command” on page 113.
2. Configure the service IP addresses:
# clmgr add service_ip alleman NETWORK=net_ether_01 NETMASK=255.255.255.0
# clmgr add service_ip german NETWORK=net_ether_01 NETMASK=255.255.255.0
For details, see “Defining the service address using the clmgr command” on page 118.
3. Configure the application server:
# clmgr add application_controller http_app \
> STARTSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k start" \
> STOPSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k stop"
For details, see “Defining the application server using the clmgr command” on page 120.
4. Configure a resource group:
# clmgr add resource_group httprg VOLUME_GROUP=httpvg NODES=munich,berlin \
> SERVICE_LABEL=alleman APPLICATIONS=http_app
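The four steps above can be collected into one script. A sketch follows, with an illustrative DRYRUN guard so that the clmgr commands only print here; set DRYRUN empty on a real node to execute them:

```shell
DRYRUN=echo   # hypothetical guard so that this sketch only prints the commands
provision_cluster() {
  $DRYRUN clmgr add cluster de_cluster NODES=munich,berlin REPOSITORY=hdisk4
  $DRYRUN clmgr add service_ip alleman NETWORK=net_ether_01 NETMASK=255.255.255.0
  $DRYRUN clmgr add service_ip german NETWORK=net_ether_01 NETMASK=255.255.255.0
  $DRYRUN clmgr add application_controller http_app \
      STARTSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k start" \
      STOPSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k stop"
  $DRYRUN clmgr add resource_group httprg VOLUME_GROUP=httpvg NODES=munich,berlin \
      SERVICE_LABEL=alleman APPLICATIONS=http_app
}
provision_cluster
```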
Command and syntax of clmgr: Unlike the SMIT interface, which is designed to be robust
and easy to use, the clmgr command (CLI) does not guide you through the configuration.
When you use it to configure or manage the PowerHA cluster, you must use the correct
command and syntax.
Preliminary setup
Prerequisite: This section assumes that you know how to set up the prerequisites for a
PowerHA cluster.
The IP interfaces are already defined and the shared volume groups and file systems have
been created. The host names of the two systems are munich and berlin. Figure 5-45 shows
the disks and shared volume groups that are defined so far. hdisk4 is used as the CAA
repository disk.
munich:/ # lspv
hdisk1 00c0f6a012446137 httpvg
hdisk2 00c0f6a01245190c httpvg
hdisk3 00c0f6a012673312 nfsvg
hdisk4 00c0f6a01c784107 None
hdisk0 00c0f6a07c5df729 rootvg active
munich:/ #
Figure 5-45 List of available disks
munich:/ # netstat -i
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 a2.4e.58.a0.41.3 23992 0 24516 0 0
en0 1500 192.168.100 munich 23992 0 24516 0 0
en1 1500 link#3 a2.4e.58.a0.41.4 2 0 7 0 0
en1 1500 100.168.200 munichb1 2 0 7 0 0
en2 1500 link#4 a2.4e.58.a0.41.5 4324 0 7 0 0
en2 1500 100.168.220 munichb2 4324 0 7 0 0
lo0 16896 link#1 16039 0 16039 0 0
lo0 16896 127 localhost.locald 16039 0 16039 0 0
lo0 16896 localhost6.localdomain6 16039 0 16039 0 0
munich:/ #
Figure 5-46 Defined network interfaces
Before you use the clmgr add cluster command, you must know which disk will be used for
the CAA repository disk. Example 5-23 shows the command and its output.
Table 5-3 provides more details about the command and arguments that are used.
berlin:
Hdisk: hdisk1
PVID: 00c0f6a012446137
VGname: httpvg
VGmajor: 100
Conc-capable: Yes
VGactive: No
Quorum-required:Yes
Hdisk: hdisk2
PVID: 00c0f6a01245190c
VGname: httpvg
VGmajor: 100
Conc-capable: Yes
VGactive: No
Quorum-required:Yes
munich:
Hdisk: hdisk1
PVID: 00c0f6a012446137
VGname: httpvg
VGmajor: 100
Conc-capable: Yes
VGactive: No
Quorum-required:Yes
berlin:
Hdisk: hdisk3
PVID: 00c0f6a012673312
VGname: nfsvg
VGmajor: 200
Conc-capable: Yes
VGactive: No
Quorum-required:Yes
munich:
Hdisk: hdisk2
berlin:
Hdisk: hdisk4
PVID: 00c0f6a01c784107
VGname: None
VGmajor: 0
Conc-capable: No
VGactive: No
Quorum-required:No
munich:
Hdisk: hdisk3
PVID: 00c0f6a012673312
VGname: nfsvg
VGmajor: 200
Conc-capable: Yes
VGactive: No
Quorum-required:Yes
berlin:
Hdisk: hdisk0
PVID: 00c0f6a048cf8bfd
VGname: rootvg
VGmajor: 10
Conc-capable: No
VGactive: Yes
Quorum-required:Yes
FREEMAJORS: 35..99,101..199,201...
munich:
Hdisk: hdisk4
PVID: 00c0f6a01c784107
VGname: None
VGmajor: 0
Conc-capable: No
VGactive: No
Quorum-required:No
Hdisk: hdisk0
PVID: 00c0f6a07c5df729
VGname: rootvg
VGmajor: 10
Conc-capable: No
VGactive: Yes
Quorum-required:Yes
FREEMAJORS: 35..99,101..199,201...
Communication path berlin discovered a new node. Hostname is berlin. Adding it to
the configuration with
Nodename berlin.
Communication path munich discovered a new node. Hostname is munich. Adding it to
the configuration with
Nodename munich.
Discovering IP Network Connectivity
Discovered [10] interfaces
IP Network Discovery completed normally
munich:/ #
Example 5-24 Output of the cltopinfo command after creating cluster definitions
munich:/ # cltopinfo
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk4
Cluster IP Address:
There are 2 node(s) and 2 network(s) defined
NODE berlin:
Network net_ether_01
berlinb2 100.168.220.141
berlinb1 100.168.200.141
Network net_ether_010
berlin 192.168.101.141
NODE munich:
Network net_ether_01
munichb1 100.168.200.142
munichb2 100.168.220.142
Network net_ether_010
munich 192.168.101.142
The clmgr add cluster command: The clmgr add cluster command automatically runs
discovery, harvesting IP addresses and volume groups. As a result, the IP network
interfaces are added to the cluster configuration automatically.
Table 5-4 provides more details about the command and arguments that are used.
Table 5-4 Defining the service address using the clmgr command
Action, object class, or argument   Value used      Comment
NETWORK                             net_ether_01    The network name from the cltopinfo
                                                    command used previously.
NETMASK                             255.255.255.0   Optional; when you specify a value, use
                                                    the same one that you used in setting up
                                                    the interface.
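Applied to Table 5-4, the service-address definition might look like the following sketch; the service label alleman comes from the resource-group example at the start of this section, and the exact clmgr object class and argument names should be treated as assumptions:

```
# clmgr add service_ip alleman NETWORK=net_ether_01 NETMASK=255.255.255.0
```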
To check the configuration up to this point, use the cltopinfo command again. Example 5-26
shows the current configuration.
Table 5-5 provides more details about the command and arguments that are used.
Table 5-5 Defining the application server using the clmgr command
Action, object class, or argument   Value used      Comment
Compared to the smit functions, by using the clmgr command, you create a resource group
and its resources in one step. Therefore, you must ensure that you have defined all the
service IP addresses and your application servers.
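As an illustration only, defining the http_app application server used earlier might be sketched as follows; the object class name and the script paths are assumptions for this sketch, not values taken from the book:

```
# clmgr add application_controller http_app \
>     STARTSCRIPT=/usr/local/ha/start_http.sh \
>     STOPSCRIPT=/usr/local/ha/stop_http.sh
```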
Two resource groups have been created. The first one uses only the items needed for this
resource group (httprg), so that the system used the default values for the remaining
arguments. Table 5-6 provides more details about the command and arguments that are
used.
Table 5-6 Defining the resource groups using the clmgr (httprg) command
Action, object class, or argument   Value used      Comment
VOLUME_GROUP                        httpvg          The volume group used for this resource group.
For the second resource group in the test environment, we specified more details because we
did not want to use the default values (nfsrg). Table 5-7 provides more details about the
command and arguments that we used.
Table 5-7 Defining the resource groups using the clmgr (nfsrg) command
Action, object class, or argument   Value used      Comment
VOLUME_GROUP                        nfsvg           The volume group used for this resource group.
Example 5-28 shows the commands that are used to define the resource groups listed in
Table 5-6 on page 120 and Table 5-7.
To see the configuration up to this point, use the clmgr query command. Example 5-29
shows how to check which resource groups you defined.
Example 5-29 Listing the defined resource groups using the clmgr command
munich:/ # clmgr query rg
httprg
nfsrg
munich:/ #
Next, you can see the content that you created for the resource groups. Example 5-30 shows
the content of the httprg. As discussed previously, the default values for this resource group
were used as much as possible.
Now you can see the content that was created for the resource groups. Example 5-31 shows
the content of the nfsrg resource group.
Verifying and propagating the changes: After using the clmgr command to modify the
cluster configuration, enter the clmgr verify cluster and clmgr sync cluster commands
to verify and propagate the changes to all nodes.
Example 5-32 shows usage of the clmgr sync cluster command to synchronize the cluster
and the command output.
Example 5-32 Synchronizing the cluster using the clmgr sync cluster command
munich:/ # clmgr sync cluster
Retrieving data from available cluster nodes. This could take a few minutes.
berlin net_ether_010
munich net_ether_010
http_app httprg
Completed 50 percent of the verification checks
Completed 60 percent of the verification checks
Completed 70 percent of the verification checks
Completed 80 percent of the verification checks
Completed 90 percent of the verification checks
Completed 100 percent of the verification checks
Node: Network:
---------------------------------- ----------------------------------
WARNING: Not all cluster nodes have the same set of HACMP filesets installed.
The following is a list of fileset(s) missing, and the node where the
fileset is missing:
Fileset: Node:
-------------------------------- --------------------------------
WARNING: There are IP labels known to HACMP and not listed in file
/usr/es/sbin/cluster/etc/clhosts.client on
node: berlin. Clverify can automatically populate this file to be used on a client
node, if executed in
auto-corrective mode.
WARNING: There are IP labels known to HACMP and not listed in file
/usr/es/sbin/cluster/etc/clhosts.client on
node: munich. Clverify can automatically populate this file to be used on a client
node, if executed in
auto-corrective mode.
WARNING: Network option "nonlocsrcroute" is set to 0 and will be set to 1 on
during HACMP startup on the
following nodes:
berlin
munich
WARNING: Network option "ipsrcrouterecv" is set to 0 and will be set to 1 on
during HACMP startup on the
following nodes:
berlin
munich
WARNING: Node munich has cluster.es.nfs.rte installed however grace periods are
not fully enabled on this node.
Grace periods must be enabled before NFSv4 stable storage can be used.
HACMP will attempt to fix this opportunistically when acquiring NFS resources on
this node however the change
won't take effect until the next time that nfsd is started.
If this warning persists, the administrator should perform the following steps to
enable grace periods on munich at the next planned downtime:
1. stopsrc -s nfsd
2. smitty nfsgrcperiod
3. startsrc -s nfsd
munich:/ #
When the synchronization finishes successfully, the CAA repository disk is defined.
Figure 5-47 shows the disks before the cluster synchronization, which are the same as those
shown in Figure 5-45 on page 113.
munich:/ # lspv
hdisk1 00c0f6a012446137 httpvg
hdisk2 00c0f6a01245190c httpvg
hdisk3 00c0f6a012673312 nfsvg
hdisk4 00c0f6a01c784107 None
hdisk0 00c0f6a07c5df729 rootvg active
munich:/ #
Figure 5-47 List of available disks before sync
Figure 5-48 shows the output of the lspv command after the synchronization. In our example,
hdisk4 is now converted into a CAA repository disk and is listed as caa_private0.
munich:/ # lspv
hdisk1 00c0f6a012446137 httpvg
hdisk2 00c0f6a01245190c httpvg
hdisk3 00c0f6a012673312 nfsvg
caa_private0 00c0f6a01c784107 caavg_private active
hdisk0 00c0f6a07c5df729 rootvg active
munich:/ #
Figure 5-48 List of available disks after using the cluster sync command
Example 5-33 shows the command that we used and part of its output. To start the clinfo
daemon, we used the CLINFO=true argument. We did not want a broadcast message, so we
also specified the BROADCAST=false argument.
/usr/es/sbin/cluster/diag/cl_ver_alias_topology[335] return 0
Node: Network:
---------------------------------- ----------------------------------
berlin net_ether_010
munich net_ether_010
WARNING: Network option "nonlocsrcroute" is set to 0 and will be set to 1 on during HACMP
startup on the following nodes:
munich
WARNING: Network option "ipsrcrouterecv" is set to 0 and will be set to 1 on during HACMP
startup on the following nodes:
munich
/usr/es/sbin/cluster/diag/clwpardata[325] exit 0
WARNING: Node munich has cluster.es.nfs.rte installed however grace periods are not fully
enabled on this node. Grace periods must be enabled before NFSv4 stable storage can be used.
HACMP will attempt to fix this opportunistically when acquiring NFS resources on this node
however the change won't take effect until the next time that nfsd is started.
If this warning persists, the administrator should perform the following steps to enable grace
periods on munich at the next planned downtime:
1. stopsrc -s nfsd
2. smitty nfsgrcperiod
3. startsrc -s nfsd
munich:/ #
Starting all nodes in a cluster: The clmgr online cluster start_cluster command
starts all nodes in a cluster by default.
Example 5-49 shows that all nodes are now up and running.
Colon-delimited format
When using the colon-delimited output format (-c), you can use the -S option to silence or
eliminate the header line.
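For example, assuming the two options can be combined in the usual single-letter fashion, a header-free colon-delimited query might look like this sketch:

```
# clmgr -c -S query resource_group
```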
The main log file for clmgr debugging is the /var/hacmp/log/clutils.log file. This log file
includes all standard error and output from each command.
The return codes used by the clmgr command are standard for all commands:
RC_UNKNOWN=-1 A result is not known. It is useful as an initializer.
RC_SUCCESS=0 No errors were detected; the operation seems to have
been successful.
RC_ERROR=1 A general error has occurred.
RC_NOT_FOUND=2 A specified resource does not exist or could not be
found.
RC_MISSING_INPUT=3 Some required input was missing.
RC_INCORRECT_INPUT=4 Some detected input was incorrect.
RC_MISSING_DEPENDENCY=5 A required dependency does not exist.
RC_SEARCH_FAILED=6 A specified search failed to match any data.
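The documented return codes lend themselves to scripting; the following portable shell sketch defines a hypothetical helper (not part of clmgr) that maps a return code to its documented symbolic name:

```shell
# rc_name is a hypothetical helper for scripting around clmgr;
# it maps a clmgr return code to its documented symbolic name.
rc_name() {
    case "$1" in
        0) echo "RC_SUCCESS" ;;
        1) echo "RC_ERROR" ;;
        2) echo "RC_NOT_FOUND" ;;
        3) echo "RC_MISSING_INPUT" ;;
        4) echo "RC_INCORRECT_INPUT" ;;
        5) echo "RC_MISSING_DEPENDENCY" ;;
        6) echo "RC_SEARCH_FAILED" ;;
        *) echo "RC_UNKNOWN" ;;
    esac
}

# Example: report the outcome of a clmgr call (the call itself omitted here).
rc_name 2    # prints RC_NOT_FOUND
```

In a real script, you would capture `$?` immediately after the clmgr invocation and pass it to such a helper.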
Example 5-36 lists the format of the trace information in the clutils.log file.
The following line shows an example of how the clutils.log file might be displayed:
CLMGR:0:resource_common:SerializeAsAssociativeArray()[537](0.704):13327:9765002:9044114: unset 'array[AIX_LEVEL0]'
Example 5-37 shows some lines from the clutils.log file (not using trace).
Example 5-38 Using the TAIL argument when viewing the content of the clmgr log file
# clmgr view log clutils.log TAIL=1000 | wc -l
1000
#
Example 5-40 shows how to list the last five clmgr query commands that were run.
The Director client agent of PowerHA SystemMirror is installed on cluster nodes in the same
manner as PowerHA SystemMirror itself, by using the installp command. The Director
server and the PowerHA server plug-in require a separate installation effort. You must
download them from the external website and manually install them on a dedicated system.
This system does not have to be a PowerHA system.
To learn about installing the Systems Director and PowerHA components, and their use for
configuration and management tasks, see Chapter 12, “Creating and managing a cluster
using IBM Systems Director” on page 333.
This chapter explains how to configure a hot standby two-node IBM PowerHA SystemMirror
7.1 cluster using the Smart Assist for DB2. The lab cluster korea is used for the examples with
the participating nodes seoul and busan.
Example 6-1 Additional file sets required for installing Smart Assist
seoul:/ # clcmd lslpp -l cluster.es.assist.common cluster.es.assist.db2
-------------------------------
NODE seoul
-------------------------------
Fileset Level State Description
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
cluster.es.assist.common 7.1.0.1 COMMITTED PowerHA SystemMirror Smart
Assist Common Files
cluster.es.assist.db2 7.1.0.1 COMMITTED PowerHA SystemMirror Smart
Assist for DB2
-------------------------------
NODE busan
-------------------------------
Fileset Level State Description
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
cluster.es.assist.common 7.1.0.1 COMMITTED PowerHA SystemMirror Smart
Assist Common Files
cluster.es.assist.db2 7.1.0.1 COMMITTED PowerHA SystemMirror Smart
Assist for DB2
-------------------------------
NODE busan
-------------------------------
hdisk0 00c0f6a089390270 rootvg active
caa_private0 00c0f6a01077342f caavg_private active
cldisk2 00c0f6a0107734ea pokvg
cldisk1 00c0f6a010773532 pokvg
6.1.4 Creating the DB2 instance and database on the shared volume group
Before launching the PowerHA Smart Assist for DB2, you must have already created the DB2
instance and DB2 database over the volume groups that are shared by both nodes.
In Example 6-4, the home for the POK database was created in the /db2/POK/db2pok shared
file system of the volume group pokvg. The instance was created in the /db2/db2pok shared
file system, which is the home directory for user db2pok. The instance was created on the
primary node only, because its structures reside on a shared volume group.
-------------------------------
NODE busan
-------------------------------
db2pok:!:203:101::/db2/db2pok:/usr/bin/ksh
seoul:/ # su - db2pok
seoul:/db2/db2pok # db2start
Non-DPF database support: Smart Assist for DB2 supports only non-DPF databases.
2. Mount the file systems as shown in Example 6-7 so that Smart Assist for DB2 can
discover the available instances and databases.
3. Ensure that the DB2 instance is active on the node where Smart Assist for DB2 will be
run, as shown in Example 6-8.
seoul:/db2/db2pok # db2ilist
db2pok
seoul:/db2/db2pok # db2start
09/24/2010 11:38:53 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
seoul:/db2/db2pok # db2pd -
Database Partition 0 -- Active -- Up 0 days 00:00:10
Example 6-9 Editing and adding the service IP label to the db2nodes.cfg file
seoul:/ # cat /db2/db2pok/sqllib/db2nodes.cfg
0 poksap-db 0
The .rhosts file (Example 6-10) for the DB2 instance owner has all the base, persistent,
and service addresses. It also has the right permissions.
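A sketch of what the instance owner's .rhosts file might contain for this cluster, using the host names and service label from the text; the exact entries (including any persistent labels) are assumptions, and the file permissions are typically restricted to the owner, for example mode 600:

```
seoul db2pok
busan db2pok
poksap-db db2pok
```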
4. Find the path for the binary files and then export the variable as shown in Example 6-11.
The DSE_INSTALL_DIR environment variable is exported as the root user with the actual path
for the DB2 binary files. If more than one DB2 version is installed, choose the version that
you want to use for your highly available instance.
Example 6-11 Finding the DB2 binary files and exporting them
seoul:/db2/db2pok # db2level
DB21085I Instance "db2pok" uses "64" bits and DB2 code release "SQL09050" with
level identifier "03010107".
Informational tokens are "DB2 v9.5.0.0", "s071001", "AIX6495", and Fix Pack
"0".
Product is installed at "/opt/IBM/db2/V9.5".
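Based on the db2level output above, the export from step 4 might look like the following sketch, using the installation path that db2level reported (the variable name comes from the text):

```
seoul:/ # export DSE_INSTALL_DIR=/opt/IBM/db2/V9.5
```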
4. In the Add an Application to the PowerHA SystemMirror Configuration panel, select Select
Configuration Mode.
5. In the Select Configuration Mode panel (Figure 6-2), select Automatic Discovery and
Configuration.
8. Select the DB2 instance name. In this case, only one instance, db2pok, is available as
shown in Figure 6-4.
db2pok
9. Using the available pick lists (F4), edit the Takeover Node, DB2 Instance Database to
Monitor, and Service IP Label fields as shown in Figure 6-5. Press Enter.
Tip: You can edit the Application Name field and change it to have a more meaningful
name.
Example 6-12 The configured resource group for the DB2 instance
seoul:/ # /usr/es/sbin/cluster/utilities/cllsres
APPLICATIONS="db2pok_ApplicationServer"
FILESYSTEM=""
FORCED_VARYON="false"
FSCHECK_TOOL="logredo"
FS_BEFORE_IPADDR="false"
RECOVERY_METHOD="parallel"
SERVICE_LABEL="poksap-db"
SSA_DISK_FENCING="false"
VG_AUTO_IMPORT="false"
VOLUME_GROUP="pokvg"
USERDEFINED_RESOURCES=""
seoul:/ # /usr/es/sbin/cluster/utilities/cllsgrp
db2pok_ResourceGroup
10.Administrator task: Verify the start and stop scripts that were created for the resource
group.
a. To verify the scripts, use the odmget or cllsserv commands or the SMIT tool as shown
in Example 6-13.
seoul:/ # /usr/es/sbin/cluster/utilities/cllsserv
db2pok_ApplicationServer /usr/es/sbin/cluster/sa/db2/sbin/cl_db2start
db2pok /usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok
db2pok_ApplicationServer
11.Administrator task: Verify which custom and process application monitors were created by
Smart Assist for DB2. In our example, the application monitors are db2pok_SQLMonitor
and db2pok_ProcessMonitor.
a. Run the following path for seoul: smitty sysmirror → Cluster Applications and
Resources → Resources → Configure User Applications (Scripts and
Monitors) → Application Monitors → Configure Custom Application Monitors →
Change/Show Custom Application Monitor.
b. In the Application Monitor to Change panel (Figure 6-8), select db2pok_SQLMonitor
and press Enter.
db2pok_SQLMonitor
d. Run the following path for seoul: smitty sysmirror → Cluster Applications and
Resources → Resources → Configure User Applications (Scripts and
Monitors) → Application Monitors → Configure Process Application Monitors →
Change/Show Process Application Monitor.
e. In the Application Monitor to Change panel (Figure 6-10), select
db2pok_ProcessMonitor and press Enter.
db2pok_ProcessMonitor
seoul:/db2/db2pok # db2stop
09/24/2010 12:02:56 0 0 SQL1064N DB2STOP processing was successful.
SQL1064N DB2STOP processing was successful.
seoul:/ # lsvg -o
caavg_private
rootvg
5. Start the cluster on both nodes, seoul and busan, by running smitty clstart.
6. In the Start Cluster Services panel (Figure 6-13 on page 149), complete these steps:
a. For Start now, on system restart or both, select now.
b. For Start Cluster Services on these nodes, enter [seoul busan].
c. For Manage Resource Groups, select Automatically.
d. For BROADCAST message at startup, select false.
e. For Startup Cluster Information Daemon, select true.
f. For Ignore verification errors, select false.
g. For Automatically correct errors found during cluster start?, select yes.
h. Press Enter.
Tip: The log file for the Smart Assist is in the /var/hacmp/log/sa.log file. You can use the
clmgr utility to easily view the log, as in the following example:
clmgr view log sa.log
When the PowerHA cluster starts, the DB2 instance is automatically started. The application
monitors start after the defined stabilization interval as shown in Example 6-17.
Example 6-17 Checking the status of the highly available cluster and the DB2 instance
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE seoul
OFFLINE busan
seoul:/ # su - db2pok
seoul:/db2/db2pok # db2pd -
Database Partition 0 -- Active -- Up 0 days 00:19:38
Your DB2 instance and database are now configured for high availability in a hot-standby
PowerHA SystemMirror configuration.
TL6: AIX must be at a minimum version of AIX 6.1 TL6 (6.1.6.0) on all nodes before
migration. Use of AIX 6.1 TL6 SP2 or later is preferred.
For more information about migration considerations, see 3.4, “Migration planning” on
page 46.
Important: A nondisruptive upgrade is not available in PowerHA 7.1, because this version
is the first one to use Cluster Aware AIX (CAA).
With the introduction of PowerHA 7.1, you now use the features of CAA introduced in AIX 6.1
TL6 and AIX 7.1. For more information about the new features of this release, see 2.2, “New
features” on page 24.
The migration process now has two main cluster components: CAA and PowerHA. This
process involves updating your existing PowerHA product and configuring the CAA cluster
component.
The clmigcheck command: The clmigcheck command automatically creates the CAA
cluster when it is run on the last node.
For a detailed explanation about the clmigcheck process, see 7.2.2, “Premigration
checking: The clmigcheck program” on page 157.
At this stage, the clmigcheck process has run on the last node of the cluster. The CAA
cluster is now created and CAA has established communication with the other node.
Figure 7-3 Extract from the clstrmgr.debug file showing the migration protocol
CAA communication: The grpsvcs SRC subsystem is active until you restart the
system. This subsystem is now communicating with CAA and not topsvcs as shown in
Figure 7-4.
Figure 7-5 shows the services that are running after migration, including cthags.
clcomd instances: Two flavors of the cluster communication daemon (clcomd and
clcomdES) can exist in the cluster, but never both on a given node. After PowerHA 7.1 is
installed on a node, the clcomd daemon runs, and the clcomdES daemon does not exist.
AIX 6.1.6.0 and later with a back-level PowerHA version (before version 7.1) runs only the
clcomdES daemon even though the clcomd daemon exists.
The clcomd daemon uses port 16191, and the clcomdES daemon uses port 6191. When
migration is complete, the clcomdES daemon is removed.
The clcomdES daemon: The clcomdES daemon is removed when the older PowerHA
software version is removed (snapshot migration) or overwritten by the new PowerHA 7.1
version (rolling or offline migration).
Command profile: The clmigcheck command is not a PowerHA command, but the
command is part of bos.cluster and is in the /usr/sbin directory.
The clmigcheck program uses the mkcluster command and passes the cluster parameters
from the existing PowerHA cluster, along with the repository disk and multicast address (if
applicable). Figure 7-7 shows an example of the mkcluster command being called.
A warning message is displayed for certain unsupported elements, such as disk heartbeat as
shown in Figure 7-9.
The second function of the clmigcheck program is to prepare the CAA cluster environment.
This function is performed when you select option 3 (Enter repository disk and multicast IP
addresses) from the menu.
When you select this option, the clmigcheck program stores the information entered in the
/var/clmigcheck/clmigcheck.txt file. This file is also copied to the /var/clmigcheck
directory on all nodes in the cluster. This file contains the physical volume identifier (PVID) of
the repository disk and the chosen multicast address. If PowerHA is allowed to choose a
multicast address automatically, the NULL setting is specified in the file. Figure 7-10 shows
an example of the clmigcheck.txt file.
CLUSTER_TYPE:STANDARD
CLUSTER_REPOSITORY_DISK:000fe40120e16405
CLUSTER_MULTICAST:NULL
Figure 7-10 Contents of the clmigcheck.txt file
Upon running the clmigcheck command, the command checks to see if the clmigcheck.txt
file exists. If the clmigcheck.txt file exists and the node is not the last node in the cluster to
be migrated, the panel shown in Figure 7-11 is displayed. It contains a message indicating
that you can now upgrade to the later level of PowerHA.
clmigcheck: This is not the first node or last node clmigcheck was run on.
No further checking is required on this node. You can install the new
version of PowerHA SystemMirror.
-----------------------------------------------------------------------
Figure 7-11 The clmigcheck panel after it has been run once and before the PowerHA upgrade
The clmigcheck program checks the installed version of PowerHA to see if it has been
upgraded. This step is important to determine which node is the last node to be upgraded in
the cluster. If it is the last node in the cluster, then additional configuration operations must be
completed along with creating and activating the CAA cluster.
Important: You must run the clmigcheck program before you upgrade PowerHA. Then
upgrade PowerHA one node at a time, and run the clmigcheck program on the next node
only after you complete the migration on the previous node. If you do not run the
clmigcheck program specifically on the last node, the cluster is still in migration mode
without creating the CAA cluster. For information about how to resolve this situation, see
10.4.7, “The ‘Cluster services are not active’ message” on page 323.
ERROR: This program is intended for PowerHA configurations prior to version 7.1
The version currently installed appears to be: 7.1.0
Figure 7-12 clmigcheck panel after PowerHA has been installed on a node.
Figure 7-13 shows an extract from the /tmp/clmigcheck/clmigcheck.log file that was taken
when the clmigcheck command ran on the last node in a three-node cluster migration. This
file shows the output by the clmigcheck program when checking whether this node is the last
node of the cluster.
ck_lastnode: oldnodes = 1
clmigcheck: This is the last node to run clmigcheck, create the CAA cluster
Figure 7-13 Extract from clmigcheck.log file showing the lslpp last node checking
Also the environment has one resource group that includes one service IP, two volume
groups, and application monitoring. This environment also has an IBM HTTP server as the
application. Figure 7-14 shows the relevant resource group settings.
The snapshot migration method requires all cluster nodes to be offline for some time. It
requires removing previous versions of PowerHA and installing AIX 6.1 TL6 or later and the
new version of PowerHA 7.1.
In this scenario, we start with PowerHA 5.5 SP4 on AIX 6.1.3 and migrate to PowerHA 7.1
SP1 on AIX 6.1 TL6. The network topology consists of one IP network using IPAT via
replacement and a disk heartbeat network. Both of these network types are no longer
supported. However, if you have an IPAT via replacement configuration, the clmigcheck script
generates an error message as shown in Figure 7-15. You must remove this configuration to
proceed with the migration.
IPAT via replacement configuration: If your cluster has an IPAT via replacement
configuration, remove or change to the IPAT via alias method before starting the migration.
Creating a snapshot
Create a snapshot by entering the smit cm_add_snap.dialog command while your cluster is
running.
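From the command line, the same snapshot might be created with the clsnapshot utility; the snapshot name, description, and the exact flags in this sketch are assumptions for illustration:

```
# /usr/es/sbin/cluster/utilities/clsnapshot -c -n pre71_migration \
>     -d "Cluster snapshot before migrating to PowerHA 7.1"
```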
The clcomd subsystem is now part of AIX and requires the fully qualified host names of all
nodes in the cluster to be listed in the /etc/cluster/rhosts file. Because AIX was
updated, a restart is required.
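For this two-node scenario, the /etc/cluster/rhosts file might be populated as in the following sketch; the domain suffix is hypothetical, and the same file must exist on every node in the cluster:

```
algeria.itso.ibm.com
brazil.itso.ibm.com
```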
3. Because you updated the AIX image, restart the system before you continue with the next
step.
After restarting the system, you can see that the clcomd subsystem, which belongs to the
caa subsystem group, is up and running. The clcomdES daemon, which is part of PowerHA,
is also running, as shown in Figure 7-18.
The clmigcheck menu options: In the clmigcheck menu, options 1 and 2 review the
cluster configuration. Option 3 gathers the information that is necessary to create the CAA
cluster during its execution on the last node of the cluster. In option 3, you define a cluster
repository disk and multicast IP address. Selecting option 3 means that you are ready to
start the migration.
h = help
Figure 7-20 shows the warning message “This will be removed from the configuration
during the migration”. Because it is only a warning message, you can continue with the
migration. After completing the migration, verify that the disk heartbeat is removed.
When option 2 of clmigcheck is completed without error, proceed with option 3 as shown in
Figure 7-21.
1 = 000fe4114cf8d1ce(hdisk1)
2 = 000fe4114cf8d3a1(hdisk4)
3 = 000fe4114cf8d441(hdisk5)
4 = 000fe4114cf8d4d5(hdisk6)
5 = 000fe4114cf8d579(hdisk7)
You can make a NULL entry for the multicast address; AIX then generates an appropriate
address, as shown in Figure 7-23. Keep the default NULL value so that AIX generates the
multicast address.
If you make a NULL entry, AIX will generate an appropriate address for you.
You should only specify an address if you have an explicit reason to do
so, but are cautioned that this address cannot be changed once the
configuration is activated (i.e. migration is complete).
h = help
prompt_mcast: Called
validate_mcast: Called
write_file: Called
# cat /var/clmigcheck/clmigcheck.txt
CLUSTER_TYPE:STANDARD
CLUSTER_REPOSITORY_DISK:000fe4114cf8d1ce
CLUSTER_MULTICAST:NULL
Figure 7-25 The /var/clmigcheck/clmigcheck.txt file
When PowerHA 7.1 is installed, this information is used to create the HACMPsircol.odm file as
shown in Figure 7-26. This file is created when you finish restoring the snapshot in this
scenario.
HACMPsircol:
name = "canada_cluster_sircol"
id = 0
uuid = "0"
repository = "000fe4114cf8d1ce"
ip_address = ""
nodelist = "brazil,algeria"
backup_repository1 = ""
backup_repository2 = ""
algeria:/ #
Figure 7-26 The HACMPsircol.odm file
After you install the new PowerHA 7.1 file sets, you can see that the clcomdES daemon has
disappeared. You now have the clcomd daemon, which is part of CAA, instead of the clcomdES
daemon.
3. Stop and start the clcomd daemon by using the following command:
refresh -s clcomd
4. To verify that the clcomd subsystem is working, use the clrsh command. If it does not
work, correct any problems before proceeding as explained in Chapter 10,
“Troubleshooting PowerHA 7.1” on page 305.
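A quick way to exercise clcomd with clrsh is to run a harmless command against each node, as in this sketch (node names are from this scenario; the probe simply returns the remote date when communication works):

```
# clrsh algeria date
# clrsh brazil date
```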
Restoring a snapshot
To restore a snapshot, follow the path smitty hacmp → Cluster Nodes and Networks →
Manage the Cluster Snapshot Configuration → Restore the Cluster Configuration
From a Snapshot.
Warning: unable to verify inbound clcomd communication from node "algeria" to the local node,
"".
Warning: unable to verify inbound clcomd communication from node "brazil" to the local node,
"".
/usr/es/sbin/cluster/utilities/clsnapshot[2139]: apply_CS[125]:
communication_check: line 52: local: not found
When you finish restoring the snapshot, the CAA cluster is created based on the repository
disk and multicast address stored in the /var/clmigcheck/clmigcheck.txt file.
Sometimes the synchronization or verification fails because the snapshot cannot create the
CAA cluster. If you see an error message similar to the one shown in Figure 7-30, look in the
/var/adm/ras/syslog.caa file and correct the problem.
ERROR: Problems encountered creating the cluster in AIX. Use the syslog
facility to see output from the mkcluster command.
ERROR: Creating the cluster in AIX failed. Check output for errors in local
cluster configuration, correct them, and try synchronization again.
ERROR: Updating the cluster in AIX failed. Check output for errors in local
cluster configuration, correct them, and try synchronization again.
After completing all the steps, check the CAA cluster configuration and status on both nodes.
First, the caavg_private volume group is created and varied on as shown in Figure 7-31.
algeria:/ # lspv
hdisk2 000fe4114cf8d258 algeria_vg
hdisk3 000fe4114cf8d2ec brazil_vg
hdisk8 000fe4114cf8d608 diskhb
caa_private0 000fe40120e16405 caavg_private active
hdisk0 000fe4113f087018 rootvg active
algeria:/ #
Figure 7-31 The caavg_private volume group varied on
algeria:/ # lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2
------------------------------
After the clmigcheck command is done running, you can remove the older version of
PowerHA and install PowerHA 7.1.
# lspv
caa_private0 000fe40120e16405 caavg_private active
hdisk2 000fe4114cf8d258 algeria_vg
hdisk3 000fe4114cf8d2ec brazil_vg
hdisk0 000fe4113f087018 rootvg active
#
Figure 7-34 The lspv output after restoring the snapshot
where:
<cluster_name> is canada_cluster.
+hdiskX is +hdisk2.
hdiskY is hdisk3.
The two shared disks are now included in the CAA cluster as shared disks, as shown in
Figure 7-35. The hdisk2 and hdisk3 disks are renamed to cldisks: the lspv command now
shows cldiskX instead of hdiskX, as shown in Figure 7-36.
algeria:/ # lspv
caa_private0 000fe40120e16405 caavg_private active
cldisk1 000fe4114cf8d258 algeria_vg
cldisk2 000fe4114cf8d2ec brazil_vg
hdisk8 000fe4114cf8d608 diskhb
hdisk0 000fe4113f087018 rootvg active
algeria:/ #
Figure 7-36 The lspv command showing cldisks for shared disks
When you use the lscluster command to perform the check, you can see that the shared
disks (cldisk1 and cldisk2) are monitored by the CAA service. Keep in mind that two types
of disks are in CAA. One type is the repository disk that is shown as REPDISK, and the other
type is the shared disk that is shown as CLUSDISK. See Figure 7-37 on page 175.
7.3.4 Summary
A snapshot migration to PowerHA 7.1 entails running the clmigcheck program. Before you
begin the migration, you must prepare for it by installing AIX 6.1.6 or later and checking
whether any part of the configuration is unsupported.
Then you run the clmigcheck command to review your PowerHA configuration and verify that
it works with PowerHA 7.1. After verifying the configuration, you specify a repository disk and
multicast address, which are essential components of the CAA service.
After you successfully complete the clmigcheck procedure, you can install PowerHA 7.1. The
CAA cluster is created while you restore your snapshot. PowerHA 7.1 uses the newly
configured CAA service for event monitoring and heartbeating.
The cluster uses virtualized resources provided by the VIOS for network and storage. The
rootvg volume group (hdisk0) is also hosted from the VIOS. The backing devices are
provided from a DS4800 storage system.
The network topology is configured as IPAT via aliasing. Disk heartbeating is also used over
the shared storage between all the nodes.
The cluster contains two resource groups: newyork_rg and test_rg. The newyork_rg resource
group hosts the IBM HTTP Server application, and the test_rg resource group hosts a test
script application. The highest-priority node for newyork_rg is chile, and for test_rg, it is
serbia. The scotland node runs in a standby capacity.
7.4.1 Planning
Before beginning a rolling migration, you must properly plan to ensure that you are ready to
proceed. For more information, see 7.1, “Considerations before migrating” on page 152. The
migration to PowerHA 7.1 is different from previous releases, because of the support for CAA
integration. Therefore, see also 7.2, “Understanding the PowerHA 7.1 migration process” on
page 153.
Ensure that the cluster is stable on all nodes and is synchronized. With a rolling migration,
you must be aware of the following restrictions while performing the migration, because a
mixed-software-version cluster is involved:
Do not perform synchronization or verification while a mixed-software-version cluster
exists. Such actions are not allowed in this case.
Do not make any cluster configuration changes.
Do not perform a Cluster Single Point Of Control (C-SPOC) operation while a
mixed-software-version cluster exists. Such action is not allowed in this case.
Try to perform the migration during one maintenance period, and do not leave your cluster
in a mixed state for any significant length of time.
CAA-specific file sets: You must install the CAA-specific bos.cluster and bos.ahafs
file sets because update_all does not install them.
3. Decide which shared disk you want to use for the CAA private repository (scotland node).
See 7.1, “Considerations before migrating” on page 152, for more information.
Previous volume disk group: The disk must be a clean logical unit number (LUN) that
does not contain a previous volume group. If you have a previous volume group on this
disk, you must remove it. See 10.4.5, “Volume group name already in use” on
page 320.
You do not need to take any action because the disk-based heartbeating is
automatically removed during migration. Because three disk heartbeat networks are in
the configuration, this warning message is displayed three times, once for each
network. If no errors are detected, you see the message shown in Figure 7-44.
Press Enter after this last panel, and you return to the main menu.
1 = 000fe40120e16405(hdisk1)
2 = 000fe4114cf8d258(hdisk2)
3 = 000fe4114cf8d2ec(hdisk3)
4 = 000fe4013560cc77(hdisk5)
5 = 000fe4114cf8d4d5(hdisk6)
6 = 000fe4114cf8d579(hdisk7)
c. Enter the multicast address as shown in Figure 7-46. You can specify a multicast
address, or you can have clmigcheck automatically assign one. For more information
about multicast addresses, see 1.3.1, “Communication interfaces” on page 13. Press
Enter and you return to the main menu.
If you make a NULL entry, AIX will generate an appropriate address for
you.
You should only specify an address if you have an explicit reason to do
so, but are cautioned that this address cannot be changed once the
configuration is activated (i.e. migration is complete).
h = help
clmigcheck: This is not the first node or last node clmigcheck was run on.
No further checking is required on this node. You can install the new
version of PowerHA SystemMirror.
6. Upgrade PowerHA on the scotland node to PowerHA 7.1 SP1. Because the cluster
services are down, you can perform a smitty update_all to upgrade PowerHA.
7. When this process is complete, modify the new rhosts definition for CAA as shown in
Figure 7-48. Although this scenario uses network addresses, you can also add the short
host names to the rhosts file, provided that you configured the /etc/hosts file
correctly. See “Creating a cluster with host names in the FQDN format” on page 75, for
more information.
/etc/cluster
# cat rhosts
192.168.101.111
192.168.101.112
192.168.101.113
Figure 7-48 Extract showing the configured rhosts file
Restarting the cluster: You do not need to restart the cluster after you upgrade
PowerHA.
8. Start PowerHA on the scotland node by issuing the smitty clstart command. The node
should be able to rejoin the cluster. However, you receive warning messages about mixed
versions of PowerHA.
After PowerHA is started on this node, move any resource groups that the next node is
hosting onto this node so that you can migrate the second node in the cluster. In this
scenario, the serbia node is hosting the test_app_rg resource group. Therefore, we
perform a resource group move request to move this resource to the newly migrated
scotland node. The serbia node is then available to migrate.
3. Run the clmigcheck command to ensure that the migration worked and that you can
proceed with the PowerHA upgrade. This step is important even though the cluster
configuration migration check and CAA configuration are already complete on the first
node (scotland).
Figure 7-52 shows the panel that you see now.
clmigcheck: This is not the first node or last node clmigcheck was run on.
No further checking is required on this node. You can install the new
version of PowerHA SystemMirror.
4. Upgrade PowerHA on the serbia node to PowerHA 7.1 SP1. Follow the same migration
procedure as in the first node.
Reminder: Update the /etc/cluster/rhosts file so that it is the same as the first node
that you upgraded. See step 6 on page 183.
At this stage, two of the three nodes in the cluster are migrated to AIX 6.1 TL6 and PowerHA
7.1. The chile node is the last node in the cluster to be upgraded. Figure 7-53 shows how the
cluster looks now.
Figure 7-54 Rolling migration: The chile node before the AIX upgrade
chile:/ # clmigcheck
Verifying clcomd communication, please be patient.
clmigcheck: Running
/usr/sbin/rsct/install/bin/ct_caa_set_disabled_for_migration
on each node in the cluster
If you see a message similar to the one shown in Figure 7-56, the final mkcluster phase
has failed. For more information about this problem, see 10.2, “Troubleshooting the
migration” on page 308.
At this stage, you have upgraded AIX and run the final clmigcheck process. Figure 7-57
shows how the cluster looks now.
Reminder: Update the /etc/cluster/rhosts file so that it is the same as the other
nodes that you upgraded. See step 6 on page 183.
In this scenario, we started PowerHA on the chile node and performed a synchronization and
verification of the cluster, which is the final stage of the migration. The newyork_rg resource
group was moved back to the chile node. The cluster migration is now complete.
Figure 7-58 shows how the cluster looks now.
Check that CAA is working by running the lscluster -m command. This command returns
information about your cluster from all your nodes. If a problem exists, you see a message
similar to the one shown in Figure 7-59.
# lscluster -m
Cluster services are not active.
Figure 7-59 Message indicating that CAA is not running
If you receive this message, see 10.4.7, “The ‘Cluster services are not active’ message” on
page 323, for details about how to fix this problem.
Verify that the CAA private volume group (caavg_private) is defined and active on all nodes.
Check the lspv output to ensure that the CAA repository is defined and varied on for each
node. You see output similar to what is shown in Figure 7-60.
chile:/ # lspv
caa_private0 000fe40120e16405 caavg_private active
hdisk2 000fe4114cf8d258 None
Figure 7-60 Extract from lspv showing the CAA repository disk
Review the /tmp/clconvert.log file to ensure that the conversion of the PowerHA ODM has
been successful. For additional details about the log files and troubleshooting information,
see 10.1, “Locating the log files” on page 306.
Synchronize or verify the cluster.
The cluster layout is a mutual takeover configuration. The munich system is the primary server
for the HTTP application. The berlin system is the primary server for the Network File
System (NFS), which is cross-mounted by the munich system.
Because of resource limitations, the disk heartbeat is using one of the existing shared disks.
Two networks are defined:
The net_ether_01 network is the administrative network and is used only by the system
administration team.
The net_ether_10 network is used by the applications and its users.
PowerHA 6.1 support on AIX 7.1: PowerHA 6.1 SP2 is not supported on AIX 7.1. You
need a minimum of PowerHA 6.1 SP3.
Important: You must restart the systems to ensure that all needed processes for CAA
are running.
With PowerHA 6.1 SP3 or later, you can start the cluster if preferred, but we do
not start it now in this scenario.
7. Run the clmigcheck program on one of the cluster nodes.
Important: You must run the clmigcheck program (in the /usr/sbin/ directory) before
you install PowerHA 7.1. Keep in mind that you must run this program on each node
in the cluster, one node at a time.
While checking the configuration, you might see warning or error messages. You must
correct errors manually, but issues identified by warning messages can be cleaned up during
the migration process. In this case, a warning message (Figure 7-66) is displayed
indicating that the disk heartbeat network will be removed at the end of the migration.
Press Enter, and the main clmigcheck panel (Figure 7-65 on page 196) is displayed
again.
d. Select option 3 (Enter repository disk and multicast IP addresses).
The next panel (Figure 7-68) lists all available shared disks that might be used for the
CAA repository disk. You need one shared disk for the CAA repository.
1 = 00c0f6a01c784107(hdisk4)
e. Configure the multicast address as shown in Figure 7-69 on page 198. The system
automatically creates an appropriate address for you. By default, PowerHA creates a
multicast address by replacing the first octet of the IP communication path of the lowest
node in the cluster with 228. Press Enter.
Important:
You cannot change the selected IP multicast address after the configuration is
activated.
You must set up any routers in the network topology to forward multicast
messages.
If you make a NULL entry, AIX will generate an appropriate address for
you.
You should only specify an address if you have an explicit reason to do
so, but are cautioned that this address cannot be changed once the
configuration is activated (i.e. migration is complete).
h = help
f. From the main clmigcheck panel, type an x to exit the clmigcheck program.
g. In the next panel (Figure 7-70), confirm the exit request by typing y.
Note - If you have not completed the input of repository disks and
multicast IP addresses, you will not be able to install
PowerHA SystemMirror
COMMAND STATUS
[MORE...94]
restricted by GSA ADP Schedule Contract with IBM Corp.
. . . . . << End of copyright notice for cluster.es.migcheck >>. . . .
9. Add the host names of your cluster nodes to the /etc/cluster/rhosts file. The names
must match the PowerHA node names.
10.Refresh the clcomd subsystem.
refresh -s clcomd
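Steps 9 and 10 can be sketched together in shell. The sketch writes to a temporary file instead of /etc/cluster/rhosts so that it stays side-effect free; the node names munich and berlin are taken from this scenario.

```shell
# Sketch of steps 9-10: write the cluster node names (one per line) to
# the rhosts file, then refresh clcomd. A temporary file stands in for
# /etc/cluster/rhosts here so the sketch has no system side effects.
rhosts_file=$(mktemp)
cat > "$rhosts_file" <<'EOF'
munich
berlin
EOF
# Count the non-empty entries to confirm the file was populated.
entries=$(grep -c . "$rhosts_file")
echo "$entries entries written"
# On the real node:
#   cp "$rhosts_file" /etc/cluster/rhosts
#   refresh -s clcomd
```

Remember that the names must match the PowerHA node names exactly.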
11.Review the /tmp/clconvert.log file to ensure that a conversion of the PowerHA ODMs
has occurred.
12.Start cluster services only on the node that you updated by using smitty clstart.
13.Ensure that the cluster services have started successfully on this node by using any of the
following commands:
clstat -a
lssrc -ls clstrmgrES | grep state
clmgr query cluster | grep STATE
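The lssrc check in the list above can be scripted as a simple state test. The `Current state:` line below is an assumption about the lssrc -ls clstrmgrES output format implied by the grep in the list; the output is simulated here so the sketch runs anywhere.

```shell
# Sketch: confirm the cluster manager reached a stable state by parsing
# the `lssrc -ls clstrmgrES` output (simulated below for portability).
lssrc_out='Current state: ST_STABLE'
state=$(printf '%s\n' "$lssrc_out" | sed -n 's/^Current state: //p')
if [ "$state" = "ST_STABLE" ]; then
  echo "cluster services are running"
else
  echo "cluster services not stable yet: $state"
fi
```

On a live node, replace the simulated variable with the real command output.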
14.Continue to the next node.
15.Run the clmigcheck program on this node.
Keep in mind that you must run the clmigcheck program on each node before you can
install PowerHA 7.1. Follow the same steps as for the first system as explained in step 7
on page 195.
# clmigcheck
Saving existing /tmp/clmigcheck/clmigcheck.log to
/tmp/clmigcheck/clmigcheck.log.bak
rshexec: cannot connect to node munich
#
Figure 7-73 The clmigcheck execution error message
Attention: Do not start the clcomd subsystem manually. Starting this subsystem manually
can result in further errors, which might require you to reinstall this node or all the
cluster nodes.
16.Install PowerHA only on this node in the same way as you did on the first node. See step 8
on page 199.
17.As on the first node, add the host names of your cluster nodes to the /etc/cluster/rhosts
file. The names must be the same as the node names.
18.Refresh the clcomd subsystem.
19.Start the cluster services only on the node that you updated.
20.Ensure that the cluster services started successfully on this node.
21.If you have more than two nodes in your cluster, repeat step 15 on page 199 through step
20 until all of your cluster nodes are updated.
You now have a fully running cluster environment. Before going into production mode, test
your cluster as explained in Chapter 9, “Testing the PowerHA 7.1 cluster” on page 259.
When you check the topology information by using the cltopinfo command, all non-IP and disk
heartbeat networks should be removed. If these networks are not removed, see Chapter 10,
“Troubleshooting PowerHA 7.1” on page 305.
When checking the RSCT subsystems, the topology subsystem should now be inactive as
shown in Figure 7-74.
The role of the administrator is to quickly find relevant information and analyze it to make the
best decision in every situation. This chapter provides several examples that show how the
PowerHA 7.1 administrator can gather information about the cluster by using several
methods.
For most of the examples in this chapter, the korea cluster from the test environment is used
with the participating seoul and busan nodes. All the commands in the examples are executed
as root user.
CAA subsystems
Cluster Aware AIX (CAA) introduces a new set of subsystems. When the cluster is not
running, these subsystems are inactive, except for the clcomd subsystem, which is active (Example 8-3).
The clcomdES subsystem has been replaced by the clcomd subsystem and is no longer part of
the cluster subsystems group. It is now part of the AIX Base Operating System (BOS), not
PowerHA.
Disk configuration
With the current code level in AIX 7.1.0.1, the CAA repository cannot be created over virtual
SCSI (VSCSI) disks. For the korea cluster, a DS4800 storage system is used and is accessed
over N_Port ID Virtualization (NPIV). The rootvg volume group is the only one using VSCSI
devices. Example 8-5 shows a list of storage disks.
busan:/ # lspv
hdisk0 00c0f6a089390270 rootvg active
hdisk1 00c0f6a077839da7 None
hdisk2 00c0f6a0107734ea None
hdisk3 00c0f6a010773532 None
busan:/ # ifconfig -a
en0:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.101.144 netmask 0xffffff00 broadcast 192.168.101.255
inet 10.168.101.44 netmask 0xffffff00 broadcast 10.168.101.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
en2:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.201.144 netmask 0xffffff00 broadcast 192.168.201.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
lo0:
flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LAR
GESEND,CHAIN>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
Routing table
The routing table is an important source of information. As shown in 8.3.1, “AIX
commands and log files” on page 216, the multicast address is not displayed in this table,
even when the CAA and IBM PowerHA clusters are running. Example 8-7 shows the routing
table for the seoul node.
Multicast information
You can use the netstat command to display information about an interface for which
multicast is enabled. As shown in Example 8-8 for en0, before the cluster is configured,
no multicast address other than the default 224.0.0.1 address is present.
Cluster status
Before a cluster is configured, the state of every node is NOT_CONFIGURED as shown in
Example 8-10.
As soon as the configuration is synchronized to all nodes and the CAA cluster is created, the
administrator cannot change the cluster name or the cluster multicast address.
Changing the repository disk: The administrator can change the repository disk with the
procedure for replacing a repository disk provided in the PowerHA 7.1 Release Notes.
After the first synchronization, two other disks are added to the cluster storage by using the
following command:
chcluster -n korea -d+hdisk2,hdisk3
where hdisk2 is renamed to cldisk2, and hdisk3 is renamed to cldisk1. Example 8-13
shows the resulting disk listing.
Attention: The cluster repository disk is a special device for the cluster. The use of Logical
Volume Manager (LVM) commands over the repository disk is not supported. AIX LVM
commands are single node commands and are not intended for use in a clustered
configuration.
Multicast information
Compared with the multicast information collected when the cluster was not configured, the
netstat command now shows the 228.168.101.43 address in the table (Example 8-14).
Cluster status
The cluster status changes from NOT_CONFIGURED to ST_INIT as shown in Example 8-15.
Subsystem guide:
cld determines whether the local node must become the primary or secondary solidDB
server in a failover.
The solid subsystem is the database engine.
The solidhac subsystem is used for the high availability of the solidDB server.
The clconfd subsystem runs every 10 minutes to put any missed cluster configuration
changes into effect on the local node.
Tip: The primary IP address shown for each node is the IP address chosen as the
communication path during cluster definition. In this case, the address is the same IP
address that is used as the persistent IP address.
The multicast address, when not specified by the administrator during cluster creation, is
composed of the number 228 followed by the last three octets of the communication path
of the node where the synchronization is executed. In this particular example, the
synchronization was run from the seoul node, which has the communication path
192.168.101.43. Therefore, the multicast address for the cluster becomes 228.168.101.43,
as can be observed in the output of the lscluster -c command.
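The derivation above can be sketched in one shell substitution. The communication-path address is taken from this example, and the rule is simply to replace the first octet with 228.

```shell
# Derive the default CAA multicast address from a node's communication
# path: keep the last three octets and replace the first octet with 228.
comm_path="192.168.101.43"      # seoul's communication path (from this example)
mcast="228.${comm_path#*.}"     # strip up to the first dot, then prefix "228."
echo "$mcast"                   # prints 228.168.101.43
```

The same one-liner is useful for predicting the cluster multicast address before synchronization.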
The nodes can use IPv6, but at least one of the interfaces in each node must be configured
with IPv4 to enable the CAA cluster multicasting.
-------------------------------
NODE busan
-------------------------------
Calling node query for all nodes
Node query number of nodes examined: 2
Zone: Example 8-18 on page 209 mentions zones. A zone is a concept that is planned for
use in future versions of CAA, where the node can be part of different groups of machines.
rtt: The round-trip time (rtt) is calculated by using a mean deviation formula. Some
commands show rrt instead of rtt, which is believed to be a typographic error in the
command.
sfwcom: Storage Framework Communication (sfwcom) is the interface created by CAA for
SAN heartbeating. To enable sfwcom, the following prerequisites must be in place:
Each node must have either a 4 Gb or 8 Gb Fibre Channel adapter. If you are using VSCSI or
NPIV, VIOS 2.2.0.11-FP24 SP01 is the minimum level required.
The adapters used for SAN heartbeating must have the tme (target mode enabled)
parameter set to yes. The Fibre Channel controller must have the parameter dyntrk set
to yes, and the parameter fc_err_recov set to fast_fail.
All the adapters participating in the heartbeating must be in the same fabric zone. In the
previous example, sydney-fcs0 and perth-fcs0 are in the same fabric zone;
sydney-fcs1 and perth-fcs1 are in the same fabric zone.
dpcomm: The dpcomm interface is the actual repository disk. This means that, in addition to
the Ethernet and Fibre Channel adapters, the cluster also uses the repository disk as
physical media to exchange heartbeats among the nodes.
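The sfwcom prerequisites listed above correspond to device attribute changes. The following is a hedged configuration sketch, not a definitive procedure: the device names fcs0 and fscsi0 are assumptions for your environment, and the -P flag defers the adapter change until the next reboot.

```shell
# Enable target mode on the FC adapter (device names are assumptions;
# -P records the change in the ODM, effective after reboot).
chdev -l fcs0 -a tme=yes -P
# Set dynamic tracking and fast failover recovery on the FC controller.
chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P
```

After the reboot, you can verify the settings with lsattr -El against each device.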
-------------------------------
NODE busan
-------------------------------
Disk configuration
All the volume groups controlled by a resource group are shown as concurrent on both sides
as shown in Example 8-22.
-------------------------------
NODE busan
-------------------------------
hdisk0 00c0f6a089390270 rootvg active
caa_private0 00c0f6a077839da7 caavg_private active
cldisk2 00c0f6a0107734ea pokvg concurrent
cldisk1 00c0f6a010773532 pokvg concurrent
Multicast information
When compared with the multicast information collected when the cluster is not configured,
the netstat command shows that the 228.168.101.43 address is present in the table. See
Example 8-23.
-------------------------------
NODE busan
-------------------------------
Routing tables
Destination Gateway Flags Refs Use If Exp Groups
Example 8-27 Multicast packet monitoring for the seoul node using the tcpdump utility
seoul:/ # tcpdump -D
1.en0
2.en2
3.lo0
The same information is captured on the busan node as shown in Example 8-28.
Example 8-28 Multicast packet monitoring for the busan node using the tcpdump utility
busan:/tmp # tcpdump -D
1.en0
2.en2
3.lo0
You can also see the multicast traffic for all the PowerHA 7.1 clusters in your LAN segment.
The following command generates the output:
seoul:/ # tcpdump -n -vvv port drmsfsd
Tip: You can observe the multicast address in the last line of the lscluster -c CAA
command output.
Cluster information
CAA comes with a set of command-line tools, as explained in “Cluster information using
the lscluster command” on page 209. You can use these tools to monitor the status and
statistics of a running cluster. For more information about CAA and its functions, see
Chapter 2, “Features of PowerHA SystemMirror 7.1” on page 23.
UUID
The UUID of the caa_private0 disk is stored as a cluster0 device attribute as shown in
Example 8-31.
The repository disk contains logical volumes for the bootstrap and solidDB file systems as
shown in Example 8-33.
NODES
numcl numz uuid shid name
0 0 4f8858be-c0dd-11df-930a-a24e50543103 2 seoul
0 0 e356646e-c0dd-11df-b51d-a24e57e18a03 1 busan
ZONES
none
Tip: The solidDB database is not necessarily active in the same node where the PowerHA
resource group is active. You can see this difference when comparing Example 8-35 with
the output of the clRGinfo command:
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE seoul
OFFLINE busan
In this case, the solidDB database has the primary database active in the busan node, while
the PowerHA resource group is hosted on the seoul node.
Example 8-36 Using the lssrc command to check where solidDB is active
seoul:/ # lssrc -ls IBM.StorageRM
Subsystem : IBM.StorageRM
PID : 7077950
Cluster Name : korea
Node Number : 2
Daemon start time : 10/05/10 10:06:57
PeerNodes: 2
QuorumNodes: 2
Group IBM.StorageRM.v1:
ConfigVersion: 0x24cab3184
Providers: 2
QuorumMembers: 2
Group Leader: seoul, 0xdc82faf0908920dc, 2
Example 8-37 The solidDB SQL interface (view from left side of code)
seoul:/ # /opt/cluster/solidDB/bin/solsql -x pwdfile:/etc/cluster/dbpass "tcp 2188" caa
2 rows fetched.
select * from SHAREDDISKS;
Example 8-38 Using the solidDB SQL interface (view from right side starting at CLUSTER_ID row)
VERIFIED_STATUS ESTATE VERSION_OPERATING VERSION_CAPABLE MULTICAST
--------------- ------ ----------------- --------------- ---------
NULL 1 1 1 0
NULL 1 1 1 0
The /var/adm/ras/syslog.caa file keeps all the logs about CAA activity, including the error
output from commands. Example 8-39 shows an error caught in this file during the cluster
definition: the chosen repository disk had been part of a repository in the past and had not
been cleaned up.
Tip: To capture debug information, you can replace *.info with *.debug in the
/etc/syslog.conf file, followed by a syslogd daemon refresh. Because the output in
debug mode is voluminous, redirect the syslogd output to a file system other
than /, /var, or /tmp.
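As a hedged sketch of this tip, the /etc/syslog.conf entry might look as follows; the target path /logs/syslog.caa is an assumption standing in for any file system other than /, /var, or /tmp.

```shell
# /etc/syslog.conf entry for CAA debug output (target path is an assumption)
*.debug         /logs/syslog.caa
```

After editing the file, create the target file if it does not exist and refresh the daemon with refresh -s syslogd.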
seoul:/ # /usr/sbin/snmpv3_ssw -1
Stop daemon: snmpmibd
In /etc/rc.tcpip file, comment out the line that contains: snmpmibd
In /etc/rc.tcpip file, remove the comment from the line that contains: dpid2
Make the symbolic link from /usr/sbin/snmpd to /usr/sbin/snmpdv1
Make the symbolic link from /usr/sbin/clsnmp to /usr/sbin/clsnmpne
ID Name State
1108531106 korea UP
Select an option:
# - the Cluster ID q- quit
1108531106
Example 8-43 The clstat utility with the option to run only once
sydney:/ # clstat -o
Tip: The sfwcom and dpcomm interfaces that are shown with the lscluster -i command are
not shown in the output of the clstat utility. The PowerHA 7.1 cluster is unaware of these
CAA interfaces that are present at the AIX level.
_____________________________________________________________________________
Cluster Name: korea
Cluster State: UP
Cluster Substate: STABLE
_____________________________________________________________________________
NODE busan:
This node has 1 service IP label(s):
NODE seoul:
This node has 1 service IP label(s):
Tip: The cltopinfo -m command showed the heartbeat rings in previous versions of
PowerHA. Because this concept no longer applies, the output of the cltopinfo -m
command is empty in PowerHA 7.1.
The PowerHA 7.1 cluster administrator should explore all the utilities in the
/usr/es/sbin/cluster/utilities/ directory on a test system. Most of the utilities are only
informational tools. Never run unknown commands on production systems.
Use the odmget command followed by the name of the file in the /etc/es/objrepos directory.
Example 8-48 shows how to retrieve information about the cluster.
HACMPcluster:
id = 1108531106
name = "korea"
nodename = "seoul"
sec_level = "Standard"
sec_level_msg = ""
sec_encryption = ""
sec_persistent = ""
Tip: In previous versions of PowerHA, the ODM HACMPtopsvcs class kept information about
the current instance number for a node. In PowerHA 7.1, this class always has the instance
number 1 (instanceNum = 1 as shown in the following example) because topology services
are not used anymore. This number never changes.
seoul:/ # odmget HACMPtopsvcs
HACMPtopsvcs:
hbInterval = 1
fibrillateCount = 4
runFixedPri = 1
fixedPriLevel = 38
tsLogLength = 5000
gsLogLength = 5000
instanceNum = 1
You can use the HACMPnode ODM class to discover which version of PowerHA is installed as
shown in Example 8-49.
Example 8-49 Using the odmget command to retrieve the PowerHA version
seoul:/ # odmget HACMPnode | grep version | sort -u
version = 12
The following version numbers and corresponding HACMP/PowerHA release are available:
2: HACMP 4.3.1
3: HACMP 4.4
4: HACMP 4.4.1
5: HACMP 4.5
6: HACMP 5.1
7: HACMP 5.2
8: HACMP 5.3
9: HACMP 5.4
10: PowerHA 5.5
11: PowerHA 6.1
12: PowerHA 7.1
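The version-to-release table above can be turned into a small lookup. In this sketch, the odmget call is replaced by a fixed value so that it runs anywhere; the commented line shows how the value would be obtained on a live node.

```shell
# Map the HACMPnode "version" attribute to its release name (table taken
# from this section). On a live node, you would obtain the value with:
#   version=$(odmget HACMPnode | awk '$1 == "version" {print $3; exit}')
version=12
case "$version" in
  6)  release="HACMP 5.1" ;;
  7)  release="HACMP 5.2" ;;
  8)  release="HACMP 5.3" ;;
  9)  release="HACMP 5.4" ;;
  10) release="PowerHA 5.5" ;;
  11) release="PowerHA 6.1" ;;
  12) release="PowerHA 7.1" ;;
  *)  release="unknown" ;;
esac
echo "$release"   # prints PowerHA 7.1
```

A lookup of this kind is convenient in audit scripts that report the installed release across many clusters.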
Because the HACMPtopsvcs ODM class can no longer be used to discover whether the
configuration must be synchronized across the nodes, you can query the HACMPcluster ODM
class instead. This class keeps a numeric attribute called handle. Each node has a different
value for this attribute, ranging from 1 to 32. You can retrieve the handle values by using the
odmget or clhandle commands as shown in Example 8-50.
HACMPcluster:
id = 1108531106
name = "korea"
nodename = "seoul"
sec_level = "Standard"
sec_level_msg = ""
sec_encryption = ""
sec_persistent = ""
last_node_ids = ""
highest_node_id = 0
last_network_ids = ""
highest_network_id = 0
last_site_ids = ""
highest_site_id = 0
handle = 2
cluster_version = 12
reserved1 = 0
reserved2 = 0
wlm_subdir = ""
settling_time = 0
rg_distribution_policy = "node"
noautoverification = 0
clvernodename = ""
clverhour = 0
clverstartupoptions = 0
-------------------------------
NODE busan
-------------------------------
HACMPcluster:
id = 1108531106
name = "korea"
nodename = "busan"
sec_level = "Standard"
sec_level_msg = ""
sec_encryption = ""
sec_persistent = ""
last_node_ids = ""
highest_node_id = 0
-------------------------------
NODE busan
-------------------------------
1 busan
When you perform a cluster configuration change on any node, the handle attribute on that
node is set to 0.
Suppose that you want to add a new resource group to the korea cluster and that you make
the change from the seoul node. After you do the modification, and before you synchronize
the cluster, the handle attribute in the HACMPcluster ODM class in the seoul node has a value
of 0 as shown in Example 8-51.
If more than one node has a handle value of 0, you or another person might have performed
changes from different nodes. In that case, you must decide on which node to start the
synchronization. As a result, the cluster modifications made on the other nodes are
then lost.
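The handle check described above can be sketched as a small script. The odmget output is simulated here so that the sketch is portable; on a real node, you would pipe the output of `odmget HACMPcluster` instead.

```shell
# Sketch: flag pending (unsynchronized) changes by checking whether the
# local node's handle attribute is 0 (odmget output simulated below).
odm_out='HACMPcluster:
        name = "korea"
        handle = 0'
handle=$(printf '%s\n' "$odm_out" | awk '$1 == "handle" {print $3}')
if [ "$handle" -eq 0 ]; then
  sync_state="unsynchronized changes pending"
else
  sync_state="configuration in sync"
fi
echo "$sync_state"
```

Running the same check on every node shows at a glance where the last configuration change was made.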
The clmgr command supports the actions listed in 5.2.1, “The clmgr action commands” on
page 104.
For monitoring purposes, you can use the query and view actions. For a list of object classes
that are available for each action, see 5.2.2, “The clmgr object classes” on page 105.
Example 8-54 Query action on the PowerHA cluster using the clmgr command
seoul:/ # clmgr query cluster
CLUSTER_NAME="korea"
CLUSTER_ID="1108531106"
STATE="STABLE"
VERSION="7.1.0.1"
VERSION_NUMBER="12"
EDITION="STANDARD"
CLUSTER_IP=""
REPOSITORY="caa_private0"
SHARED_DISKS="cldisk2,cldisk1"
UNSYNCED_CHANGES="false"
SECURITY="Standard"
FC_SYNC_INTERVAL="10"
RG_SETTLING_TIME="0"
RG_DIST_POLICY="node"
MAX_EVENT_TIME="180"
MAX_RG_PROCESSING_TIME="180"
SITE_POLICY_FAILURE_ACTION="fallover"
SITE_POLICY_NOTIFY_METHOD=""
DAILY_VERIFICATION="Enabled"
VERIFICATION_NODE="Default"
VERIFICATION_HOUR="0"
VERIFICATION_DEBUGGING="Enabled"
LEVEL=""
ALGORITHM=""
GRACE_PERIOD=""
REFRESH=""
MECHANISM=""
CERTIFICATE=""
PRIVATE_KEY=""
Tip: Another way to check the PowerHA version is to query the SNMP subsystem as
follows:
seoul:/ # snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs
clstrmgrVersion
clstrmgrVersion.1 = "7.1.0.1"
clstrmgrVersion.2 = "7.1.0.1"
Example 8-55 Using the view action on the PowerHA cluster using clmgr
seoul:/ # clmgr view report cluster
Cluster: korea
Cluster services: active
State of cluster: up
Substate: stable
#############
APPLICATIONS
#############
Cluster korea provides the following applications: db2pok_ApplicationServer
Application: db2pok_ApplicationServer
db2pok_ApplicationServer is started by
/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok
db2pok_ApplicationServer is stopped by
/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok
Application monitors for db2pok_ApplicationServer:
db2pok_SQLMonitor
db2pok_ProcessMonitor
Monitor name: db2pok_SQLMonitor
Type: custom
Monitor method: user
Monitor interval: 120 seconds
Hung monitor signal: 9
Stabilization interval: 240 seconds
Retry count: 3 tries
Restart interval: 1440 seconds
Failure action: fallover
Cleanup method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok
Restart method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok
Monitor name: db2pok_ProcessMonitor
Type: process
Process monitored: db2sysc
Process owner: db2pok
Instance count: 1
Stabilization interval: 240 seconds
Retry count: 3 tries
Restart interval: 1440 seconds
#############
TOPOLOGY
#############
korea consists of the following nodes: busan seoul
busan
Network interfaces:
busan-b2 {up}
with IP address: 192.168.201.144
on interface: en2
on network: net_ether_01 {up}
busan-b1 {up}
with IP address: 192.168.101.144
on interface: en0
on network: net_ether_01 {up}
NODE busan:
Network net_ether_01
poksap-db 10.168.101.143
busan-b2 192.168.201.144
busan-b1 192.168.101.144
NODE seoul:
Network net_ether_01
poksap-db 10.168.101.143
seoul-b1 192.168.101.143
seoul-b2 192.168.201.143
You can also use the clmgr command to see the list of PowerHA SystemMirror log files as
shown in Example 8-56.
Example 8-56 Viewing the PowerHA cluster log files using the clmgr command
seoul:/ # clmgr view log
Available Logs:
autoverify.log
cl2siteconfig_assist.log
cl_testtool.log
clavan.log
clcomd.log
clcomddiag.log
clconfigassist.log
clinfo.log
clstrmgr.debug
clstrmgr.debug.long
cluster.log
cluster.mmddyyyy
clutils.log
clverify.log
cspoc.log
cspoc.log.long
cspoc.log.remote
dhcpsa.log
dnssa.log
domino_server.log
emuhacmp.out
hacmp.out
ihssa.log
migration.log
sa.log
sax.log
Tip: The output verbose level can be set by using the -l option as in the following
example:
clmgr -l {low|med|high|max} action object
Root user: Do not use the root user. The root logon is exclusive: the second person who logs on with the root user ID logs off the first person, and so on. For a production environment, create an AIX user ID for each person who must connect to the IBM Systems Director web interface. Each user ID must belong to the smadmin group so that everyone can connect simultaneously to the IBM Systems Director web interface. For more information, see the
“Users and user groups in IBM Systems Director” topic in the IBM Systems Director V6.1.x
Information Center at:
http://publib.boulder.ibm.com/infocenter/director/v6r1x/index.jsp?topic=/direct
or.security_6.1/fqm0_c_user_accounts.html
smadmin (Administrator group): Members of the smadmin group are authorized for all
operations.
Cluster menu
You can right-click all the objects to access options. Figure 8-8 shows an example of the
options for the korea cluster.
Figure 8-8 Menu options when right-clicking a cluster in the PowerHA SystemMirror plug-in
Figure 8-10 Options available when right-clicking a resource group in PowerHA SystemMirror plug-in
Storage tab
Figure 8-13 shows the Storage tab and the information that is presented.
Whether you are scripting a task for use on many systems or automating a process, the CLI can be useful in a management environment such as IBM Systems Director.
Tip: To run the commands, the smcli interface requires you to be an IBM Systems Director
superuser.
Example 8-57 runs the smcli command on the mexico host in IBM Systems Director to see the available options for PowerHA.
Example 8-57 Available options for PowerHA in IBM Systems Director CLI
mexico:/ # /opt/ibm/director/bin/smcli lsbundle | grep sysmirror
sysmirror/help
sysmirror/lsac
sysmirror/lsam
sysmirror/lsappctl
sysmirror/lsappmon
sysmirror/lscl
sysmirror/lscluster
sysmirror/lsdependency
sysmirror/lsdp
sysmirror/lsfc
sysmirror/lsfilecollection
sysmirror/lsif
sysmirror/lsinterface
sysmirror/lslg
sysmirror/lslog
sysmirror/lsmd
sysmirror/lsmethod
All the configuration commands listed in Example 8-57 can be triggered by using the smcli
command. Example 8-58 shows the commands that you can use.
# Lists networks:
mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lsnw -c korea
net_ether_01
You can obtain the cluster address for the -a option of the socksimple command from the
lscluster -c command output (Example 9-3).
perth:/ # socksimple -r -a 1
socksimple version 1.2
Listening on 1/12:
2. Disconnect the network interfaces by pulling the cables on one node to simulate an
Ethernet network failure. Example 9-5 shows the interface status.
perth:/ # socksimple -r -a 1
socksimple version 1.2
Listening on 1/12:
4. Check the status of the cluster interfaces by using the lscluster -i command.
Example 9-7 shows the status for both disconnected ports on the perth node. In this
example, the status has changed from UP to DOWN SOURCE HARDWARE RECEIVE SOURCE
HARDWARE TRANSMIT.
5. Reconnect the Ethernet cables and check the port status as shown in Example 9-8.
6. Check if the cluster status has recovered. Example 9-9 shows that both Ethernet ports on
the perth node are now in the UP state.
9.2.1 Background
When the entire PowerHA SystemMirror IP network fails, and either the SAN-based heartbeat
network (sfwcom) does not exist, or it exists but has failed, CAA uses the
heartbeat-over-repository-disk (dpcom) feature.
The example in the next section describes dpcom heartbeating in a two-node cluster after all
IP interfaces have failed.
Initially, both nodes are online and running cluster services, all IP interfaces are online, and
the service IP address has an alias on the en3 interface.
This test scenario includes unplugging the cable of one interface at a time, starting with en3,
en4, en5, and finally en0. As each cable is unplugged, the service IP correctly swaps to the
next available interface on the same node. Each failed interface is marked as DOWN SOURCE
HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT as shown in Example 9-10. After the cables for
the en3 through en5 interfaces are unplugged, a local network failure event occurs, leading to
a selective failover of the resource group to the remote node. However, because the en0
interface is still up, CAA continues to heartbeat over the en0 interface.
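The active heartbeat path can be spotted quickly by filtering the lscluster -i output for the dpcom pseudo-interface: while an IP or SAN path is available, dpcom stays in a RESTRICTED state. The following sketch runs against a captured sample of that output (the sample text is illustrative, not from a live cluster); on a real node you would pipe the live lscluster -i output through the same filter.

```shell
# Hypothetical dpcom stanza from `lscluster -i` output (sample only).
sample='Interface number 3 dpcom
Interface state UP RESTRICTED AIX_CONTROLLED'

# Report whether CAA is actively heartbeating over the repository disk:
# RESTRICTED means dpcom is on standby; UP without RESTRICTED means active.
dpcom_state() {
    if echo "$1" | grep -q 'RESTRICTED'; then
        echo "dpcom standby (IP or SAN heartbeating in use)"
    else
        echo "dpcom active (heartbeating over the repository disk)"
    fi
}

dpcom_state "$sample"
```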
------------------------------
------------------------------
[hacmp28:HAES7101/AIX61-06 /]
# lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2
------------------------------
Example 9-13 shows the output of the lscluster -i command with the dpcom status
changing from UP RESTRICTED AIX_CONTROLLED to UP AIX_CONTROLLED.
Example 9-13 Output of the lscluster -i command showing the dpcom status
[hacmp27:HAES7101/AIX61-06 /]
# lscluster -i
Network/Storage Interface Query
After any interface cable is reconnected, such as the en0 interface, CAA stops heartbeating
over the repository disk and resumes heartbeating over the IP interface.
Example 9-14 shows the output of the lscluster -m command after the en0 cable is
reconnected. The dpcom status changes from UP to DOWN RESTRICTED, and the en0 interface
status changes from DOWN to UP.
------------------------------
Example 9-15 shows the output of the lscluster -i command. The en0 interface is now
marked as UP, and the dpcom returns to UP RESTRICTED AIX_CONTROLLED.
9.3.1 Background
In PowerHA 7.1, the heartbeat method has changed. Heartbeating between the nodes is now
done by AIX. The newly introduced CAA takes over the role of heartbeating and event
management.
This simulation tests a network down scenario and looks at the log files of PowerHA and CAA
monitoring. This test scenario has a two-node cluster, and one network interface is brought
down on one of the nodes by using the ifconfig command.
This cluster has one IP heartbeat path and two non-heartbeat paths. One of the
non-heartbeat paths is a SAN-based heartbeat channel (sfwcom). The other non-heartbeat
path is heartbeating over the repository disk (dpcom). Although IP connectivity is lost when
using the ifconfig command, PowerHA SystemMirror uses CAA for heartbeating over the two
other channels. This process is similar to the rs232 or diskhb heartbeat networks in previous
versions of PowerHA.
riyad:/ # netstat -i
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 a2.4e.5f.b4.5.2 74918 0 50121 0 0
en0 1500 192.168.100 riyad 74918 0 50121 0 0
en0 1500 10.168.200 saudisvc 74918 0 50121 0 0
lo0 16896 link#1 3937 0 3937 0 0
lo0 16896 127 loopback 3937 0 3937 0 0
lo0 16896 loopback 3937 0 3937 0 0
riyad:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
myrg ONLINE riyad
OFFLINE jeddah
Figure 9-2 Status of the riyad node
riyad:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
myrg OFFLINE riyad
ONLINE jeddah
Figure 9-5 clRGinfo while network failure
You can also check the network down event in the /var/hacmp/adm/cluster.log file
(Figure 9-6).
Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT START:
network_down riyad net_ether_01
Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT
COMPLETED: network_down riyad net_ether_01 0
Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT START:
network_down_complete riyad net_ether_01
Oct 6 09:57:43 riyad user:notice PowerHA SystemMirror for AIX: EVENT
COMPLETED: network_down_complete riyad net_ether_01 0
Figure 9-6 Network down event from the cluster.log file
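The network_down events in the cluster.log file follow the fixed EVENT START / EVENT COMPLETED wording shown in Figure 9-6, so they can be extracted with a simple filter. The sketch below counts completed network_down events in a sample excerpt; on a live node you would read /var/hacmp/adm/cluster.log instead of the sample variable.

```shell
# Illustrative cluster.log lines in the format shown in Figure 9-6.
log='Oct  6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT START: network_down riyad net_ether_01
Oct  6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: network_down riyad net_ether_01 0'

# Count completed network_down events; the trailing space in the pattern
# excludes the separate network_down_complete event.
count_network_down() {
    echo "$1" | grep -c 'EVENT COMPLETED: network_down '
}

count_network_down "$log"
```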
jeddah:/ # /usr/sbin/rsct/bin/ahafs_mon_multi
=== write String : CHANGED=YES;CLUSTER=YES
=== files being monitored:
fd file
3 /aha/cluster/nodeState.monFactory/nodeStateEvent.mon
4 /aha/cluster/nodeAddress.monFactory/nodeAddressEvent.mon
5 /aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon
6 /aha/cluster/nodeList.monFactory/nodeListEvent.mon
7 /aha/cpu/processMon.monFactory/usr/sbin/rsct/bin/hagsd.mon
==================================
Loop 1:
Event for
/aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon has
occurred.
BEGIN_EVENT_INFO
TIME_tvsec=1286376025
TIME_tvnsec=623294923
SEQUENCE_NUM=0
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=ADAPTER_DOWN
INTERFACE_NAME=en0
NODE_NUMBER=2
NODE_ID=0x2F1590D0CC0211DFBF20A24E5FB40502
CLUSTER_ID=0x93D8689AD0F211DFA49CA24E5F0D9E02
END_EVPROD_INFO
END_EVENT_INFO
==================================
Figure 9-7 Event monitoring from AHAFS
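Each AHAFS event record arrives as key=value pairs between BEGIN_EVENT_INFO and END_EVENT_INFO markers, as Figure 9-7 shows. A small helper can pull individual fields out of such a record, for example to react to ADAPTER_DOWN events in a monitoring script. The record below is a shortened sample, not live output.

```shell
# Shortened AHAFS event record in the format printed in Figure 9-7.
event='BEGIN_EVENT_INFO
TIME_tvsec=1286376025
BEGIN_EVPROD_INFO
EVENT_TYPE=ADAPTER_DOWN
INTERFACE_NAME=en0
NODE_NUMBER=2
END_EVPROD_INFO
END_EVENT_INFO'

# Extract one field from the event record by its key name.
event_field() {
    echo "$2" | awk -F= -v k="$1" '$1 == k { print $2 }'
}

event_field EVENT_TYPE "$event"      # ADAPTER_DOWN
event_field INTERFACE_NAME "$event"  # en0
```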
With help from the caa_event, you can monitor the network failure event. You can see the
CAA event by running the /usr/sbin/rsct/bin/caa_event -a command (Figure 9-8).
# /usr/sbin/rsct/bin/caa_event -a
EVENT: adapter liveness:
event_type(0)
node_number(2)
node_id(0)
sequence_number(0)
reason_number(0)
p_interface_name(en0)
EVENT: adapter liveness:
event_type(1)
node_number(2)
node_id(0)
sequence_number(1)
reason_number(0)
p_interface_name(en0)
Figure 9-8 Network failure in CAA event monitoring
PowerHA 7.1 has a new system event that is enabled by default. This new event allows for the
monitoring of the loss of the rootvg volume group while the cluster node is up and running.
Previous versions of PowerHA/HACMP could not monitor this type of loss, nor could the
cluster perform a failover action when access to rootvg was lost, for example, when a SAN
disk that hosts the rootvg for a cluster node fails.
The new option is available under the SMIT menu path smitty sysmirror → Custom Cluster
Configuration → Events → System Events. Figure 9-9 shows that the rootvg system event
is defined and enabled by default in PowerHA 7.1.
[Entry Fields]
* Event Name ROOTVG +
* Response Log event and reboot +
* Active Yes +
Figure 9-9 The rootvg system event
The default event properties instruct the system to log an event and restart when a loss of
rootvg occurs. This exact scenario is tested in the next section to demonstrate this concept.
This scenario entails a two-node cluster with one resource group. The cluster is running on
two nodes: sydney and perth. The rootvg volume group is hosted by the VIOS on a VSCSI
disk.
lsmap -all
VTD vtscsi13
Status Available
LUN 0x8100000000000000
Backing device lp5_rootvg
Physloc
Figure 9-10 VIOS output of the lsmap command showing the rootvg resource
Check the node to ensure that you have the right disk as shown in Figure 9-11.
sydney:/ # lspv
hdisk0 00c1f170ff638163 rootvg active
caa_private0 00c0f6a0febff5d4 caavg_private active
hdisk2 00c1f170674f3d6b dbvg
hdisk3 00c1f1706751bc0d appvg
Figure 9-11 PowerHA node showing the mapping of hdisk0 to the VIOS
After the mapping is established, review the cluster status to ensure that the resource group
is online as shown in Figure 9-12.
sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg ONLINE sydney
OFFLINE perth
Next, remove the virtual target device (VTD) mapping that maps the rootvg LUN to the client
partition, which in this case is the PowerHA node called sydney. Perform this operation while
the node is up and running and hosting the resource group. This operation demonstrates
what happens to the node when rootvg access is lost.
When checking the node, you find that it halted and failed the resource group over to the
standby node perth (Figure 9-14). This behavior is new and expected in this situation. It is a
result of the system event that monitors access to rootvg from the kernel. Checking perth
shows that the failover happened.
perth:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg OFFLINE sydney
ONLINE perth
Figure 9-14 Node status from the standby node showing that the node failed over
LABEL: KERNEL_PANIC
IDENTIFIER: 225E3B63
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
ASSERT STRING
PANIC STRING
System Halt because of rootvg failure
Figure 9-15 System error report showing a rootvg failure
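The panic string in Figure 9-15 makes this failure easy to recognize in the error report. The following sketch checks a captured errpt detail excerpt for a rootvg-triggered kernel panic; the sample text is taken from the figure, and on a live node you would feed the function the output of the errpt detail view instead.

```shell
# Sample errpt detail excerpt as shown in Figure 9-15.
report='LABEL: KERNEL_PANIC
IDENTIFIER: 225E3B63
PANIC STRING
System Halt because of rootvg failure'

# Return success when the error report shows a rootvg-triggered kernel panic.
is_rootvg_panic() {
    echo "$1" | grep -q 'KERNEL_PANIC' &&
    echo "$1" | grep -q 'rootvg failure'
}

if is_rootvg_panic "$report"; then
    echo "rootvg loss caused the halt"
fi
```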
The result is that the resource group moved to the standby node as expected. Example 9-16
shows the relevant output that is written to the busan:/var/hacmp/adm/cluster.log file.
The cld messages are related to the solidDB. The cld subsystem determines whether the
local node must become the primary or secondary solidDB server in a failover. Before the
crash, solidDB was active on the seoul node as follows:
seoul:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: seoul, 0xdc82faf0908920dc, 2
As expected, after the crash, solidDB is active in the remaining busan node as follows:
busan:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: busan, 0x564bc620973c9bdc, 1
With the absence of the seoul node, its interfaces are in STALE status as shown in
Example 9-17.
Example 9-17 The lscluster -i command to check the status of the cluster
busan:/ # lscluster -i
Network/Storage Interface Query
Results: The results were the same when issuing the halt command instead of the halt
-q command.
Scenario 1
This scenario shows the use of a stress tool on the CPU of one node with more than 50
processes in the run queue and a duration of 60 seconds.
Overview
This scenario consists of a hot-standby cluster configuration with participating nodes seoul
and busan with only one Ethernet network. Each node has two Ethernet interfaces. The
resource group is hosted on seoul, and solidDB is active on the busan node. A tool is run to
stress the seoul CPU with more than 50 processes in the run queue with a duration of 60
seconds as shown in Example 9-18 on page 293.
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE seoul
OFFLINE busan
Beneath the lparstat output header, you can see the CPU and memory configuration for each
node:
Seoul: Power 6, type=Shared, mode=Uncapped, smt=On, lcpu=2, mem=3584MB, ent=0.50
Busan: Power 6, type=Shared, mode=Uncapped, smt=On, lcpu=2, mem=3584MB, ent=0.50
Results
Before the test, the seoul node is running at an average of 3% of its entitled capacity, and the
run queue averages three processes as shown in Example 9-19.
During the test, the entitled capacity rose to 200%, and the run queue rose to an average
of 50 processes as shown in Example 9-20.
Example 9-20 Checking the node status after running the stress test
seoul:/ # vmstat 2
System configuration: lcpu=2 mem=3584MB ent=0.50
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------------------
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
52 0 405058 167390 0 0 0 0 0 0 108 988 397 42 8 50 0 0.25 50.6
41 0 405200 167248 0 0 0 0 0 0 78 140 245 99 0 0 0 0.79 158.1
49 0 405277 167167 0 0 0 0 0 0 71 206 249 99 0 0 0 1.00 199.9
50 0 405584 166860 0 0 0 0 0 0 73 33 241 99 0 0 0 1.00 199.9
48 0 405950 166491 0 0 0 0 0 0 71 297 244 99 0 0 0 1.00 199.8
As expected, the CPU starvation did not trigger a resource group move from the seoul node
to the busan node. The /var/adm/ras/syslog.caa log file reported messages about solidDB
daemons being unable to communicate, but the leader node continued to be the busan node
as shown in Example 9-21 on page 294.
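The run-queue pressure during such a test can be summarized directly from the vmstat data rows: the first column (r in the vmstat header) is the run-queue length. The sketch below averages that column over a few of the sample rows from Example 9-20; on a live node you would pipe the vmstat output (minus its header lines) through the same awk program.

```shell
# Data rows taken from the vmstat sample in Example 9-20 (headers omitted).
rows='52 0 405058 167390 0 0 0 0 0 0 108 988 397 42 8 50 0 0.25 50.6
41 0 405200 167248 0 0 0 0 0 0 78 140 245 99 0 0 0 0.79 158.1
49 0 405277 167167 0 0 0 0 0 0 71 206 249 99 0 0 0 1.00 199.9'

# Integer average of the run-queue column (first field).
avg_runq() {
    echo "$1" | awk '{ sum += $1; n++ } END { printf "%d\n", sum / n }'
}

avg_runq "$rows"   # integer average of 52, 41, and 49
```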
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE seoul
OFFLINE busan
Scenario 2
This scenario shows the use of a stress tool on the CPU of two nodes with more than 50
processes in the run queue and a duration of 60 seconds.
Overview
This scenario consists of a hot-standby cluster configuration with participating nodes seoul
and busan with only one Ethernet network. Each node has two Ethernet interfaces. Both the
resource group and the solidDB are active in busan node. A tool is run to stress the CPU of
both nodes with more than 50 processes in the run queue with a duration of 60 seconds as
shown in Example 9-22.
Example 9-22 Scenario testing the use of a stress tool on both nodes
seoul:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: busan, 0x564bc620973c9bdc, 1
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc OFFLINE seoul
ONLINE busan
Results
Before the test, both nodes have a low run queue and low entitled capacity as shown in
Example 9-23.
busan:/ # vmstat 2
System configuration: lcpu=2 mem=3584MB ent=0.50
kthr memory page faults cpu
During the test, the seoul node kept an average of 50 processes in the run queue and an
entitled capacity of 200% as shown in Example 9-24.
The busan node did not respond to the vmstat command during the test. When the CPU
stress finished, it produced just one line of output, showing a run queue of 119 processes
(Example 9-25).
Both the resource group and solidDB database did not move from the busan node as shown
in Example 9-26.
Conclusion
The conclusion of this test is that occasional peak performance degradation events do not
cause resource group moves or unnecessary outages.
As a result, the seoul node halted as expected, and the resource group was acquired by the
remaining node as shown in Example 9-27.
The seoul:/var/adm/ras/syslog.caa log file recorded the messages before the crash. You
can observe that the seoul node was halted after 1 second as shown in Example 9-28.
Figure 9-16 Start After dependency between the apprg and dbrg resource group
With both resource groups online, the source (dependent) apprg resource group can be
brought offline and then online again. Alternatively, it can be gracefully moved to another
node without any influence on the target dbrg resource group. The target resource group can
also be brought offline. However, to bring the source resource group online, the target
resource group must first be brought online manually (if it is offline).
If you start the cluster only on the home node of the source resource group, the apprg
resource group in this case, the cluster waits until the dbrg resource group is brought online
as shown in Example 9-30. The startup policy is Online On Home Node Only for both resource
groups.
if [ -f /dbmp/db.lck ]; then
logger -t"$file" -p$fp "DB is running!"
exit 0
fi
logger -t"$file" -p$fp "DB is NOT running!"
exit 1
if [ -f /appmp/app.lck ]; then
logger -t"$file" -p$fp "APP is running!"
exit 0
fi
logger -t"$file" -p$fp "APP is NOT running!"
exit 1
Without Startup Monitoring, the APP startup script is launched before the DB startup script
returns as shown in Example 9-32.
Example 9-34 shows the state change of the resource groups during this startup.
sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg ACQUIRING sydney
OFFLINE perth
apprg TEMPORARY ERROR sydney
OFFLINE perth
sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg ONLINE sydney
OFFLINE perth
The default node priority is algeria first, then brazil, and then usa. The usa node gets the
lowest return value from DNP.sh. When a resource group failover is triggered, the algeria_rg
resource group is moved to the usa node because its return value is the lowest, as shown in
Example 9-35.
-------------------------------
NODE brazil
-------------------------------
exit 105
-------------------------------
NODE algeria
-------------------------------
exit 103
When the resource group fails over, algeria_rg moves from the algeria node to the usa
node, which has the lowest return value in DNP.sh as shown in Figure 9-18.
# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
algeria_rg ONLINE algeria
OFFLINE brazil
OFFLINE usa
# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
algeria_rg OFFLINE algeria
OFFLINE brazil
ONLINE usa
Figure 9-18 clRGinfo of before and after takeover
-------------------------------
NODE usa
-------------------------------
exit 100
-------------------------------
NODE brazil
-------------------------------
exit 101
-------------------------------
NODE algeria
-------------------------------
exit 103
Upon resource group failover, the resource group moves to brazil, which has the lowest
return value among the cluster nodes this time, as shown in Figure 9-19.
# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
algeria_rg OFFLINE algeria
OFFLINE brazil
ONLINE usa
# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
algeria_rg OFFLINE algeria
ONLINE brazil
OFFLINE usa
Figure 9-19 Resource group moving
To simplify the test scenario, DNP.sh is defined to simply return a value. In a real situation, you
can replace this DNP.sh sample file with any customized script. Then, node failover is done
based upon the return value of your own script.
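A customized replacement might derive the priority value from current load rather than a constant. The following sketch is one hypothetical way to do that: it builds the return value from a run-queue length so that the least-loaded node wins (a lower return value means a higher priority, matching the behavior shown above). The base value of 100 and the load probe are assumptions for illustration only.

```shell
# Hypothetical DNP.sh logic: priority value = 100 + run-queue length,
# so a less-loaded node returns a lower (better) value. On AIX the
# run-queue length could come from: vmstat 1 1 | tail -1 | awk '{print $1}'
dnp_value() {
    runq=$1
    echo $((100 + runq))
}

# A node with 3 processes in the run queue would exit with value 103.
dnp_value 3
```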
For verbose logging information, you must enable debug mode by editing the
/etc/syslog.conf configuration file and adding the following line as shown in Figure 10-1:
*.debug /tmp/syslog.out rotate size 10m files 10
local0.crit /dev/console
local0.info /var/hacmp/adm/cluster.log
user.notice /var/hacmp/adm/cluster.log
daemon.notice /var/hacmp/adm/cluster.log
*.info /var/adm/ras/syslog.caa rotate size 1m files 10
*.debug /tmp/syslog.out rotate size 10m files 10
Figure 10-1 Extract from the /etc/syslog.conf file
After you make this change, verify that a syslog.out file is in the /tmp directory. If this file is
not in the directory, create one by entering the touch /tmp/syslog.out command. After you
create the file, refresh the syslog daemon by issuing the refresh -s syslogd command.
When debug mode is enabled, you capture detailed debugging information in the
/tmp/syslog.out file. This information can assist you in troubleshooting problems with
commands, such as the mkcluster command during cluster migration.
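The steps above can be scripted. The sketch below works against copies of the real files (the default paths are scratch files so that it is safe to run anywhere); on a live node you would set CONF=/etc/syslog.conf and OUT=/tmp/syslog.out and finish with refresh -s syslogd.

```shell
# Paths default to scratch files for this sketch; override them on a
# real node: CONF=/etc/syslog.conf OUT=/tmp/syslog.out
CONF=${CONF:-/tmp/demo_syslog.conf}
OUT=${OUT:-/tmp/demo_syslog.out}

# Add the *.debug line only if it is not already present.
grep -q '^\*\.debug' "$CONF" 2>/dev/null ||
    echo "*.debug $OUT rotate size 10m files 10" >> "$CONF"

# Create the output file if it does not exist yet.
[ -f "$OUT" ] || touch "$OUT"

# On a live node, refresh the daemon afterward:  refresh -s syslogd
```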
Example 10-1 Generating a list of PowerHA log files with the clmgr utility
seoul:/ # clmgr view log
ERROR: """" does not appear to exist!
Available Logs:
autoverify.log
cl2siteconfig_assist.log
cl_testtool.log
clavan.log
clcomd.log
clcomddiag.log
clconfigassist.log
clinfo.log
clstrmgr.debug
clstrmgr.debug.long
cluster.log
cluster.mmddyyyy
clutils.log
clverify.log
cspoc.log
cspoc.log.long
cspoc.log.remote
dhcpsa.log
dnssa.log
domino_server.log
emuhacmp.out
hacmp.out
ihssa.log
migration.log
sa.log
sax.log
Example 10-2 The cltopinfo command with the disk heartbeat still being displayed
berlin:/ # cltopinfo
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: caa_private0
Cluster IP Address:
There are 2 node(s) and 3 network(s) defined
NODE berlin:
Network net_diskhb_01
berlin_hdisk1_01 /dev/hdisk1
Network net_ether_01
berlin 192.168.101.141
Network net_ether_010
alleman 10.168.101.142
german 10.168.101.141
berlinb1 192.168.200.141
berlinb2 192.168.220.141
NODE munich:
Network net_diskhb_01
munich_hdisk1_01 /dev/hdisk1
Network net_ether_01
munich 192.168.101.142
Network net_ether_010
alleman 10.168.101.142
german 10.168.101.141
munichb1 192.168.200.142
munichb2 192.168.220.142
COMMAND STATUS
Networks
Add a Network
Change/Show a Network
Remove a Network
+--------------------------------------------------------------------------+
| Select a Network to Remove |
| |
| Move cursor to desired item and press Enter. |
| |
| net_diskhb_01 |
| net_ether_01 (192.168.100.0/22) |
| net_ether_010 (10.168.101.0/24 192.168.200.0/24 192.168.220.0/24) |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 10-4 Removing the disk heartbeat network
3. Synchronize your cluster by selecting the path: smitty sysmirror → Custom Cluster
Configuration → Verify and Synchronize Cluster Configuration (Advanced).
4. See if the network is deleted by using the cltopinfo command as shown in Example 10-3.
Example 10-3 Output of the cltopinfo command after removing the disk heartbeat network
berlin:/ # cltopinfo
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: caa_private0
Cluster IP Address:
There are 2 node(s) and 2 network(s) defined
NODE berlin:
Network net_ether_01
berlin 192.168.101.141
Network net_ether_010
german 10.168.101.141
alleman 10.168.101.142
berlinb1 192.168.200.141
5. Start PowerHA on all your cluster nodes by running the smitty cl_start command.
seoul:/ # clstat -a
Failed retrieving cluster information.
There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.
Refer to the HACMP Administration Guide for more information.
seoul:/ # /usr/sbin/snmpv3_ssw -1
Stop daemon: snmpmibd
In /etc/rc.tcpip file, comment out the line that contains: snmpmibd
In /etc/rc.tcpip file, remove the comment from the line that contains: dpid2
Make the symbolic link from /usr/sbin/snmpd to /usr/sbin/snmpdv1
Make the symbolic link from /usr/sbin/clsnmp to /usr/sbin/clsnmpne
Start daemon: dpid2
Example 10-5 The clcomd daemon indicating problems with the security keys
2010-09-23T00:02:07.983104: WARNING: Cannot read the key
/etc/security/cluster/key_md5_des
2010-09-23T00:02:07.985975: WARNING: Cannot read the key
/etc/security/cluster/key_md5_3des
2010-09-23T00:02:07.986082: WARNING: Cannot read the key
/etc/security/cluster/key_md5_aes
This problem means that the /etc/cluster/rhosts file is not completed correctly. On all
cluster nodes, edit this file to contain the IP addresses that were used as the communication
paths during cluster definition, before the first synchronization. If you used the host name as
the persistent address and the communication path, add the persistent addresses to the
/etc/cluster/rhosts file. Finally, issue the startsrc -s clcomd command.
Example 10-6 Error messages when trying to create an ECM volume group using C-SPOC
seoul: 0516-1335 mkvg: This system does not support enhanced concurrent capable
seoul: volume groups.
seoul: 0516-862 mkvg: Unable to create volume group.
seoul: cl_rsh had exit code = 1, see cspoc.log and/or clcomd.log for more
information
cl_mkvg: An error occurred executing mkvg appvg on node seoul
In this case, install the bos.clvm.enh file set and any fixes for this file set for the system to stay
in a consistent version state.
ERROR:
Figure 10-5 clmigcheck error for communication path
HACMPnode:
name = "brazil"
object = "COMMUNICATION_PATH"
value = "brazil"
node_id = 3
node_handle = 3
version = 12
Figure 10-6 Communication path definition at HACMPnode.odm
Because the clmigcheck program is a ksh script, certain profiles can cause a similar problem. If
the problem persists after you correct the /etc/hosts configuration file, try to remove the
contents of the kshrc file because it might be affecting the behavior of the clmigcheck program.
If your /etc/cluster/rhosts file is not configured properly, you see an error message
similar to the one shown in Figure 10-7. The /etc/cluster/rhosts file must contain the fully
qualified domain name of each node in the cluster (that is, the output from the hostname
command). After changing the /etc/cluster/rhosts file, run the stopsrc and startsrc
commands on the clcomd subsystem.
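The rebuild of the rhosts file and the clcomd restart can be combined in a small helper. In this sketch the target path defaults to a scratch file so it is safe to run outside the cluster, and the node names are hypothetical; on the cluster nodes you would set RHOSTS=/etc/cluster/rhosts and then recycle the daemon with stopsrc -s clcomd followed by startsrc -s clcomd.

```shell
# Target defaults to a scratch file for this sketch; on a real node:
# RHOSTS=/etc/cluster/rhosts
RHOSTS=${RHOSTS:-/tmp/demo_rhosts}

# Write one fully qualified node name per line, replacing any old content.
write_rhosts() {
    : > "$RHOSTS"
    for n in "$@"; do
        echo "$n" >> "$RHOSTS"
    done
}

# Hypothetical node names for illustration.
write_rhosts algeria.example.com brazil.example.com
cat "$RHOSTS"

# On a live node, recycle clcomd afterward:
#   stopsrc -s clcomd; startsrc -s clcomd
```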
brazil:/ # clmigcheck
lslpp: Fileset hageo* not installed.
rshexec: cannot connect to node algeria
ERROR: Internode communication failed,
check the clcomd.log file for more information.
You can also check clcomd communication by using the clrsh command as shown in
Figure 10-8.
Example 10-8 shows the exact error message saved in the smit.log file.
The message includes the solution as shown in Example 10-7. Run the rmcluster
command as shown in Example 10-9 to remove all CAA structures from the specified disk.
After you issue the rmcluster command, you can synchronize the cluster again.
Tip: After running the rmcluster command, verify that the caa_private0 disk has been
unconfigured and is not seen on other nodes. Run the lqueryvg -Atp command against
the repository disk to ensure that the volume group definition is removed from the disk. If
you encounter problems with the rmcluster command, see “Removal of the volume group
when the rmcluster command does not” on page 320 for information about how to
manually remove the volume group.
One of the error messages that you see is “ERROR: Problems encountered creating the
cluster in AIX.” This message indicates a problem with creating the CAA cluster. The
clmigcheck program calls the mkcluster command to create the CAA cluster, which is what
you must look for in the logs.
To proceed with the troubleshooting, enable the syslog debugging as discussed in 10.2.1,
“The clmigcheck script” on page 308.
If you encounter a problem when creating your cluster, check these log files to ensure that the
volume group and file systems are created without any errors.
# lsvg -l caavg_private
caavg_private:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
caalv_private1 boot 1 1 1 closed/syncd N/A
caalv_private2 boot 1 1 1 closed/syncd N/A
caalv_private3 boot 4 4 1 open/syncd N/A
fslv00 jfs2 4 4 1 open/syncd /clrepos_private1
fslv01 jfs2 4 4 1 closed/syncd /clrepos_private2
powerha_crlv boot 1 1 1 closed/syncd N/A
Figure 10-9 Contents of the caavg_private volume group
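The output in Figure 10-9 can be checked mechanically. The following sketch scans lsvg -l style output for the state of the repository file system; the sample text reproduces the figure, and on a live node you would pipe lsvg -l caavg_private into the same awk filter:

```shell
# Sketch: confirm the CAA repository file system state from "lsvg -l" output.
# The sample below mirrors Figure 10-9.
lsvg_out=$(cat <<'EOF'
caavg_private:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
caalv_private1 boot 1 1 1 closed/syncd N/A
caalv_private2 boot 1 1 1 closed/syncd N/A
caalv_private3 boot 4 4 1 open/syncd N/A
fslv00 jfs2 4 4 1 open/syncd /clrepos_private1
fslv01 jfs2 4 4 1 closed/syncd /clrepos_private2
powerha_crlv boot 1 1 1 closed/syncd N/A
EOF
)
# Pick the LV STATE field of the line whose mount point is /clrepos_private1
repos_state=$(printf '%s\n' "$lsvg_out" | awk '$NF == "/clrepos_private1" {print $(NF-1)}')
echo "repository file system state: $repos_state"
```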
Figure 10-10 shows a crfs failure while creating the CAA cluster. This problem was corrected
by removing incorrect entries in the /etc/filesystems file. Likewise, problems can happen
when a logical volume name that the CAA cluster must use already exists, for
example.
Tip: When you look at the syslog.caa file, focus on the AIX commands (such as mkvg, mklv,
and crfs) and their returned values. If you find non-zero return values, a problem exists.
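A quick scan for non-zero return codes can be scripted. The log line format below is an assumption for illustration only; adjust the awk pattern to match the actual entries in your syslog.caa file:

```shell
# Sketch: flag AIX commands with non-zero return codes in a syslog.caa
# excerpt. The "rc=N" line format here is assumed for illustration.
cat > /tmp/syslog.caa.sample <<'EOF'
caa: run_cmd: mkvg -y caavg_private hdisk2 rc=0
caa: run_cmd: mklv -y caalv_private1 caavg_private 1 rc=0
caa: run_cmd: crfs -v jfs2 -m /clrepos_private1 rc=1
EOF

# Print the command name and return code for every non-zero result
failures=$(awk '/rc=[^0]/ {print $3, $NF}' /tmp/syslog.caa.sample)
echo "$failures"
```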
You can see that the volume group creation failed because the name is already in use. This
problem can happen for several reasons. For example, it can occur if the disk was previously
used as the CAA repository or if the disk had the volume group descriptor area (VGDA)
information of another volume group on it.
For the full sequence of steps, see 10.4.1, “Previously used repository disk for CAA” on
page 316.
If you find that the rmcluster command has not removed your CAA definition from the disk,
use the steps in the following section, “Removal of the volume group when the rmcluster
command does not.”
Removal of the volume group when the rmcluster command does not
In this situation, you must use the Logical Volume Manager (LVM) commands, which you can
do in one of two ways. The easiest method is to import the volume group, vary on the volume
group, and then reduce it so that the VGDA is removed from the disk. If this method does not
work, use the dd command to overwrite special areas of the disk.
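The first method can be sketched as a dry run that prints the commands instead of executing them (the disk and volume group names are hypothetical); remove the echo prefix only on a node where the data on the disk is confirmed disposable:

```shell
# Sketch (dry run): the LVM-based removal sequence described above, printed
# rather than executed. Disk and volume group names are hypothetical.
remove_caa_vg() {
    disk=$1; vg=$2
    echo importvg -y "$vg" "$disk"       # import the stale volume group
    echo varyonvg "$vg"                  # activate it
    echo reducevg -df "$vg" "$disk"      # force-remove the disk (clears the VGDA)
}
remove_caa_vg hdisk2 old_caavg
```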
Tip: Make sure that the data contained on the disk is not needed because usage of the
following steps destroys the volume group data on the disk.
If the activation fails, run the exportvg command to remove the volume group definition from
the ODM. Then try to import it with a different name as follows:
# exportvg vgname
# importvg -y new-vgname hdiskx
After you complete the forced reduction, check whether the disk no longer contains a volume
group by using the lqueryvg -Atp hdisk command.
Also verify whether any previous volume group definition is still being displayed on the other
nodes of your cluster by using the lspv command. If the lspv output shows the PVID with an
associated volume group, you can fix it by running the exportvg vgname command.
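This check can be scripted. The sketch below uses a sample of lspv output in the style of Figure 10-14; on a live node, pipe lspv directly into the same awk filter:

```shell
# Sketch: look up which volume group (if any) lspv still associates with a
# PVID, and suggest the exportvg cleanup. Sample output is illustrative.
lspv_out=$(cat <<'EOF'
hdisk1 000fe4114cf8d1ce None
hdisk8 000fe4114cf8d608 ny_datavg
hdisk0 000fe40140a5516a rootvg active
EOF
)
pvid=000fe4114cf8d608
vg=$(printf '%s\n' "$lspv_out" | awk -v p="$pvid" '$2 == p {print $3}')
if [ -n "$vg" ] && [ "$vg" != "None" ]; then
    echo "PVID $pvid still mapped to $vg; run: exportvg $vg"
fi
```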
If you experience any problems with this procedure, try a force overwrite of the disk as described
in “Overwriting the disk.”
Attention: Only attempt this method if the rmcluster and reducevg procedures fail and if
AIX still has access to the disk. You can check this access by running the lquerypv -h
/dev/hdisk command.
This command zeros only the part of the disk that contains the repository offset. Therefore,
you do not lose the PVID information.
In some cases, this procedure is not sufficient to resolve the problem. If you need to
completely overwrite the disk, run the following procedure:
Attention: This procedure overwrites the entire disk structure including the PVID. You
must follow the steps as shown to change the PVID if required during migration.
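The effect of zeroing only part of a disk can be demonstrated safely on a scratch file. The offsets below are purely illustrative and do not reflect the real CAA repository layout; on a real system, dd would operate on /dev/hdiskN:

```shell
# Sketch: zero a region of a "disk" with dd while leaving the rest intact,
# using a scratch file. Sector numbers are illustrative assumptions only.
SCRATCH=/tmp/fake_disk.img
dd if=/dev/zero of=$SCRATCH bs=512 count=64 2>/dev/null   # 32 KB scratch "disk"
printf 'CAAREPOSITORY' | dd of=$SCRATCH bs=512 seek=8 conv=notrunc 2>/dev/null

# Zero 4 sectors starting at sector 8 (the assumed "repository area"), leaving
# sector 0 (where the PVID would live on a real disk) untouched.
dd if=/dev/zero of=$SCRATCH bs=512 seek=8 count=4 conv=notrunc 2>/dev/null
echo "zeroed sectors 8-11 of $SCRATCH"
```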
Run the lspv command to check that the PVID is the same on both nodes. To ensure that you
have the real PVID, query the disk as follows:
# lquerypv -h /dev/hdiskn
The PVID should match the lspv output as shown in Figure 10-14.
chile:/ # lspv
hdisk1 000fe4114cf8d1ce None
hdisk2 000fe40163c54011 None
hdisk3 000fe40168921cea None
hdisk4 000fe4114cf8d3a1 None
hdisk5 000fe4114cf8d441 None
hdisk6 000fe4114cf8d4d5 None
hdisk7 000fe4114cf8d579 None
hdisk8 000fe4114cf8d608 ny_datavg
hdisk0 000fe40140a5516a rootvg active
Figure 10-14 The lspv output showing PVID
CLUSTER_TYPE:STANDARD
CLUSTER_REPOSITORY_DISK:000fe40120e16405
CLUSTER_MULTICAST:NULL
Figure 10-15 Changing the PVID in the clmigcheck.txt file
If this is post-migration and PowerHA is installed, you must also modify the HACMPsircol ODM
class (Figure 10-16) on all nodes in the cluster.
HACMPsircol:
name = "newyork_sircol"
id = 0
uuid = "0"
repository = "000fe4114cf8d258"
ip_address = ""
nodelist = "serbia,scotland,chile,"
backup_repository1 = ""
backup_repository2 = ""
Figure 10-16 The HACMPsircol ODM class
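The repository PVID can be recovered from such a stanza (as printed by the odmget HACMPsircol command) with a small filter. The sample below reproduces Figure 10-16:

```shell
# Sketch: extract the repository disk PVID from an HACMPsircol ODM stanza.
stanza=$(cat <<'EOF'
HACMPsircol:
        name = "newyork_sircol"
        id = 0
        uuid = "0"
        repository = "000fe4114cf8d258"
        ip_address = ""
        nodelist = "serbia,scotland,chile,"
        backup_repository1 = ""
        backup_repository2 = ""
EOF
)
# Match only the 'repository = "..."' attribute (not backup_repository1/2)
pvid=$(printf '%s\n' "$stanza" | sed -n 's/.*repository = "\([0-9a-f]*\)".*/\1/p')
echo "repository PVID: $pvid"
```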
You might be able to recover by recreating the CAA cluster from the last CAA configuration
(HACMPsircol class in ODM) as explained in the following steps:
1. Clear the CAA repository disk as explained in “Previously used repository disk for CAA” on
page 316.
2. Perform a synchronization or verification of the cluster. Upon synchronizing the cluster, the
mkcluster command is run to recreate the CAA cluster. If the problem still persists,
contact IBM support.
The following section, “Hardware requirements”, explains the installation requirements of IBM
Systems Director v6.2 on AIX.
Table 11-1 lists the hardware requirements for IBM Systems Director Server running on AIX
for a small configuration that has less than 500 managed systems.
Table 11-1 Hardware requirements for IBM Systems Director Server on AIX
Resource Requirement
Memory 3 GB
Disk storage 4 GB
For more details about hardware requirements, see the “Recommended hardware
requirements for IBM Systems Director Server running on AIX” topic in the IBM Systems
Director Information Center at:
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ib
m.director.plan.helps.doc/fqm0_r_hardware_requirements_servers_running_aix.html
The following steps summarize the process for installing IBM Systems Director on AIX:
1. Increase the file size limit:
ulimit -f 4194302 (or to unlimited)
2. Increase the number of file descriptors:
ulimit -n 4000
3. Verify the file system (/, /tmp and /opt) size as mentioned in Table 11-1 on page 326:
df -g / /tmp /opt
4. Download IBM Systems Director from the IBM Systems Director Downloads page at:
http://www.ibm.com/systems/management/director/downloads/
5. Extract the content:
gzip -cd <package_name> | tar -xvf -
where <package_name> is the file name of the download package.
6. Install the content by using the script in the extracted package:
./dirinstall.server
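Step 5 can be exercised end to end with a sample package; the package and directory names below are hypothetical stand-ins for the real download:

```shell
# Sketch: build a sample package, then extract it with the same
# "gzip -cd <package_name> | tar -xvf -" pipeline used in step 5.
# Names are hypothetical stand-ins for the real download.
mkdir -p /tmp/pkgdemo/SysDir6_2_Server_AIX
echo demo > /tmp/pkgdemo/SysDir6_2_Server_AIX/dirinstall.server
( cd /tmp/pkgdemo && tar -cf - SysDir6_2_Server_AIX | gzip > package.tar.gz )

mkdir -p /tmp/extract && cd /tmp/extract
gzip -cd /tmp/pkgdemo/package.tar.gz | tar -xvf -
ls SysDir6_2_Server_AIX/dirinstall.server
```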
Chapter 11. Installing IBM Systems Director and the PowerHA SystemMirror plug-in 327
11.1.3 Configuring and activating IBM Systems Director
To configure and activate IBM Systems Director, follow these steps:
1. Configure IBM Systems Director by using the following script:
/opt/ibm/director/bin/configAgtMgr.sh
Agent password: The script prompts for an agent password, for which you can
use the host system root password or any other common password of your
choice. This password is used by IBM Systems Director for its internal communication
and does not have any external impact.
/opt/ibm/director/bin/smstatus -r
Inactive
Starting
Active
Figure 11-1 Activation status for IBM Systems Director
After completing the installation of IBM Systems Director, install the SystemMirror plug-in as
explained in the following section.
Figure 11-2 shows the output of the plug-in status.
94:RESOLVED:com.ibm.director.power.ha.systemmirror.branding:7.1.0.1:com.ibm.director.power.ha.systemmirr
or.branding
95:ACTIVE:com.ibm.director.power.ha.systemmirror.common:7.1.0.1:com.ibm.director.power.ha.systemmirror.c
ommon
96:ACTIVE:com.ibm.director.power.ha.systemmirror.console:7.1.0.1:com.ibm.director.power.ha.systemmirror.
console
97:RESOLVED:com.ibm.director.power.ha.systemmirror.helps.doc:7.1.0.1:com.ibm.director.power.ha.systemmir
ror.helps.doc
98:INSTALLED:com.ibm.director.power.ha.systemmirror.server.fragment:7.1.0.0:com.ibm.director.power.ha.sy
stemmirror.server.fragment
99:ACTIVE:com.ibm.director.power.ha.systemmirror.server:7.1.0.1:com.ibm.director.power.ha.systemmirror.s
erver
If the subagent interface plug-in shows the RESOLVED status instead of the ACTIVE status,
attempt to start the subagent. Run the lwiplugin.sh script on AIX and Linux, or the
lwiplugin.bat script on Windows, passing the plug-in number (which is 94):
AIX and Linux
/opt/ibm/director/agent/bin/lwiplugin.sh -start 94
Windows
C:/Program Files/IBM/Director/lwi/bin/lwiplugin.bat -start 94
If Systems Director was active during installation of the plug-in, you must stop it and restart it
as follows:
1. Stop the IBM Systems Director Server:
# /opt/ibm/director/bin/smstop
2. Start the IBM Systems Director Server:
# /opt/ibm/director/bin/smstart
3. Monitor the startup process:
# /opt/ibm/director/bin/smstatus -r
Inactive
Starting
Active *** (the "Active" status can take a long time)
More information: See the SystemMirror agent installation section in Configuring AIX
Clusters for High Availability Using PowerHA SystemMirror for Systems Director paper at:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101774
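When scripting the restart, the smstatus polling in step 3 can be wrapped in a bounded wait loop. The sketch below stubs smstatus so that it is self-contained; on a real server the stub would be replaced by /opt/ibm/director/bin/smstatus:

```shell
# Sketch: poll a status command until it reports Active, with a bounded wait.
# The smstatus function below is a STUB standing in for the real
# /opt/ibm/director/bin/smstatus binary.
STATE_FILE=/tmp/smstatus.count
echo 0 > $STATE_FILE
smstatus() {   # stub: reports Starting twice, then Active
    n=$(cat $STATE_FILE); n=$((n + 1)); echo $n > $STATE_FILE
    if [ "$n" -ge 3 ]; then echo Active; else echo Starting; fi
}

wait_for_active() {
    tries=0
    while [ $tries -lt 30 ]; do
        status=$(smstatus)
        if [ "$status" = "Active" ]; then
            echo "Director is $status"
            return 0
        fi
        tries=$((tries + 1))
        sleep 0    # use a longer interval (for example, sleep 30) on a real server
    done
    echo "timed out waiting for Active" >&2
    return 1
}
wait_for_active
```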
11.3.2 Installing the PowerHA SystemMirror agent
To install the PowerHA SystemMirror agent on the nodes, follow these steps:
1. Install the cluster.es.director.agent.rte file set:
# smitty install_latest
2. Stop the common agent:
# stopsrc -s platform_agent
# stopsrc -s cimsys
3. Start the common agent:
# startsrc -s platform_agent
Tip: The cimsys subsystem starts along with the platform_agent subsystem.
This chapter explains how to create and manage the PowerHA SystemMirror cluster with IBM
Systems Director.
3. In the IBM Systems Director console, in the left navigation pane, expand Availability and
select PowerHA SystemMirror (Figure 12-2).
Figure 12-2 Selecting the PowerHA SystemMirror link in IBM Systems Director
5. Starting with the Create Cluster Wizard, follow the wizard panes to create the cluster.
a. In the Welcome pane (Figure 12-4), click Next.
Chapter 12. Creating and managing a cluster using IBM Systems Director 335
b. In the Name the cluster pane (Figure 12-5), in the Cluster name field, provide a name
for the cluster. Click Next.
c. In the Choose nodes pane (Figure 12-6), select the host names of the nodes.
Figure 12-7 Verifying common storage availability for the repository disk
d. In the Configure nodes pane (Figure 12-8), set the controlling node. The controlling
node in the cluster is considered to be the primary or home node. Click Next.
e. In the Choose repositories pane (Figure 12-9), choose the storage disk that is shared
among all nodes in the cluster to use as the common storage repository. Click Next.
f. In the Configure security pane (Figure 12-10), specify the security details to secure
communication within the cluster.
6. Verify the cluster creation in the AIX cluster nodes by using either of the following
commands:
– The CAA command:
/usr/sbin/lscluster -m
– The PowerHA command:
/usr/es/sbin/cluster/utilities/cltopinfo
Overview of the CLI
The CLI is executed by using a general-purpose smcli command. To list the available CLI
commands for managing the cluster, run the smcli lsbundle command as shown in
Figure 12-12.
You can retrieve help information for the commands (Figure 12-12) as shown in Figure 12-13.
To verify the availability of the mkcluster command, you can use the smcli lsbundle
command in IBM Systems Director as shown in Figure 12-12.
Example 12-1 Creating a cluster with the smcli mkcluster CLI command
smcli mkcluster -i 224.0.0.0 \
-r hdisk3 \
-n nodeA.xy.ibm.com,nodeB.xy.ibm.com \
DB2_Cluster
You can use the -h option to list the commands that are available (Figure 12-14).
# smcli mkcluster -h
smcli sysmirror/mkcluster {-h|-?|--help} [-v|--verbose]
smcli sysmirror/mkcluster [{-i|--cluster_ip} <multicast_address>] \
[{-S|--fc_sync_interval} <##>] \
[{-s|--rg_settling_time} <##>] \
[{-e|--max_event_time} <##>] \
[{-R|--max_rg_processing_time} <##>] \
[{-c|--controlling_node} <node>] \
[{-d|--shared_disks} <DISK>[,<DISK#2>,...] ] \
{-r|--repository} <disk> \
{-n|--nodes} <NODE>[, <NODE#2>,...] \
[<cluster_name>]
Figure 12-14 The mkcluster -h command to list the available commands
To verify that the cluster has been created, you can use the smcli lscluster command.
Command help: For assistance with using the commands, you can use either of the
following help options:
smcli <command name> -help --verbose
smcli <command name> -h -v
Accessing the Cluster Management Wizard
To access the Cluster Management Wizard, follow these steps:
1. In the IBM Systems Director console, expand Availability and select PowerHA
SystemMirror (Figure 12-3 on page 335).
2. In the right pane, under Cluster Management, click the Manage Clusters link
(Figure 12-15).
Edit Advanced Properties button
Under the General tab, you can click the Edit Advanced Properties button to modify the
cluster properties. For example, you can change the controlling node as shown in
Figure 12-17.
Figure 12-17 Editing the advanced properties, such as the controlling node
Capture Snapshot
You can capture and manage snapshots through the Snapshots tab. To capture a new
snapshot, click the Create button on the Snapshots tab as shown in Figure 12-20.
File collection and logs management
You can manage file collection and logs on the Additional Properties tab. From the View
drop-down list, select either File Collections or Log files as shown in Figure 12-21.
Figure 12-21 Additional Properties tab: File Collections and Log files options
The Systems Director plug-in also provides a CLI to manage the cluster. The following section
explains the available CLI commands and how you can find help for each of these commands.
A few of the CLI commands are provided as follows for a quick reference:
Snapshot creation
You can use the smcli mksnapshot command to create a snapshot. Figure 12-24 on
page 348 shows the command for obtaining detailed help about this command.
# smcli mkss -h -v
Verify the snapshot by using the smcli lsss command as shown in Example 12-3.
File collection
You can use the smcli mkfilecollection command to create a file collection as shown in
Example 12-4. A file collection helps to keep the files and directories synchronized on all
nodes in the cluster.
Log files
You can use the smcli lslog command (Example 12-5) to list the available log files in the
cluster. Then you can use the smcli vlog command to view the log files.
Modification functionality: At the time of writing this IBM Redbooks publication, an edit
or modification CLI command, such as one to modify the controlling node, is not available
in the initial release. Therefore, use the GUI wizards for the modification functionality.
4. On the Clusters tab, click the Actions list and select Add Resource Group
(Figure 12-26). Then select the cluster node, and click the Action button.
Alternative: You can select the resource group configuration wizard by selecting the
cluster nodes, as shown in Figure 12-26.
5. In the Choose a cluster pane (Figure 12-27), choose the cluster where the resource group
is to be created. Notice that this step is highlighted under Welcome in the left pane.
Figure 12-27 Choose the cluster for the resource group configuration
You can now choose to create either a custom resource group or a predefined resource group
as explained in 12.3.1, “Creating a custom resource group” on page 351, and 12.3.2,
“Creating a predefined resource group” on page 353.
2. In the Choose nodes pane (Figure 12-29), select the nodes for which you want to
configure the resource group.
3. In the Choose policies and attributes pane (Figure 12-30), select the policies to add to the
resource group.
4. In the Choose resources pane (Figure 12-31), select the shared resources to define for
the resource group.
Application list: Only the applications installed in the cluster nodes are displayed
under the predefined resource group list.
Figure 12-33 Predefined resource group configuration
2. In the Choose components pane, for the predefined resource group, select the
components of the application to create the resource group. In the example shown in
Figure 12-34, the Tivoli Director Server component is selected. Each component already
has the predefined properties such as the primary node and takeover node.
Modify the properties per your configuration and requirements. Then create the resource
group.
3. Enter the following base SystemMirror command to verify that the resource group has
been created:
/usr/es/sbin/cluster/utilities/clshowres
3. In the right pane, under Resource Group Management, select Manage Resource Groups
(Figure 12-36).
The Resource Group Management wizard opens as in Figure 12-37. Alternatively, you can
access the Resource Group Management wizard by selecting Manage Cluster under Cluster
Management (Figure 12-36).
To access the Cluster and Resource Group Management wizard, select the Resource
Groups tab as shown in Figure 12-37.
c. In the Parent-child window (Figure 12-39), select the dependency type to configure the
dependencies.
Resource group removal
Right-click the selected resource group, and click Remove to remove the resource group
as shown in Figure 12-40.
Examples of CLI command usage
This section shows examples using the CLI commands for resource group management.
To list the resource groups, use the following command as shown in Example 12-6:
smcli lsresgrp -c <cluster name>
To remove the resource group, use the following command as shown in Example 12-7:
smcli rmresgrp -c <cluster name> -C <RG_name>
Example 12-7 The smcli rmresgrp command using the -C option to confirm the removal operation
# smcli rmresgrp -c selma04_cluster Test_AltRG
Removing this resource group will cause all user-defined PowerHA information
to be DELETED.
5. In the Verify and Synchronize pane (Figure 12-44), select whether you want to
synchronize the entire configuration, synchronize only the unsynchronized changes, or
only verify the configuration. Then click OK.
6. Optional: Undo the changes to the configuration after synchronization.
a. To access this option, in the Cluster and Resource Group Management wizard, on the
Clusters tab, select the cluster for which you want to perform the synchronize and
verification function (Figure 12-43 on page 361).
b. As shown in Figure 12-45, select Recovery → Undo local changes of
configuration.
c. When you see the Undo Local Changes of the Configuration message (Figure 12-46),
click OK.
Snapshot for the undo changes option: The undo changes option creates a
snapshot before it discards the configuration changes made since the last synchronization.
Example 12-9 shows how to synchronize cluster changes and to log the output in its own
specific log file.
Example 12-9 smcli synccluster changes only with the log file option
# smcli synccluster -C -l /tmp/sync.log selma04_cluster
Undo changes
To restore the cluster configuration to its state as of the last synchronization, use
the smcli undochanges command. This operation restores the cluster configuration
from the active configuration database. Typically, this command has the effect of
discarding any unsynchronized changes.
The help option is available by using the smcli undochanges -h -v command as shown in
Example 12-10.
Example 12-10 The help option for the smcli undochanges command
# smcli undochanges -h -v
Command Alias: undo
-h|-?|--help
Requests help for this command.
-v|--verbose
Requests maximum details in the displayed information.
<CLUSTER> The label of a cluster to perform this operation on.
...
<output truncated >
Test_AhRG
myRG RG_test_NChg
_testinggg
RG_testing11
RG_testing9
RG01_selma03
RG_testing6
selma_04_cluster
RG_testing2
RG05_selma03_04
RG_TEST_4
RG06_selma03_04
Cluster subsystem services status:
You can view the status of PowerHA services, such as the clcomd subsystem, by using the
Status feature. To access this feature, select the cluster for which the service status is to
be viewed. Click the Action button and select Reports → Status.
You now see the cluster service status details, similar to the example in Figure 12-49.
Similarly you can view the configuration report for the resource group as shown in
Figure 12-52. On the Resource Groups tab, select the resource group for which you want
to view the configuration. Then click the Action button and select Reports.
Application monitoring
To locate the details of the application monitors that are configured and assigned to a
resource group, select the cluster. Click the Action button and select Reports →
Applications. Figure 12-53 shows the status of the application monitoring.
Similarly you can view the configuration report for networks and interfaces by selecting the
cluster, clicking the Action button, and selecting Reports → Networks and Interfaces.
Recovering from an event failure
After you issue a cluster recovery from an event failure, you see a message similar to the one
shown in Figure 12-57. Verify that you have addressed all problems that led to the error
before continuing with the operation.
PPRC and SPPRC file sets: The PPRC and SPPRC file sets are not required for
Global Mirror support on PowerHA.
The following additional file sets are included in SP3 (they must be installed separately and
require the acceptance of licenses during the installation):
– cluster.es.genxd
cluster.es.genxd.cmds 6.1.0.0 Generic XD support - Commands
cluster.es.genxd.rte 6.1.0.0 Generic XD support - Runtime
– cluster.msg.en_US.genxd
cluster.msg.en_US.genxd 6.1.0.0 Generic XD support - Messages
AIX supported levels:
– 5.3 TL9, RSCT 2.4.12.0, or later
– 6.1 TL2 SP1, RSCT 2.5.4.0, or later
The IBM DS8700 microcode bundle 75.1.145.0 or later
DS8000 CLI (DSCLI) 6.5.1.203 or later client interface (must be installed on each
PowerHA SystemMirror node):
– Java™ 1.4.1 or later
– APAR IZ74478, which removes the previous Java requirement
The path name for the DSCLI client in the PATH for the root user on each PowerHA
SystemMirror node (must be added)
13.1.3 Considerations
The PowerHA SystemMirror Enterprise Edition using DS8700 Global Mirror has the following
considerations:
The AIX Virtual SCSI is not supported in this initial release.
No auto-recovery is available from a PPRC path or link failure.
If the PPRC path or link between Global Mirror volumes breaks down, the PowerHA
Enterprise Edition is unaware of it. (PowerHA does not process Simple Network
Management Protocol (SNMP) for volumes that use DS8K Global Mirror technology for
mirroring). In this case, the user must identify and correct the PPRC path failure.
Depending on timing conditions, such an event can cause the corresponding Global
Mirror session to go to a “Fatal” state. If this situation occurs, the user must manually stop
and restart the corresponding Global Mirror Session (using the rmgmir and mkgmir DSCLI
commands) or an equivalent DS8700 interface.
Cluster Single Point Of Control (C-SPOC) cannot perform some Logical Volume
Manager (LVM) operations on nodes at the remote site that contain the target volumes.
Operations that require nodes at the target site to read from the target volumes result in an
error message in C-SPOC. Such operations include functions such as changing the file
system size, changing the mount point, and adding LVM mirrors. However, nodes on the
same site as the source volumes can successfully perform these tasks, and the changes
can be propagated later to the other site by using a lazy update.
Attention: For C-SPOC operations to work on all other LVM operations, you must
perform all C-SPOC operations with the DS8700 Global Mirror volume pairs in a
synchronized or consistent state. Alternatively, you must perform them in the active
cluster on all nodes.
The volume group names must be listed in the same order as the DS8700 mirror group
names in the resource group.
You can download the DS8000 DSCLI software from:
ftp://ftp.software.ibm.com/storage/ds8000/updates/DS8K_Customer_Download_Files/CLI
Install the DS8000 DSCLI software on each PowerHA SystemMirror node. By default, the
installation process installs the DSCLI in the /opt/ibm/dscli directory. Add the installation
directory of the DSCLI into the PATH environment variable for the root user.
For more details about the DS8000 DSCLI, see the IBM System Storage DS8000:
Command-Line Interface User’s Guide, SC26-7916.
For this test, the resources are limited. Each system has a single IP, an XD_ip network, and a
single Fibre Channel (FC) host adapter. Ideally, redundancy might exist throughout the
system, including in the local Ethernet networks, cross-site XD_ip networks, and FC
connectivity. This scenario has a single resource group, ds8kgmrg, which consists of a service
IP address (service_1), a volume group (txvg), and a DS8000 Global Mirror replicated
resource (texasmg). To configure the cluster, see 13.6, “Configuring the cluster” on page 385.
For each task, the DS8000 storage units are already added to the storage area network
(SAN) fabric and zoned appropriately. Also, the volumes are already provisioned to the nodes.
3. Check the code bundle level that corresponds to your LMC version on the “DS8700 Code
Bundle Information” web page at:
http://www.ibm.com/support/docview.wss?uid=ssg1S1003593
The code bundle level must be at version 75.1.145.0 or later. Also on the same page,
verify that your displayed DSCLI version corresponds to the installed code bundle level or
a later level.
Example 13-2 shows the extra parameters inserted into the DSCLI configuration file for the
storage unit in the primary site, /opt/ibm/dscli/profile/dscli.profile.hmc1. Adding these
parameters prevents you from having to type them each time they are required.
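As an illustration only (the values are hypothetical; the storage image ID matches the unit used in this chapter), such profile additions typically look like the following. Check the default dscli.profile shipped with your DSCLI version for the exact keywords it supports:

```text
# /opt/ibm/dscli/profile/dscli.profile.hmc1 -- illustrative additions only
hmc1:   9.3.207.123                   # HMC address of the primary-site unit (hypothetical)
devid:  IBM.2107-75DC890              # default storage image ID
username: powerha_admin               # DSCLI user (hypothetical)
pwfile: /opt/ibm/dscli/security.dat   # encrypted password file
```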
Table 13-1 shows the association between the source and target volumes of the replication
relationship and between their logical subsystems (LSS, the two most significant digits of a
volume identifier highlighted in bold in the table). Table 13-1 also indicates the mapping
between the volumes in the DS8000 units and their disk names on the attached AIX hosts.
You can easily obtain this mapping by using the lscfg -vl hdiskX | grep Serial command
as shown in Example 13-3. The hdisk serial number is a concatenation of the storage image
serial number and the ID of the volume at the storage level.
Example 13-3 The hdisk serial number in the lscfg command output
# lscfg -vl hdisk10 | grep Serial
Serial Number...............75DC8902E00
# lscfg -vl hdisk6 | grep Serial
Serial Number...............75DC8902600
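The split described above can be sketched as follows, using the serial numbers from Example 13-3:

```shell
# Sketch: split an hdisk serial number (from lscfg) into the storage image
# serial and the four-character volume ID, per the concatenation described
# above. Serial numbers are the ones from Example 13-3.
split_serial() {
    serial=$1
    vol=$(printf '%s' "$serial" | tail -c 4)    # last 4 characters: volume ID
    image=${serial%$vol}                        # remainder: storage image serial
    echo "$image $vol"
}
split_serial 75DC8902E00   # expected: 75DC890 2E00
split_serial 75DC8902600   # expected: 75DC890 2600
```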
4. In a similar manner, configure one PPRC path for each other involved LSS pair.
5. Because the PPRC paths are unidirectional, create a second path, in the opposite
direction, for each LSS pair. You use the same procedure, but work on the other storage
unit (see Example 13-6). We select different FC links for this direction.
Example 13-10 Defining the GM session for the source and target volumes
dscli> mksession -lss 2e 03
Date/Time: October 5, 2010 6:11:07 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00145I mksession: Session 03 opened successfully.
dscli> mksession -lss 26 03
Date/Time: October 5, 2010 6:11:25 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00145I mksession: Session 03 opened successfully.
Including all the source and target volumes in the Global Mirror session
Add the volumes in the Global Mirror sessions and verify their status by using the commands
shown in Example 13-11.
Example 13-11 Adding source and target volumes to the Global Mirror sessions
dscli> chsession -lss 26 -action add -volume 2600 03
Date/Time: October 5, 2010 6:15:17 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00147I chsession: Session 03 successfully modified.
dscli> chsession -lss 2e -action add -volume 2e00 03
Date/Time: October 5, 2010 6:15:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00147I chsession: Session 03 successfully modified.
dscli> lssession 26 2e
Date/Time: October 5, 2010 6:16:21 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
26 03 Normal 2600 Join Pending Primary Copy Pending Secondary Simplex True Disable
2E 03 Normal 2E00 Join Pending Primary Copy Pending Secondary Simplex True Disable
dscli>
dscli> chsession -lss 2c -action add -volume 2c00 03
Date/Time: October 6, 2010 5:41:12 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00147I chsession: Session 03 successfully modified.
dscli> chsession -lss 28 -action add -volume 2800 03
Date/Time: October 6, 2010 5:41:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
You must configure the volume groups and file systems on the cluster nodes. The application
might need the same major number for the volume group on all nodes. Perform this
configuration task anyway, because it might be useful later for additional configuration of the
Network File System (NFS).
For the nodes on the primary site, you can use the standard procedure. You define the
volume groups and file systems on one node and then import them to the other nodes. For
the nodes on the secondary site, you must first suspend the replication on the involved target
volumes.
root@robert: lvlstmajor
44..54,56...
root@jordan: # lvlstmajor
50...
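A candidate major number can be checked against such lvlstmajor output with a small helper. The parsing of the range syntax (44..54 meaning 44 through 54 free, 56... meaning 56 and up free) is an assumption based on the outputs above:

```shell
# Sketch: test whether a candidate major number is free according to an
# lvlstmajor output string. Range syntax interpretation is an assumption.
is_free() {
    major=$1; spec=$2
    oldIFS=$IFS; IFS=,
    for tok in $spec; do
        case $tok in
            *...) low=${tok%...}                   # open-ended range, e.g. 56...
                  [ "$major" -ge "$low" ] && { IFS=$oldIFS; return 0; } ;;
            *..*) low=${tok%%..*}; high=${tok##*..} # bounded range, e.g. 44..54
                  [ "$major" -ge "$low" ] && [ "$major" -le "$high" ] && \
                      { IFS=$oldIFS; return 0; } ;;
            *)    [ "$major" -eq "$tok" ] && { IFS=$oldIFS; return 0; } ;;
        esac
    done
    IFS=$oldIFS
    return 1
}

# Major 51 is free on both nodes in the outputs above
is_free 51 "44..54,56..." && is_free 51 "50..." && echo "51 is free on both"
```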
3. Import the volume group on the second node on the primary site, leeann, as shown in
Example 13-14:
a. Verify that the shared disks have the same PVID on both nodes.
b. Run the rmdev -dl command for each hdisk.
c. Run the cfgmgr program.
d. Run the importvg command.
Example 13-14 Importing the txvg volume group on the leeann node
root@leean: rmdev -dl hdisk6
hdisk6 deleted
root@leean: rmdev -dl hdisk10
hdisk10 deleted
root@leean: cfgmgr
root@leean:lspv | grep -e hdisk6 -e hdisk10
hdisk6 000a625afe2a4958 txvg
hdisk10 000a624a833e440f txvg
root@leean: importvg -V 51 -y txvg hdisk6
txvg
root@leean: lsvg -l txvg
txvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
txlv jfs2 250 250 2 open/syncd /txro
txloglv jfs2log 1 1 1 open/syncd N/A
Example 13-15 Pausing the Global Copy relationship on the primary site
dscli> lspprc -l 2600 2e00
Date/Time: October 6, 2010 3:40:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date
Suspended SourceLSS Timeout (secs) Critical Mode First Pass Status Incremental Resync Tgt Write GMIR CG
PPRC CG isTgtSE DisableAutoResync
===========================================================================================================
===========================================================================================================
2600:2C00 Copy Pending - Global Copy 0 Disabled Disabled Invalid -
26 60 Disabled True Disabled Disabled N/A Disabled
Unknown False
2E00:2800 Copy Pending - Global Copy 0 Disabled Disabled Invalid -
2E 60 Disabled True Disabled Disabled N/A Disabled
Unknown False
dscli> pausepprc 2600:2C00 2E00:2800
Date/Time: October 6, 2010 3:49:29 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00157I pausepprc: Remote Mirror and Copy volume pair 2600:2C00 relationship successfully paused.
CMUC00157I pausepprc: Remote Mirror and Copy volume pair 2E00:2800 relationship successfully paused.
dscli> lspprc -l 2600 2e00
Date/Time: October 6, 2010 3:49:41 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date
Suspended SourceLSS Timeout (secs) Critical Mode First Pass Status Incremental Resync Tgt Write GMIR CG
PPRC CG isTgtSE DisableAutoResync
===========================================================================================================
===========================================================================================================
2600:2C00 Suspended Host Source Global Copy 0 Disabled Disabled Invalid -
26 60 Disabled True Disabled Disabled N/A Disabled
Unknown False
2E00:2800 Suspended Host Source Global Copy 0 Disabled Disabled Invalid -
2E 60 Disabled True Disabled Disabled N/A Disabled
Unknown False
dscli>
4. To make the target volumes available to the attached hosts, use the failoverpprc
command on the secondary site as shown in Example 13-16.
Example 13-16 The failoverpprc command on the secondary site storage unit
dscli> failoverpprc -type gcp 2C00:2600 2800:2E00
Date/Time: October 6, 2010 3:55:19 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2C00:2600 successfully reversed.
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2800:2E00 successfully reversed.
dscli> lspprc 2C00:2600 2800:2E00
Date/Time: October 6, 2010 3:55:35 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
5. Refresh and check the PVIDs. Then import and vary off the volume group as shown in
Example 13-17.
Example 13-17 Importing the volume group txvg on the secondary site node, robert
root@robert: rmdev -dl hdisk2
hdisk2 deleted
root@robert: rmdev -dl hdisk6
hdisk6 deleted
root@robert: cfgmgr
root@robert: lspv |grep -e hdisk2 -e hdisk6
hdisk2 000a624a833e440f txvg
hdisk6 000a625afe2a4958 txvg
root@robert: importvg -V 50 -y txvg hdisk2
txvg
root@robert: lsvg -l txvg
txvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
txlv jfs2 250 250 2 closed/syncd /txro
txloglv jfs2log 1 1 1 closed/syncd N/A
root@robert: varyoffvg txvg
Adding a cluster
To add a cluster, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select Extended Configuration → Extended Topology Configuration →
Configure an HACMP Cluster → Add/Change/Show an HACMP Cluster.
3. Enter the cluster name, which is Txrmnia in this scenario, as shown in Figure 13-3. Press
Enter.
[Entry Fields]
* Cluster Name [Txrmnia]
Figure 13-3 Adding a cluster in the SMIT menu
[Entry Fields]
* Node Name [jordan]
Communication Path to Node [] +
Figure 13-4 Add a Node SMIT menu
4. In this scenario, repeat these steps two more times to add the additional nodes of leeann
and robert.
Adding sites
To add the sites, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Topology
Configuration → Configure HACMP Sites → Add a Site.
3. Enter the desired site name, which in this scenario is the Texas site with the nodes jordan
and leeann, as shown in Figure 13-5. Press Enter.
The output is displayed in the SMIT Command Status window.
Add a Site
[Entry Fields]
* Site Name [Texas] +
* Site Nodes jordan leeann +
Figure 13-5 Add a Site SMIT menu
4. In this scenario, repeat these steps to add the Romania site with the robert node.
Adding networks
To add the networks, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Topology
Configuration → Configure HACMP Networks → Add a Network to the HACMP
Cluster.
3. Choose the desired network type, which in this scenario is XD_ip.
4. Keep the default network name and press Enter (Figure 13-6).
[Entry Fields]
* Network Name [net_XD_ip_01]
* Network Type XD_ip
* Netmask(IPv4)/Prefix Length(IPv6) [255.255.255.0]
* Enable IP Address Takeover via IP Aliases [Yes] +
IP Address Offset for Heartbeating over IP Aliases []
Figure 13-6 Add an IP-Based Network SMIT menu
5. Repeat these steps but select a network type of diskhb for the disk heartbeat network and
keep the default network name of net_diskhb_01.
[Entry Fields]
* IP Label/Address [jordan_base] +
* Network Type XD_ip
* Network Name net_XD_ip_01
* Node Name [jordan] +
Figure 13-7 Add communication interface SMIT menu
5. Repeat these steps and select Communication Devices to complete the disk heartbeat
network.
The topology is now configured. You can also see all the interfaces and devices in the
cllsif command output shown in Figure 13-8.
[Entry Fields]
* IP Label/Address serviceip_2 +
Netmask(IPv4)/Prefix Length(IPv6) []
* Network Name net_XD_ip_01
Alternate HW Address to accompany IP Label/Address []
Associated Site ignore
Figure 13-9 Add a Service IP Label SMIT menu
In most real multi-site scenarios, where each site is on a different network segment, it is
common to create at least two service IP labels, one for each site, by using the Associated
Site option, which indicates that site-specific service IP labels are desired. With this option,
you can have a unique service IP label at each site. However, we do not use site-specific
labels in this test because both sites are on the same network segment.
Because these options are all new, we define each one before configuring them:
Storage agent A generic name given by PowerHA SystemMirror to an entity such as
the IBM DS8000 HMC. Storage agents typically provide a single point
of coordination and often use TCP/IP as their transport for
communication. You must provide the IP address and authentication
information that are used to communicate with the HMC.
Storage system A generic name given by PowerHA SystemMirror for an entity such as
a DS8700 Storage Unit. When using Global Mirror, you must associate
one storage agent with each storage system. You must provide the
IBM DS8700 system identifier for the storage system. For example,
IBM.2107-75ABTV1 is a storage identifier for a DS8000 Storage
System.
Mirror group A generic name given by PowerHA SystemMirror for a logical
collection of volumes that must be mirrored to another storage system
that resides on a remote site. A Global Mirror session represents a
mirror group.
[Entry Fields]
* Storage Agent Name [ds8khmc]
* IP Addresses [9.3.207.122]
* User ID [redbook]
* Password [r3dbook]
Figure 13-10 Add a Storage Agent SMIT menu
It is possible to have multiple storage agents. However, this test scenario has only one
storage agent that manages both storage units.
Important: The user ID and password are stored as plain text in the
HACMPxd_storage_agent.odm file.
[Entry Fields]
* Storage System Name [texasds8k]
* Storage Agent Name(s) ds8kmainhmc +
* Site Association Texas +
* Vendor Specific Identification [IBM.2107-75DC890] +
* WWNN [5005076308FFC004] +
Figure 13-11 Add a Storage System SMIT menu
[Entry Fields]
* Mirror Group Name [texasmg]
* Storage System Name texasds8k romaniads8k +
* Vendor Specific Identifier [03] +
* Recovery Action automatic +
Maximum Coordination Time [50]
Maximum Drain Time [30]
Consistency Group Interval Time [0]
Figure 13-12 Add a Mirror Group SMIT menu
Vendor Specific Identifier field: For the Vendor Specific Identifier field, provide only the
Global Mirror session number.
[Entry Fields]
* Resource Group Name [ds8kgmrg]
In this scenario, we only added a service IP label, the volume group, and the DS8000 Global
Mirror Replicated Resources as shown in the streamlined clshowres command output in
Example 13-21.
Volume group: The volume group names must be listed in the same order as the DS8700
mirror group names in the resource group.
DS8000 Global Mirror Replicated Resources field: In the SMIT menu for adding
resources to the resource group, notice that the appropriate field is named DS8000 Global
Mirror Replicated Resources. However, when viewing the menu by using the clshowres
command (Example 13-21 on page 392), the field is called GENXD Replicated Resources.
You can now synchronize the cluster, start the cluster, and begin testing it.
In these scenarios, redundancy tests cannot be performed on the IP networks because only
a single network is configured. Instead, configure redundant IP or non-IP communication
paths to avoid isolation of the sites. The loss of all communication paths between sites
leads to a partitioned state of the cluster, and to data divergence between sites if the
replication links are also unavailable.
Another specific failure scenario is the loss of replication paths between the storage
subsystems while the cluster is running on both sites. To avoid this type of loss, configure a
redundant PPRC path or links for the replication. You must manually recover the status of the
pairs after the storage links are operational again.
Important: If the PPRC path or link between Global Mirror volumes breaks down, the
PowerHA Enterprise Edition is unaware. The reason is that PowerHA does not process
SNMP for volumes that use DS8700 Global Mirror technology for mirroring. In such a case,
you must identify and correct the PPRC path failure. Depending upon some timing
conditions, such an event can result in the corresponding Global Mirror session going into
a fatal state. In this situation, you must manually stop and restart the corresponding Global
Mirror session (by using the rmgmir and mkgmir DSCLI commands) or an equivalent
DS8700 interface.
Each test, other than the re-integration test, begins in the same initial state of the primary site
hosting the ds8kgmrg resource group on the primary node as shown in Example 13-22 on
page 394. Before each test, we start copying data from another file system to the replicated
file systems. After each test, we verify that the service IP address is online and that the new
data is present, and we show the Global Mirror states. Example 13-23 shows the normal
running production status of the Global Mirror pairs from each site.
dscli> lssession 26 2E
Date/Time: October 10, 2010 4:00:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
==============
26 03 CG In Progress 2600 Active Primary Copy Pending Secondary Simplex True
Disable
2E 03 CG In Progress 2E00 Active Primary Copy Pending Secondary Simplex True
Disable
dscli> lssession 28 2c
Date/Time: October 10, 2010 3:54:58 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
======
28 03 Normal 2800 Join Pending Primary Simplex Secondary Copy Pending True Disable
2C 03 Normal 2C00 Join Pending Primary Simplex Secondary Copy Pending True Disable
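Rather than reading the lssession output by eye, you can script the check that every session line at the active site shows CG In Progress. This parser is only a sketch based on the column layout above; cg_active is a hypothetical helper, not a DSCLI or PowerHA tool, and the layout can vary between DSCLI versions:

```shell
#!/bin/sh
# cg_active: read "lssession" output on stdin; succeed only when every
# session data line (a line that starts with the LSS id and session
# number) reports the "CG In Progress" status shown above.
cg_active() {
  awk '
    /^[0-9A-Fa-f]+ +[0-9]+ / {          # session data lines only
      lines++
      if ($0 !~ /CG In Progress/) bad++
    }
    END { exit ((lines == 0 || bad > 0) ? 1 : 0) }
  '
}

# usage sketch, run against saved output from the active site, e.g.:
#   cg_active < lssession_26_2E.out && echo "session consistent"
```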
In a true maintenance scenario, you would most likely perform a graceful site failover by
stopping the cluster on the local standby node first. Then you stop the cluster on the
production node by using the Move Resource Group option.
Moving the resource group to another site: In this scenario, because we only have one
node at the Romania site, we use the option to move the resource group to another site. If
multiple remote nodes are members of the resource group, use the option to move the
resource group to another node instead.
To perform the resource group move by using SMIT, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path System Management (C-SPOC) → Resource Groups and
Applications → Move a Resource Group to Another Node / Site → Move Resource
Groups to Another Site.
4. Select the Romania site from the next menu as shown in Figure 13-15.
+--------------------------------------------------------------------------+
| Select a Destination Site |
| |
| Move cursor to desired item and press Enter. |
| |
| # *Denotes Originally Configured Primary Site |
| Romania |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
| /=Find n=Find Next |
+--------------------------------------------------------------------------+
Figure 13-15 Selecting a site for a resource group move
Attention: During our testing, we encountered a problem. After performing the first
resource group move between sites, we were unable to move it back because the pick
list for the destination site was empty. We could move it back by node. Later in our
testing, the by-site option started working. However, it moved the resource group to the
standby node at the primary site instead of the original primary node. If you encounter
similar problems, contact IBM support.
Example 13-24 Resource group status after the site move to Romania
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
ds8kgmrg ONLINE SECONDARY jordan@Texas
OFFLINE leeann@Texas
ONLINE robert@Romania
6. Repeat the resource group move to move it back to its original primary site, Texas, and
node, jordan, to return to the original starting state. However, instead of using the option
to move it to another site, use the option to move it to another node.
Example 13-25 shows that the Global Mirror statuses are now swapped, and the local site is
showing the LUNs now as the target volumes.
Example 13-25 Global Mirror status after the resource group move
*******************From node jordan at site Texas***************************
dscli> lssession 26 2E
Date/Time: October 10, 2010 4:04:44 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
======
26 03 Normal 2600 Active Primary Simplex Secondary Copy Pending True Disable
2E 03 Normal 2E00 Active Primary Simplex Secondary Copy Pending True Disable
dscli> lssession 28 2C
Date/Time: October 10, 2010 3:59:25 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
==============
28 03 CG In Progress 2800 Active Primary Copy Pending Secondary Simplex True
Disable
2C 03 CG In Progress 2C00 Active Primary Copy Pending Secondary Simplex True
Disable
Begin with all three nodes active in the cluster and the resource group online on the primary
node as shown in Example 13-22 on page 394.
On the node jordan, we run the reboot -q command. The node leeann acquires the
ds8kgmrg resource group as shown in Example 13-26.
Example 13-27 shows that the statuses are the same as when we started.
dscli> lssession 26 2E
Date/Time: October 10, 2010 4:10:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
==============
26 03 CG In Progress 2600 Active Primary Copy Pending Secondary Simplex True
Disable
2E 03 CG In Progress 2E00 Active Primary Copy Pending Secondary Simplex True
Disable
dscli> lssession 28 2c
Date/Time: October 10, 2010 4:04:58 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
28 03 Normal 2800 Join Pending Primary Simplex Secondary Copy Pending True Disable
2C 03 Normal 2C00 Join Pending Primary Simplex Secondary Copy Pending True Disable
After the cluster stabilizes, we run the reboot -q command on the leeann node, invoking a
site_down event. The robert node at the Romania site acquires the ds8kgmrg resource group
as shown in Example 13-28.
You can also see that the replicated pairs are now in the suspended state at the remote site as
shown in Example 13-29.
Tip: Follow these steps as one approach; you can accomplish the same results by using
various methods:
1. Verify that the Global Mirror statuses at the primary site are suspended.
2. Fail back PPRC from the secondary site.
3. Verify that the Global Mirror status at the primary site shows the target status.
4. Verify that out-of-sync tracks are 0.
5. Stop the cluster to ensure that the volume group I/O is stopped.
6. Fail over the PPRC on the primary site.
7. Fail back the PPRC on the primary site.
8. Start the cluster.
Example 13-30 Suspended pair status in Global Mirror on the primary site after node restart
*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:27:48 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
====================================================================================================
2600:2C00 Suspended Host Source Global Copy 26 60 Disabled True
2E00:2800 Suspended Host Source Global Copy 2E 60 Disabled True
2. On the remote node robert, fail back the PPRC pairs as shown in Example 13-31.
Example 13-32 Verifying that the primary site LUNs are now target LUNs
*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:44:21 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First
Pass Status
================================================================================================
=========
2800:2E00 Target Copy Pending - Global Copy 28 unknown Disabled Invalid
2C00:2600 Target Copy Pending - Global Copy 2C unknown Disabled Invalid
4. Monitor the status of replication at the remote site by watching the Out of Sync Tracks
field in the lspprc -l command output. After the count reaches 0, as shown in
Example 13-33, the pairs are in sync. Then you can stop the remote site in preparation
for moving production back to the primary site.
Example 13-33 Verifying that the Global Mirror pairs are back in sync
dscli> lspprc -l 2800 2c00
Date/Time: October 10, 2010 4:22:46 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date
Suspended SourceLSS
===========================================================================================================
============
2800:2E00 Copy Pending - Global Copy 0 Disabled Disabled Invalid -
28
2C00:2600 Copy Pending - Global Copy 0 Disabled Disabled Invalid -
2C 6
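The wait for zero out-of-sync tracks can likewise be scripted. This sketch parses the lspprc -l layout shown in Example 13-33; in_sync is a hypothetical helper, and the field position of the Out Of Sync Tracks column can differ between DSCLI versions, so verify it before relying on this:

```shell
#!/bin/sh
# in_sync: read "lspprc -l" output on stdin; succeed only when every
# Global Copy pair in the Copy Pending state reports 0 out-of-sync
# tracks. Field 7 matches the "lspprc -l" layout in Example 13-33.
in_sync() {
  awk '
    /Copy Pending/ {
      pairs++
      if ($7 + 0 != 0) bad++          # $7 = Out Of Sync Tracks
    }
    END { exit ((pairs == 0 || bad > 0) ? 1 : 0) }
  '
}

# usage sketch, run against saved output from the remote site, e.g.:
#   in_sync < lspprc_2800_2c00.out && echo "safe to stop the remote site"
```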
Example 13-36 Failing back the PPRC pairs on the primary site
*******************From node jordan at site Texas***************************
dscli> failbackpprc -type gcp 2600:2c00 2E00:2800
Date/Time: October 10, 2010 4:46:49 PM CDT IBM DSCLI Version: 6.5.15.19 DS:
IBM.2107-75DC890
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2600:2C00 successfully failed back.
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2E00:2800 successfully failed back.
Verify the status of the pairs at each site as shown in Example 13-37.
Example 13-37 Global Mirror pairs failed back to the primary site
*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:47:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
==================================================================================================
2600:2C00 Copy Pending - Global Copy 26 60 Disabled True
2E00:2800 Copy Pending - Global Copy 2E 60 Disabled True
[Entry Fields]
* Start now, on system restart or both now +
Start Cluster Services on these nodes [jordan,leeann,robert] +
* Manage Resource Groups Automatically +
BROADCAST message at startup? true +
Startup Cluster Information Daemon? true +
Ignore verification errors? false +
Automatically correct errors found during Interactively +
cluster start?
Figure 13-16 Restarting a cluster after a site failure
Upon startup of the primary node jordan, the resource group is automatically started on
jordan, returning to the original starting point as shown in Example 13-38.
2. Verify the pair and session status on each site as shown in Example 13-39.
Dynamically expanding a volume: This topic does not provide information about
dynamically expanding a volume because this option is not supported.
Important: C-SPOC cannot perform certain LVM operations on nodes at the remote site
(the nodes that contain the target volumes). Such operations include those that require
nodes at the target site to read from the target volumes, and they cause an error
message in C-SPOC. They include functions such as changing the file system size,
changing the mount point, and adding LVM mirrors. However, nodes on the same site
as the source volumes can successfully perform these tasks. The changes can be
propagated later to the other site by using a lazy update.
For C-SPOC to work on all other LVM operations, perform all C-SPOC operations with
the Global Mirror volume pairs in a synchronized or consistent state and with the
cluster active on all nodes.
+--------------------------------------------------------------------------+
| Physical Volume Names |
| |
| Move cursor to desired item and press Enter. |
| |
| 000a624a987825c8 ( hdisk10 on node robert ) |
| 000a624a987825c8 ( hdisk11 on nodes jordan,leeann ) |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 13-17 Disk selection to add to the volume group
e. Verify the menu information, as shown in Figure 13-18, and press Enter.
[Entry Fields]
VOLUME GROUP name txvg
Resource Group Name ds8kgmrg
Node List jordan,leeann,robert
Reference node robert
VOLUME names hdisk10
Figure 13-18 Add a Volume C-SPOC SMIT menu
Upon completion of the C-SPOC operation, the local nodes have been updated, but the
remote node has not, as shown in Example 13-40. The remote node was not updated
because the target volumes are not readable until the relationship is swapped. You receive an
error message from C-SPOC, as shown in the note after Example 13-40. However, the lazy
update procedure at the time of failover pulls in the remaining volume group information.
root@robert: lspv
hdisk2 000a624a833e440f txvg
hdisk6 000a625afe2a4958 txvg
hdisk10 000a624a987825c8 none
Attention: When using C-SPOC to modify a volume group containing a Global Mirror
replicated resource, you can expect to see the following error message:
cl_extendvg: Error executing clupdatevg txvg 000a624a833e440f on node robert
You do not need to synchronize the cluster because all of these changes are made to an
existing volume group. However, consider running a verification.
Logical Volumes
6. Upon completion of the C-SPOC operation, verify that the new logical volume is created
locally on node jordan as shown in Example 13-41.
Similar to when you create the volume group, you see an error message (Figure 13-21) about
being unable to update the remote node.
COMMAND STATUS
jordan: pattilv
cl_mklv: Error executing clupdatevg txvg 000a625afe2a4958 on node robert
Figure 13-21 C-SPOC normal error upon logical volume creation
5. Upon completion of the C-SPOC operation, verify that the new file system size locally on
node jordan has increased from 250 LPs, as shown in Example 13-41 on page 409, to
313 LPs, as shown in Example 13-42.
A cluster synchronization is not required, because technically the resources have not
changed. All of the changes were made to an existing volume group that is already a resource
in the resource group.
In this scenario, we re-use the LUNs from the previous section. We removed them from the
volume group and removed the disks for all nodes except the main primary node jordan. In
our process, we cleared the PVID and then assigned a new PVID for a clean start.
Table 13-3 provides a summary of the LUNs that we implemented in each site.
Now continue with the following steps, which are the same as those steps for defining new
LUNs:
1. Run the cfgmgr command on the primary node jordan.
2. Assign the PVID on the node jordan:
chdev -l hdisk11 -a pv=yes
3. Configure the disk and PVID on the local node leeann by using the cfgmgr command.
4. Verify that PVID shows up by using the lspv command.
5. Pause the PPRC on the primary site.
6. Fail over the PPRC to the secondary site.
7. Fail back the PPRC to the secondary site.
8. Configure the disk and PVID on the remote node robert by using the cfgmgr command.
9. Verify that PVID shows up by using the lspv command.
10.Pause the PPRC on the secondary site.
11.Fail over the PPRC to the primary site.
12.Fail back the PPRC to the primary site.
The main difference between adding a new volume group and extending an existing one is
that, when adding a new volume group, you must swap the pairs twice. When extending an
existing volume group, you can get away with only swapping once.
The procedure is similar to the original setup: we created all the LVM components on the
primary site, swapped the PPRC pairs to the remote site to import the volume group, and
then swapped them back.
You can avoid performing two swaps, as we showed, by not choosing to include the third node
when creating the volume group. Then you can swap the pairs, run cfgmgr on the new disk
with the PVID, import the volume group, and swap the pairs back.
Volume Groups
+--------------------------------------------------------------------------+
| Node Names |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| > jordan |
| > leeann |
| > robert |
| |
| |
| F1=Help F2=Refresh F3=Cancel |
| F7=Select F8=Image F10=Exit |
F1| Enter=Do /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 13-23 Adding a volume group node pick list
Volume Groups
6. Select the proper resource group. We select ds8kgmrg as shown in Figure 13-26.
You can also use the C-SPOC CLI commands (Example 13-43). These commands are in the
/usr/es/sbin/cluster/cspoc directory, and all begin with the cli_ prefix. Similar to the SMIT
menus, their operation output is also saved in the cspoc.log file.
Upon completion of the C-SPOC operation, the local nodes are updated, but the remote node
is not, as shown in Example 13-44. The remote node is not updated because the target
volumes are not readable until the relationship is swapped. You see an error message from
C-SPOC as shown in the note following Example 13-44. After you create all LVM structures,
you swap the pairs back to the remote node and import the new volume group and logical
volume.
Attention: When using C-SPOC to add a new volume group that contains a Global Mirror
replicated resource, you might see the following error message:
cl_importvg: Error executing climportvg -V 51 -c -y princessvg -Q
000a624a9bb74ac3 on node robert
Although this message is normal, you can avoid it by omitting the remote nodes from the
selection. Omitting them is acceptable because you manually import the volume group on
the remote nodes anyway.
When you create the volume group, it is usually added to the resource group automatically as
shown in Example 13-45 on page 416. However, with the error message indicated in the
previous attention box, it might not be added automatically. Therefore, double-check that the
volume group is in the resource group before continuing. No further changes to the resource
group are needed. The new LUN pairs are added to the same storage subsystems and the
same session (3) that is already defined in the mirror group texasmg.
Example 13-46 New logical volume on the newly added volume group
root@jordan: lsvg -l princessvg
princessvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
princesslv raw 38 38 1 closed/syncd N/A
[Entry Fields]
* Verify changes only? [No] +
* Logging [Standard] +
Upon completion, the cluster configuration is synchronized and can now be tested.
Ideally the connectivity is through redundant links, switches, and fabrics to the hosts and
between the storage units themselves.
14.1.3 Considerations
Keep in mind the following considerations for mirroring PowerHA SystemMirror Enterprise
Edition with TrueCopy/HUR:
AIX Virtual SCSI is not supported in this initial release.
Logical Unit Size Expansion (LUSE) for Hitachi is not supported.
Only fence-level NEVER is supported for synchronous mirroring.
Only HUR is supported for asynchronous mirroring.
The dev_name must map to a logical device, and the dev_group must be defined in the
HORCM_LDEV section of the horcm.conf file.
The PowerHA SystemMirror Enterprise Edition TrueCopy/HUR solution uses dev_group
for any basic operation, such as the pairresync, pairevtwait, or horctakeover operation.
If several dev_names are in a dev_group, the dev_group must be enabled for consistency.
PowerHA SystemMirror Enterprise Edition does not trap Simple Network Management
Protocol (SNMP) notification events for TrueCopy/HUR storage. If a TrueCopy link goes
down when the cluster is up and later the link is repaired, you must manually
resynchronize the pairs.
The creation of pairs is done outside the cluster control. You must create the pairs before
you start the cluster services.
Resource groups that are managed by PowerHA SystemMirror Enterprise Edition cannot
contain volume groups with both TrueCopy/HUR-protected and
non-TrueCopy/HUR-protected disks.
All nodes in the PowerHA SystemMirror Enterprise Edition cluster must use the same horcm
instance.
Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 421
You cannot use Cluster Single Point Of Control (C-SPOC) for the following Logical Volume
Manager (LVM) operations to configure nodes at the remote site that contain the target
volume:
– Creating a volume group
– Operations that require nodes at the target site to write to the target volumes
For example, changing the file system size, changing the mount point, or adding LVM
mirrors cause an error message in C-SPOC. However, nodes on the same site as the
source volumes can successfully perform these tasks. The changes are then
propagated to the other site by using a lazy update.
C-SPOC on other LVM operations: For C-SPOC operations to work on all other LVM
operations, perform all C-SPOC operations when the cluster is active on all PowerHA
SystemMirror Enterprise Edition nodes and the underlying TrueCopy/HUR PAIRs are in
a PAIR state.
Important: You must install the Hitachi CCI software into the /HORCM/usr/bin directory.
Otherwise, you must create a symbolic link to this directory.
6. Verify installation of the proper version by using the raidqry command:
# raidqry -h
Model: RAID-Manager/AIX
Ver&Rev: 01-23-03/06
Usage: raidqry [options] for HORC
Important: Do not edit the configuration definition file while HORCM is running. Shut down
HORCM, edit the configuration file as needed, and then restart HORCM.
You might have multiple CCI instances, each of which uses its own specific horcm#.conf file.
For example, instance 0 might be horcm0.conf, instance 1 (Example 14-1) might be
horcm1.conf, and so on. The test scenario presented later in this chapter uses instance 2 and
provides examples of the horcm2.conf file on each cluster node.
HORCM_CMD
#dev_name => hdisk of Command Device
#UnitID 0 (Serial# eg. 45306)
/dev/hdisk19
HORCM_DEV
#Map dev_grp to LDEV#
#dev_group dev_name port# TargetID LU# MU#
VG01 test01 CL1-B 1 5 0
VG01 work01 CL1-B 1 24 0
VG01 work02 CL1-B 1 25 0
HORCM_INST
#dev_group ip_address service
VG01 10.15.11.195 horcm1
HORCM_CMD
#dev_name => hdisk of Command Device
#UnitID 0 (Serial# eg. 45306)
/dev/hdisk19
HORCM_DEV
#Map dev_grp to LDEV#
#dev_group dev_name port# TargetID LU# MU#
VG01 test01 CL1-B 1 5 0
VG01 work01 CL1-B 1 21 0
VG01 work02 CL1-B 1 22 0
HORCM_INST
#dev_group ip_address service
VG01 10.15.11.194 horcm1
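With the horcm#.conf files in place on each node, starting the matching CCI instance can be sketched as follows. Instance 2 and the /HORCM install path follow the conventions described above; the start script runs only where the CCI is installed, so the sketch falls back to printing the command.

```shell
#!/bin/ksh
# Sketch: start HORCM instance 2, which reads /etc/horcm2.conf, and point
# subsequent CCI commands at it through the HORCMINST environment variable.
INST=2
export HORCMINST=$INST   # alternative to passing -IH$INST on every CCI command

if [ -x /HORCM/usr/bin/horcmstart.sh ]; then
    /HORCM/usr/bin/horcmstart.sh $INST
else
    echo "would run: /HORCM/usr/bin/horcmstart.sh $INST"
fi
```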
NOTE 1: So that the horcm instance can use any available command device if one of
them fails, we recommend that, in your horcm file, the command device in the
HORCM_CMD section be presented in the following format, where 10133 is the
serial number of the array:
\\.\CMD-10133:/dev/hdisk/
NOTE 2: If the ShadowImage license has not been activated on the storage system
and the MU# column is not empty, the Device_File column shows "-----" in the
"pairdisplay -fd" output, which also causes verification to fail.
Therefore, leave the MU# column blank if the ShadowImage license is NOT
activated on the storage system.
Each site consists of two Ethernet networks. In this case, both networks are used for a
public Ethernet and for cross-site networks. Usually the cross-site network is on separate
segments and is an XD_ip network. It is also common to use site-specific service IP labels.
Example 14-2 shows the interface list from the cluster topology.
The krod and bina nodes at the Miami site have two disks, hdisk38 and hdisk39. These disks
are the secondary target volumes for the TrueCopy synchronous replication of the truesyncvg
volume group from the Austin site. The other two disks, hdisk40 and hdisk41, are to be used
as the primary source volumes for the ursasyncvg volume group that uses HUR for
asynchronous replication.
For each of these tasks, the Hitachi storage units have been added to the SAN fabric and
zoned appropriately. Also, the host groups have been created for the appropriate node
adapters, and the LUNs have been created within the storage unit.
To begin, the Hitachi USP-V storage unit is at the Austin site. The host group, JessBina, is
assigned to port CL1-E on the Hitachi storage unit with the serial number 45306. Usually the
host group is assigned to multiple ports for full multipath redundancy.
Figure 14-2 Assigning LUNs to the Austin site nodes
2. In the path verification window (Figure 14-3), check the information and record the LUN
number and LDEV numbers. You use this information later. However, you can also retrieve
this information from the AIX system after the devices are configured by the host. Click
OK.
You have completed assigning four more LUNs for the nodes at the Austin site. The lab
environment already had several LUNs, including both command and journaling LUNs, on
the cluster nodes; the four new LUNs were added solely for this test scenario.
Important: If these LUNs are the first ones to be allocated to the hosts, you must also
assign the command LUNs. See the appropriate Hitachi documentation as needed.
For the storage unit at the Miami site, repeat the steps that you performed for the Austin site.
The host group, KrodMaddi, is assigned to port CL1-B on the Hitachi USP-VM storage unit
with the serial number 35764. Usually the host group is assigned to multiple ports for full
multipath redundancy. Figure 14-5 on page 432 shows the result of these steps.
Again record both the LUN numbers and LDEV numbers so that you can easily refer to them
as needed when creating the replicated pairs. The numbers are also required when you add
the LUNs into device groups in the appropriate horcm.conf file.
Figure 14-5 Miami site LUNs assigned
You must know exactly which LUNs from each storage unit will be paired together. They must
be the same size. In this case, all of the LUNs that are used are 2 GB in size. The pairing of
LUNs also uses the LDEV numbers. The LDEV numbers are hexadecimal values that also
show up as decimal values on the AIX host.
Although the pairing can be done by using the CCI, the example in this section shows how to
create the replicated pairs through the Hitachi Storage Navigator. The appropriate commands
are in the /HORCM/usr/bin directory. In this scenario, none of the devices have been
configured to the AIX cluster nodes.
2. In the TrueCopy Pair Operation window (Figure 14-7), select the appropriate port, CL1-E,
and find the specific LUNs to use (00-00A and 00-00B).
In this scenario, we have predetermined that we want to pair these LUNs with 00-01C and
00-01D from the Miami Hitachi storage unit on port CL1-B. Notice the occurrence of
SMPL in the Status column next to the LUNs. SMPL indicates simplex, meaning that no
mirroring is being used with that LUN.
3. Right-click the first Austin LUN (00-00A), and select Paircreate → Synchronize
(Figure 14-7).
Courtesy of Hitachi Data Systems
6. After you complete the pairing selections, on the Pair Operation tab, verify that the
information is correct and click Apply to apply them all at one time.
Figure 14-9 shows both of the source LUNs in the middle of the pane. It also shows an
overview of which remote LUNs they are to be paired with.
After the copy has completed, the status is displayed as PAIR as shown in Figure 14-11. You
can also view this status from the management interface of either one of the storage units.
2. In the Universal Replicator Pair Operation window (Figure 14-13), select the appropriate
port, CL1-B, and find the specific LUNs that you want to use (00-01E and 00-01F in this
example). We have already predetermined that we want to pair these LUNs with
00-00C and 00-00D from the Austin Hitachi storage unit on port CL1-E.
Right-click one of the desired LUNs and select Paircreate.
Important: If these are the first Universal Replicator LUNs to be allocated, you must
also assign journaling groups and LUNs for both storage units. Refer to the
appropriate Hitachi Universal Replicator documentation as needed.
We chose ones that were already created in the environment.
d. Click Set.
e. Repeat these steps for the second LUN pairing.
Figure 14-14 shows details of the two pairings.
4. After you complete the pairing selections, on the Pair Operation tab, verify that the
information is correct and click Apply to apply them all at one time.
When the pairing is established, the copy automatically begins to synchronize with the
remote LUNs at the Austin site. The status changes to COPY, as shown in Figure 14-15,
until the pairs are in sync. After the pairs are synchronized, their status changes to PAIR.
In the test environment, we already have hdisk0-37 on each of the four cluster nodes. After
running the cfgmgr command on each node, one at a time, we now have four additional
disks, hdisk38-hdisk41, as shown in Example 14-3.
Although the LUN and LDEV numbers were written down during the initial LUN assignments,
you must identify the correct LDEV numbers of the Hitachi disks and the corresponding AIX
hdisks by performing the following steps:
1. On the PowerHA SystemMirror Enterprise Edition nodes, identify the Hitachi disks that
will be used in the TrueCopy/HUR relationships by running the inqraid
command. Example 14-4 shows hdisk38-hdisk41, which are the Hitachi disks that we just
added.
2. Edit the HORCM LDEV section in the horcm#.conf file to identify the dev_group that will be
managed by PowerHA SystemMirror Enterprise Edition. In this example, we use the
horcm2.conf file.
Hdisk38 (ldev 272) and hdisk39 (ldev 273) are the pair for the synchronous replicated
resource group, which is primary at the Austin site. Hdisk40 (ldev 275) and hdisk41
(ldev276) are the pair for an asynchronous replicated resource, which is primary at the
Miami site.
Specify the device groups (dev_group) in the horcm#.conf file. We are using dev_group
htcdg01 with dev_names htcd01 and htcd02 for the synchronous replicated pairs. For the
asynchronous pairs, we are using dev_group hurdg01 and dev_names hurd01 and hurd02.
The device group names are needed later when checking the status of the replicated
pairs and when defining the replicated pairs as a resource for PowerHA Enterprise Edition
to control.
Important: Do not edit the configuration definition file while HORCM is running. Shut
down HORCM, edit the configuration file as needed, and then restart HORCM.
Example 14-5 shows the horcm2.conf file from the jessica node, at the Austin site.
Because two nodes are at the Austin site, the same updates were performed to the
/etc/horcm2.conf file on the bina node. Notice that you can use either the decimal value
of the LDEV or the hexadecimal value.
We specifically defined one pair each way to demonstrate that both formats work.
Although several groups were already defined, only those that are relevant to this scenario
are shown.
Example 14-5 Horcm2.conf file used for the Austin site nodes
root@jessica:
/etc/horcm2.conf
HORCM_MON
#Address of local node...
#ip_address service poll(10ms) timeout(10ms)
HORCM_CMD
#hdisk of Command Device...
#dev_name dev_name dev_name
#UnitID 0 (Serial# 45306)
#/dev/rhdisk10
\\.\CMD-45306:/dev/rhdisk10 /dev/rhdisk14
HORCM_LDEV
#Map dev_grp to LDEV#...
#dev_group dev_name Serial# CU:LDEV MU# siteA siteB
# (LDEV#) hdisk -> hdisk
#--------- --------- ------- -------- --- --------------------
htcdg01 htcd01 45306 272
htcdg01 htcd02 45306 273
hurdg01 hurd01 45306 01:12
hurdg01 hurd02 45306 01:13
For the krod and maddi nodes at the Miami site, the dev_groups, dev_names, and the
LDEV numbers are the same. The differences are the specific serial number of the storage
unit at that site and the remote system name or IP address, which points to the
appropriate system at the Austin site.
Example 14-6 shows the horcm2.conf file that we used for both nodes in the Miami site.
Notice that, for the ip_address fields, fully qualified host names are used instead of the IP
addresses. As long as these names are resolvable, this format is valid. The format that
uses the actual IP addresses is shown in Example 14-1 on page 425.
Example 14-6 The horcm2.conf file used for the nodes in the Miami site
root@krod:
horcm2.conf
HORCM_MON
#Address of local node...
#ip_address service poll(10ms) timeout(10ms)
r9r3m13.austin.ibm.com 52323 1000 3000
HORCM_CMD
#hdisk of Command Device...
#dev_name dev_name dev_name
#UnitID 0 (Serial# 35764)
#/dev/rhdisk10
# /dev/hdisk19
\\.\CMD-45306:/dev/rhdisk11 /dev/rhdisk19
hurdg01 hurd02 35764 01:0F
# Address of remote node for each dev_grp...
HORCM_INST
#dev_group ip_address service
htcdg01 bina.austin.ibm.com 52323
hurdg01 bina.austin.ibm.com 52323
3. Map the TrueCopy-protected hdisks to the TrueCopy device groups by using the raidscan
command. In the following example, 2 is the HORCM instance number:
lsdev -Cc disk|grep hdisk | /HORCM/usr/bin/raidscan -IH2 -find inst
The -find inst option of the raidscan command registers the device file names (hdisks) to
all mirror descriptors of the LDEV map table for HORCM and permits the matching
volumes in the horcm.conf file to be used in protection mode. Because /etc/horcmgr
performs this registration automatically, you do not normally need to use this option
yourself. The option stops scanning as soon as the registration has been completed by
HORCM; if HORCM no longer needs the registration, no further action is taken and the
command simply exits. You can combine the -find inst option with the -fx option to view
LDEV numbers in the hexadecimal format.
4. Verify that the PAIRs are established by running either the pairdisplay command or the
pairvolchk command against the device groups htcdg01 and hurdg01.
Example 14-7 shows how we use the pairdisplay command. For device group htcdg01,
the status of PAIR and fence of NEVER indicates a synchronous pair. For
device group hurdg01, the ASYNC fence option clearly indicates an
asynchronous pair. Also notice that the CTG field shows the consistency group number for
the asynchronous pair managed by HUR.
Example 14-7 The pairdisplay command to verify that the pair status is synchronized
# pairdisplay -g htcdg01 -IH2 -fe
Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M CTG JID AP
htcdg01 htcd01(L) (CL1-E-0, 0, 10)45306 272.P-VOL PAIR NEVER ,35764 268 - - - 1
htcdg01 htcd01(R) (CL1-B-0, 0, 28)35764 268.S-VOL PAIR NEVER ,----- 272 - - - -
htcdg01 htcd02(L) (CL1-E-0, 0, 11)45306 273.P-VOL PAIR NEVER ,35764 269 - - - 1
htcdg01 htcd02(R) (CL1-B-0, 0, 29)35764 269.S-VOL PAIR NEVER ,----- 273 - - - -
To fit the output in Example 14-7 on the page, we removed the last three columns
because they are not relevant to what we are checking.
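A small script can automate that verification. The awk parsing below is our own sketch, and it assumes the -CLI output format (one Status column per line, as shown later in Example 14-27) rather than the -fe format of Example 14-7; the sample lines are illustrative.

```shell
#!/bin/ksh
# Sketch: exit non-zero if any data line of "pairdisplay ... -CLI" output
# reports a status other than PAIR. Column 8 is Status in the -CLI format.
check_pair_state() {
    awk 'NR > 1 && $8 != "PAIR" { bad = 1 } END { exit bad }'
}

# Sample -CLI style output for a quick self-check (values are illustrative):
sample="Group PairVol L/R Device_File Seq# LDEV# P/S Status Fence Seq# P-LDEV# M
htcdg01 htcd01 L hdisk38 45306 272 P-VOL PAIR NEVER 35764 268 -
htcdg01 htcd01 R hdisk38 35764 268 S-VOL PAIR NEVER - 272 -"

if echo "$sample" | check_pair_state; then
    result="all volumes in PAIR state"
else
    result="resync required"
fi
echo "$result"
```

On a live system you would pipe the real command into the function, for example pairdisplay -fd -g htcdg01 -IH2 -CLI.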
Otherwise, if you are using Storage Navigator, see 14.4.2, “Creating replicated pairs” on
page 432.
root@jessica: lsvg -l truesyncvg
truesyncvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
oreolv jfs2 125 125 1 closed/syncd /oreofs
majorlv jfs2 125 125 1 closed/syncd /majorfs
truefsloglv jfs2log 1 1 1 closed/syncd N/A
We create the ursasyncvg big volume group on the krod node where the primary LUNs
are located. We also create the logical volumes, jfslog, and file systems as shown in
Example 14-9.
root@krod:lsvg ursasyncvg
VOLUME GROUP: ursasyncvg VG IDENTIFIER:
00cb14ce00004c000000012b5676b11e
VG STATE: active PP SIZE: 4 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 1018 (4072 megabytes)
MAX LVs: 512 FREE PPs: 596 (2384 megabytes)
LVs: 3 USED PPs: 422 (1688 megabytes)
OPEN LVs: 3 QUORUM: 2 (Enabled)
TOTAL PVs: 2 VG DESCRIPTORS: 3
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 2 AUTO ON: no
MAX PPs per VG: 130048
MAX PPs per PV: 1016 MAX PVs: 128
LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
root@krod:lsvg -l ursasyncvg
ursasyncvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
ursfsloglv jfs2log 2 2 1 closed/syncd N/A
hannahlv jfs2 200 200 1 closed/syncd /hannahfs
julielv jfs2 220 220 1 closed/syncd /juliefs
2. Vary off the newly created volume groups by running the varyoffvg command. To import
the volume groups onto the other three systems, the pairs must be in sync.
We execute the pairresync command as shown in Example 14-10 on the local disks and
make sure that they are in the PAIR state. This process verifies that the local disk
information has been copied to the remote storage. Notice that the command is being run
on the respective node that contains the primary source LUNs and where the volume
groups are created.
Verify that the pairs are in sync with the pairdisplay command as shown in Example 14-7
on page 446.
To verify that the pairs are split, check the status by using the pairdisplay command.
Example 14-12 shows that the pairs are in a suspended state.
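The split itself relies on the pairsplit command. A sketch against the two device groups from this scenario follows; the commands run only where the CCI is installed, so the sketch falls back to printing what it would do.

```shell
#!/bin/ksh
# Sketch: split both device groups so that the secondary nodes can read the PVIDs.
INST=2
DGS="htcdg01 hurdg01"    # device groups from this scenario

for dg in $DGS; do
    if [ "$(uname -s)" = "AIX" ]; then
        pairsplit -g "$dg" -IH$INST
    else
        echo "would run: pairsplit -g $dg -IH$INST"
    fi
done
```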
4. To import the volume groups on the remaining nodes, ensure that the PVID is present on
the disks by using one of the following options:
– Run the rmdev -dl command for each hdisk and then run the cfgmgr command.
– Run the appropriate chdev command against each disk to pull in the PVID.
As shown in Example 14-13, we use the chdev command on each of the three additional
nodes.
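The chdev invocations can be looped over the disks in one short script. The hdisk names are from this lab environment, and the commands execute only on AIX; elsewhere the sketch prints what it would run.

```shell
#!/bin/ksh
# Sketch: pull the PVIDs in on a secondary node for the newly mapped disks.
DISKS="hdisk38 hdisk39 hdisk40 hdisk41"   # disk names from this scenario
count=0
for d in $DISKS; do
    if [ "$(uname -s)" = "AIX" ]; then
        chdev -l "$d" -a pv=yes
    else
        echo "would run: chdev -l $d -a pv=yes"
    fi
    count=$((count + 1))
done
echo "processed $count disks"
```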
5. Verify that the PVIDs are correctly showing on each system by running the lspv command
as shown in Example 14-14. Because all four of the nodes have the exact hdisk
numbering, we show the output only from one node, the bina node.
6. Import the volume groups on each node as needed by using the importvg command.
Specify the major number that you used earlier.
7. Disable both the auto varyon and quorum settings of the volume groups by using the chvg
command.
8. Vary off the volume group as shown in Example 14-15.
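Steps 6 through 8 might be scripted as follows on each remaining node. The major number 100 and the hdisk38 reference disk are assumptions for illustration; the commands execute only on AIX.

```shell
#!/bin/ksh
# Sketch of steps 6-8 on a remaining node: import the volume group with the
# same major number, disable auto varyon and quorum, then vary it off.
VG=truesyncvg
MAJOR=100        # assumption: must match the major number used at creation
HD=hdisk38       # any one disk of the volume group is enough for importvg

if [ "$(uname -s)" = "AIX" ]; then
    importvg -V $MAJOR -y $VG $HD
    chvg -a n -Q n $VG        # -a n: no auto varyon; -Q n: disable quorum
    varyoffvg $VG
else
    echo "would run: importvg -V $MAJOR -y $VG $HD; chvg -a n -Q n $VG; varyoffvg $VG"
fi
```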
9. Re-establish the pairs that you split in step 3 on page 449 by running the pairresync
command again as shown in Example 14-10 on page 448.
10.Verify again if they are in sync by using the pairdisplay command as shown in
Example 14-7 on page 446.
In these steps, the cluster topology has been configured, including all four nodes, both sites,
and networks.
In this configuration, we created two replicated resources. One resource, named truelee, is
for the synchronous device group, htcdg01. The second resource, named ursasyncRR, is
for the asynchronous device group, hurdg01. Example 14-16 shows both of the
replicated resources.
[Entry Fields]
* TRUECOPY(R)/HUR Resource Name [truelee]
* TRUECOPY(R)/HUR Mode SYNC +
* Device Groups [htcdg01] +
* Recovery Action AUTO +
* Horcm Instance [horcm2]
* Horctakeover Timeout Value [300] #
* Pairevtwait Timeout Value [3600] #
[Entry Fields]
* TRUECOPY(R)/HUR Resource Name [ursasyncRR]
* TRUECOPY(R)/HUR Mode ASYNC +
* Device Groups [hurdg01] +
* Recovery Action AUTO +
* Horcm Instance [horcm2]
* Horctakeover Timeout Value [300] #
* Pairevtwait Timeout Value [3600] #
For a complete list of all of the defined TrueCopy/HUR replicated resources, run the cllstc
command, which is in the /usr/es/sbin/cluster/tc/cmds directory. Example 14-17 shows
the output of the cllstc command.
Example 14-17 The cllstc command to list the TrueCopy/HUR replicated resources
root@jessica: cllstc -a
Name CopyMode DeviceGrps RecoveryAction HorcmInstance HorcTimeOut PairevtTimeout
truelee SYNC htcdg01 AUTO horcm2 300 3600
ursasyncRR ASYNC hurdg01 AUTO horcm2 300 3600
Important: You cannot mix regular (non-replicated) volume groups and TrueCopy/HUR
replicated volume groups in the same resource group.
Press Enter.
In this scenario, we changed an existing resource group, emlecRG, for the Austin site and
specifically chose a site relationship, also known as an Inter-site Management Policy of Prefer
Primary Site. We added a new resource group, valhallarg, for the Miami site and chose to
use the same site relationship. We also added the additional nodes from each site. We
configured both to failover locally within a site and failover between sites. If a site failure
occurs, the node falls over to the remote site standby node, but never to the remote
production node.
Figure 14-17 Error messages found during TrueCopy/HUR replicated resource verification
Synchronizing the cluster configuration
You must verify the PowerHA SystemMirror Enterprise Edition cluster and the TrueCopy/HUR
configuration before you can synchronize the cluster. To propagate the new TrueCopy/HUR
configuration information and the additional resource group that were created across the
cluster, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select Extended Configuration → Extended Verification and
Synchronization.
3. In the Verify, Synchronize or Both field, select Synchronize. In the Automatically correct
errors found during verification field, select No. Press Enter.
The output is displayed in the SMIT Command Status window.
These scenarios do not entail performing a redundancy test with the IP networks. Instead you
configure redundant IP or non-IP communication paths to avoid isolation of the sites. The loss
of all the communication paths between sites leads to a partitioned state of the cluster and to
data divergence between sites if the replication links are also unavailable.
Another specific failure scenario is the loss of the replication paths between the storage
subsystems while the cluster is running on both sites. To avoid this situation, configure
redundant communication links for TrueCopy/HUR replication. You must manually recover the
status of the pairs after the storage links are operational again.
Important: PowerHA SystemMirror Enterprise Edition does not trap SNMP notification
events for TrueCopy/HUR storage. If a TrueCopy link goes down when the cluster is up and
the link is repaired later, you must manually resynchronize the pairs.
This topic explains how to perform the following tests for each site and resource group:
Graceful site failover for the Austin site
Rolling site failure of the Austin site
Site re-integration for the Austin site
Graceful site failover for the Miami site
Rolling site failure of the Miami site
Site re-integration for the Miami site
Before each test, we start copying data from another file system to the replicated file systems.
After each test, we verify that the site service IP address is online and that new data is in the
file systems. We also had a script that inserts the current time and date into a file on each file
system. Because of the small amount of I/O in our environment, we could not detect any
data loss, even with the asynchronous replication.
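The timestamp script mentioned above was a simple loop along the following lines. The file-system names are from this scenario, and the marker file name and iteration count are our own illustrative choices; in the real test the loop ran for the duration of each test.

```shell
#!/bin/ksh
# Sketch: append the current date to a marker file on each replicated file
# system so that, after a failover, the last entry shows how recent the
# surviving data is. Directories that do not exist are silently skipped.
FILESYSTEMS="/oreofs /majorfs /hannahfs /juliefs"
ITERATIONS=5
i=0
while [ $i -lt $ITERATIONS ]; do
    for fs in $FILESYSTEMS; do
        if [ -d "$fs" ]; then
            date >> "$fs/last_write.log"
        fi
    done
    sleep 1
    i=$((i + 1))
done
echo "wrote $i timestamp rounds"
```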
In a true maintenance scenario, you most likely perform this task by stopping the cluster on
the local standby node first. Then you stop the cluster on the production node by using the
Move Resource Group option. The following operations occur during this move:
Releasing the primary online instance of emlecRG at the Austin site
– Executes the application server stop script
– Unmounts the file systems
– Varies off the volume group
– Removes the service IP address
Releasing the secondary online instance of emlecRG at the Miami site
Acquiring the emlecRG resource group in the online secondary state at the Austin site
Acquiring the emlecRG resource group in the online primary state at the Miami site
3. In the Move a Resource Group to Another Node / Site panel (Figure 14-18), select the
ONLINE instance of the emlecRG resource group to be moved.
4. In the Select a Destination Site panel, select the Miami site as shown in Figure 14-19.
+--------------------------------------------------------------------------+
| Select a Destination Site |
| |
| Move cursor to desired item and press Enter. |
| |
| # *Denotes Originally Configured Primary Site |
| Miami |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 14-19 Selecting the site for resource group move
Example 14-20 Resource group status after a move to the Miami site
root@maddi# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
emlecRG ONLINE SECONDARY jessica@Austin
OFFLINE bina@Austin
ONLINE maddi@Miami
6. Repeat the resource group move to move it back to its original primary site and node to
return to the original starting state.
Attention: In our environment, after the first resource group move between sites, we were
unable to move the resource group back without leaving the pick list for the destination site
empty. However, we were able to move it back by node, instead of by site. Later in our
testing, the by-site option started working, but it moved it to the standby node at the
primary site instead of the original primary node. If you encounter similar problems, contact
IBM support.
To begin, all four nodes are active in the cluster and the resource groups are online on the
primary node as shown in Example 14-19 on page 455.
1. On the jessica node, run the reboot -q command. The bina node acquires the emlecRG
resource group as shown in Example 14-21.
valhallarg ONLINE krod@Miami
OFFLINE maddi@Miami
ONLINE SECONDARY bina@Austin
2. Run the pairdisplay command (as shown in Example 14-22) to verify that the pairs are
still established because the volume group is still active on the primary site.
3. Upon cluster stabilization, run the reboot -q command on the bina node. The maddi node
at the Miami site acquires the emlecRG resource group as shown in Example 14-23.
4. Verify that the replicated pairs are now in the suspended state from the command line as
shown in Example 14-24.
Important: Although our testing resulted in a site_down event, we never lost access to
the primary storage subsystem. In a true site failure, including loss of storage,
re-establish the replicated pairs, and synchronize them before moving back to the
primary site. If you must change the storage LUNs, modify the horcm.conf file, and use
the same device group and device names. You do not have to change the cluster
resource configuration.
Important: The resource group settings of the Inter-site Management Policy, also known
as the site relationship, dictate what occurs upon re-integration of the primary node.
Because we chose Prefer Primary Site, the automatic fallback occurred.
Initially we are unable to restart the cluster on the jessica node because of verification errors
at startup, which are similar to the errors shown in Figure 14-17 on page 453. Of the two
possible reasons for these errors, the first is that we failed to include starting the horcm
instance on bootup. The second is that we also had to re-map the copy-protected
device groups by running the raidscan command again.
Important: Always ensure that the horcm instance is running before rejoining a node into
the cluster. In some cases, if all instances, cluster nodes, or both have been down, you
might need to run the raidscan command again.
Example 14-25 Resource group status after moving to the Austin site
root@bina: clRGinfo
Group Name Group State Node
-----------------------------------------------------------------------------
emlecRG ONLINE jessica@Austin
OFFLINE bina@Austin
ONLINE SECONDARY maddi@Miami
6. Repeat these steps to move a resource group back to the original primary krod node at
the Miami site.
Attention: In our environment, after the first resource group move between sites, we were
unable to move the resource group back without leaving the pick list for the destination site
empty. However, we were able to move it back by node, instead of by site. Later in our
testing, the by-site option started working, but it moved it to the standby node at the
primary site instead of the original primary node. If you encounter similar problems, contact
IBM support.
To begin, all four nodes are active in the cluster, and the resource groups are online on the
primary node as shown in Example 14-19 on page 455. Follow these steps:
1. On the krod node, run the reboot -q command. The maddi node brings the valhallaRG
resource group online, and the remote bina node maintains the online secondary status as
shown in Example 14-26. This time the failover was noticeably longer, specifically in
the fsck portion. The longer time is most likely a symptom of the asynchronous
replication.
2. Run the pairdisplay command as shown in Example 14-27 to verify that the pairs are still
established because the volume group is still active on the primary site.
Example 14-27 Status using the pairdisplay command after the local Miami site fallover
root@maddi: pairdisplay -fd -g hurdg01 -IH2 -CLI
Group PairVol L/R Device_File Seq# LDEV# P/S Status Fence Seq# P-LDEV# M
hurdg01 hurd01 L hdisk40 35764 270 P-VOL PAIR ASYNC 45306 274 -
hurdg01 hurd01 R hdisk40 45306 274 S-VOL PAIR ASYNC - 270 -
hurdg01 hurd02 L hdisk41 35764 271 P-VOL PAIR ASYNC 45306 275 -
hurdg01 hurd02 R hdisk41 45306 275 S-VOL PAIR ASYNC - 271 -
3. Upon cluster stabilization, run the reboot -q command on the maddi node. The bina node
at the Austin site acquires the valhallaRG resource group as shown in Example 14-28.
Important: Although our testing resulted in a site_down event, we never lost access to
the primary storage subsystem. In a true site failure, including loss of storage,
re-establish the replicated pairs, and synchronize them before moving back to the
primary site. If you must change the storage LUNs, modify the horcm.conf file, and use
the same device group and device names. You do not have to change the cluster
resource configuration.
Important: The resource group settings of the Inter-site Management Policy, also known
as the site relationship, dictate what occurs upon re-integration of the
primary node. Because we chose the Prefer Primary Site policy, the automatic fallback
occurred.
Initially we are unable to restart the cluster on the jessica node because of verification errors
at startup, which are similar to the errors shown in Figure 14-17 on page 453. Of the two
possible reasons for these errors, the first is that we failed to include starting the horcm
instance on bootup. The second is that we also had to re-map the copy-protected
device groups by running the raidscan command again.
Important: This topic does not explain how to dynamically expand a volume through
Hitachi Logical Unit Size Expansion (LUSE) because this option is not supported.
Then follow the same steps for defining new LUNs:
1. Run the cfgmgr command on the primary node jessica.
2. Assign the PVID on the jessica node.
chdev -l hdisk42 -a pv=yes
3. Run the pairsplit command on the replicated LUNs.
4. Run the cfgmgr command on each of the remaining three nodes.
5. Verify that the PVID shows up on each node by using the lspv command.
6. Run the pairresync command on the replicated LUNs.
7. Shut down the horcm2 instance on each node:
/HORCM/usr/bin/horcmshutdown.sh 2
8. Edit the /etc/horcm2.conf file on each node as appropriate for each site:
– The krod and maddi nodes on the Miami site added the following new line:
htcdg01 htcd03 35764 01:1F
– The jessica and bina nodes on the Austin site added the following new line:
htcdg01 htcd03 45306 01:14
9. Restart horcm2 instance on each node:
/HORCM/usr/bin/horcmstart.sh 2
10.Map the devices and device group on any node:
lsdev -Cc disk|grep hdisk|/HORCM/usr/bin/raidscan -IH2 -find inst
We ran this command on the jessica node.
11.Verify that the htcdg01 device group pairs now show the new pair, which consists of
hdisk42 on each system, as shown in Example 14-29.
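Steps 7 through 10 above, run on each node, can be collected into one small script. Because the new LDEV line differs per site, it is passed in as an argument; appending to the end of the file is a simplification (the line belongs in the HORCM_LDEV section), and the commands execute only on AIX.

```shell
#!/bin/ksh
# Sketch of steps 7-10 for one node: stop the instance, add the new LDEV line,
# restart the instance, and re-register the devices with raidscan.
INST=2
NEWLINE="${1:-htcdg01 htcd03 45306 01:14}"   # Austin-site line; Miami nodes differ

if [ "$(uname -s)" = "AIX" ]; then
    /HORCM/usr/bin/horcmshutdown.sh $INST
    echo "$NEWLINE" >> /etc/horcm$INST.conf   # simplistic: belongs in HORCM_LDEV
    /HORCM/usr/bin/horcmstart.sh $INST
    lsdev -Cc disk | grep hdisk | /HORCM/usr/bin/raidscan -IH$INST -find inst
else
    echo "would update /etc/horcm$INST.conf with: $NEWLINE"
fi
```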
You are now ready to use C-SPOC to add the new disk into the volume group:
Important: You cannot use C-SPOC for the following LVM operations to configure nodes at
the remote site that contain the target volume:
– Creating a volume group
– Operations that require nodes at the target site to write to the target volumes
For example, changing the file system size, changing the mount point, or adding LVM
mirrors causes an error message in C-SPOC. However, nodes on the same site as the
source volumes can successfully perform these tasks. The changes are then
propagated to the other site by using a lazy update.
For all other LVM operations to work through C-SPOC, perform them with the
TrueCopy/HUR volume pairs in the Synchronized or Consistent states and the cluster
ACTIVE on all nodes.
5. Verify the menu information, as shown in Figure 14-22, and press Enter.
[Entry Fields]
VOLUME GROUP name truesyncvg
Resource Group Name emlecRG
Node List bina,jessica,krod,mad>
Reference node bina
VOLUME names hdisk42
Figure 14-22 Adding a volume to a volume group
The krod node does not need the volume group because it is not a member of the resource
group. However, we started with all four nodes seeing all volume groups and decided to leave
the configuration that way. This way we have additional flexibility later if we need to change
the cluster configuration to allow the krod node to take over as a last resort.
Upon completion of the C-SPOC operation, all four nodes now have the new disk as a
member of the volume group as shown in Example 14-30.
Example 14-30 New disk added to the volume group on all nodes
root@jessica: lspv |grep truesyncvg
hdisk38 00cb14ce564c3f44 truesyncvg active
hdisk39 00cb14ce564c40fb truesyncvg active
hdisk42 00cb14ce74090ef3 truesyncvg active
We do not need to synchronize the cluster because all of these changes are made to an
existing volume group. However, you might want to run the cl_verify_tc_config command to
verify that the replicated resources are configured correctly.
6. Upon completion of the C-SPOC operation, verify that the new logical volume was created
locally on the jessica node, as shown in Example 14-31.
You do not need to synchronize the cluster because all of these changes are made to an
existing volume group. However, you might want to make sure that the replicated resources
verify correctly. Use the cl_verify_tc_config command first to isolate the replicated
resources specifically.
10.Map the devices and device group on any node. We ran the raidscan command on the
jessica node. See Table 14-3 for additional configuration details.
lsdev -Cc disk|grep hdisk|/HORCM/usr/bin/raidscan -IH2 -find inst
Table 14-3 Details on the Austin and Miami LUNs
Austin - Hitachi USPV - 45306 Miami - Hitachi USPVM - 35764
CU 00 CU 00
11.Verify that the htcdg01 device group pairs now show the new pairs that consist of
hdisk42 on each system, as shown in Example 14-33.
Volume Groups
+--------------------------------------------------------------------------+
| Node Names |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| > bina |
| > jessica |
| > krod |
| > maddi |
| |
| F1=Help F2=Refresh F3=Cancel |
| F7=Select F8=Image F10=Exit |
F1| Enter=Do /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 14-26 Selecting a volume group node
4. In the Physical Volume Names panel (Figure 14-27), select hdisk43.
Volume Groups
6. In the Create a Scalable Volume Group panel, select the proper resource group. We chose
emlecRG as shown in Figure 14-29.
8. Verify that the volume group is successfully created, which we do on all four nodes as
shown in Example 14-34.
When the volume group is created, it is automatically added to the resource group, as
shown in Example 14-35. However, we do not have to change the resource group any
further, because the new disk and device are added to the same device group and
TrueCopy/HUR replicated resource.
Example 14-35 Newly added volume group also added to the resource group
Resource Group Name emlecRG
Participating Node Name(s) jessica bina maddi
Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node
Fallback Policy Never Fallback
Site Relationship Prefer Primary Site
Node Priority
Service IP Label service_1
Volume Groups truesyncvg truetarahvg
Hitachi TrueCopy Replicated Resources truelee
9. Repeat the steps in 14.6.2, “Adding a new logical volume” on page 466, to create a new
logical volume, named tarahlv on the newly created volume group truetarahvg.
Example 14-36 shows the new logical volume.
10.Manually run the cl_verify_tc_config command to verify that the new addition of the
replicated resources is complete.
Important: These results incorrectly imply a one-to-one relationship between the device
group/replicated resource and the volume group, which is not intended. To work around
this problem, ensure that the cluster is down, perform a forced synchronization, and then
start the cluster while ignoring the verification errors. Performing a forced synchronization
and then starting the cluster while ignoring errors is normally not recommended. Contact
IBM support to see if a fix is available.
Synchronize the resource group change to include the new volume that you just added.
Usually you can perform this task within a running cluster. However, because of the defect
mentioned in the previous Important box, we had to have the cluster down to synchronize it.
To perform this task, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select Extended Configuration → Extended Verification and
Synchronization.
3. In the HACMP Verification and Synchronization display (Figure 14-30), for Force
synchronization if verification fails, select Yes.
[Entry Fields]
* Verify, Synchronize or Both [Both] +
* Automatically correct errors found during [No] +
verification?
4. Verify the information is correct, and press Enter. Upon completion, the cluster
configuration is in sync and can now be tested.
5. Repeat the steps for a rolling site failure as explained in 14.5.2, “Rolling site failure of
the Austin site” on page 457. In this scenario, the tests are successful.
Testing failover after adding a new volume group
To confirm that the cluster still behaves as expected after the change, repeat the steps of a
rolling site failure as explained in 14.5.2, “Rolling site failure of the Austin site” on page 457.
The new volume group truetarahvg and new logical volume tarahlv are displayed on each
node. However, the site failover takes noticeably longer because a lazy update is performed
to pick up the volume group changes.
Syntax
lscluster -i [ -n clustername ] | -s | -m | -d | -c
Description
The lscluster command shows the attributes that are associated with the cluster and the
cluster configuration.
Flags
-i Lists the cluster configuration interfaces on the local node.
-n Allows the cluster name to be queried for all interfaces (applicable only with the -i
flag).
-s Lists the cluster network statistics on the local node.
-m Lists the cluster node configuration information.
-d Lists the cluster storage interfaces.
-c Lists the cluster configuration.
Examples
To list the cluster configuration for all nodes, enter the following command:
lscluster -m
To list the cluster statistics for the local node, enter the following command:
lscluster -s
To list the interface information for the local node, enter the following command:
lscluster -i
To list the interface information for the cluster, enter the following command:
lscluster -i -n mycluster
To list the storage interface information for the cluster, enter the following command:
lscluster -d
To list the cluster configuration, enter the following command:
lscluster -c
Syntax
mkcluster [ -n clustername ] [ -m node[,...] ] -r reposdev [-d shareddisk [,...]]
[-s multaddr_local ] [-v ]
A multicast address is used for cluster communications between the nodes in the cluster.
Therefore, review any multicast-related network considerations with your network systems
administrator before creating a cluster.
Flags
-n clustername Sets the name of the local cluster being created. If no name is
specified when you first run the mkcluster command, a default of
SIRCOL_hostname is used, where hostname is the name
(gethostname()) of the local host.
-m node[,...] Lists the comma-separated resolvable host names or IP addresses for
nodes that are members of the cluster. The local host must be
included in the list. If the -m option is not used, the local host is implied,
causing a one-node local cluster to be created.
-r reposdev Specifies the name, such as hdisk10, of the SAN-shared storage
device that is used as the central repository for the cluster
configuration data. This device must be accessible from all nodes.
This device is required to be a minimum of 1 GB in size and backed by
a redundant and highly available SAN configuration. This flag is
required when you first run the mkcluster command within a Storage
Interconnected Resource Collection (SIRCOL), and cannot be used
thereafter.
-d shareddisk[,...] Specifies a comma-separated list of SAN-shared storage devices,
such as hdisk12,hdisk34, to be incorporated into the cluster
configuration.
These devices are renamed with a cldisk prefix. The same name is
assigned to this device on all cluster nodes from which the device is
accessible. Specified devices must not be open when the mkcluster
command is executed. This flag is used only when you first run the
mkcluster command.
-s multaddr_local Sets the multicast address of the local cluster that is being created.
This address is used for internal communication within the local
cluster. If the -s option is not specified when you first run the
mkcluster command within a SIRCOL, a multicast address is
automatically generated. This flag is used only when you first run the
mkcluster command within a SIRCOL.
-v Specifies the verbose mode.
Examples
To create a cluster of one node and use the default values, enter the following command:
mkcluster -r hdisk1
The output is a cluster named SIRCOL_myhostname with a single node in the cluster. The
multicast address is automatically generated, and no shared disks are created for this
cluster. The repository device is set up on hdisk1, and this disk cannot be used by the node
for any other purpose.
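Combining the flags described above, a fuller invocation might look like the following sketch. The cluster name, node names, and hdisk numbers are hypothetical; the script only composes and prints the command string, because mkcluster itself runs only on a CAA-capable AIX node.

```shell
# Hypothetical values; adjust to your environment.
cluster="mycluster"
nodes="nodeA,nodeB,nodeC"   # -m: the local host must be in this list
repos="hdisk10"             # -r: >= 1 GB, SAN-backed, accessible from all nodes
shared="hdisk12,hdisk34"    # -d: these devices are renamed with a cldisk prefix

cmd="mkcluster -n $cluster -m $nodes -r $repos -d $shared"
echo "$cmd"   # on an AIX node, run the printed command directly
```

Note that -r (and -d and -s, if used) are accepted only the first time mkcluster is run within a SIRCOL; later membership changes are made with chcluster.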
Syntax
rmcluster -n name [-f] [-v]
Description
The rmcluster command removes the cluster configuration. The repository disk and all SAN
shared disks are released, and the shared disks are re-assigned generic hdisk names. The
generic hdisk name might not be the same name that was initially used to add the disk to the
cluster.
Flags
-n name Specifies the name of the cluster to be removed.
-f Forces certain errors to be ignored.
-v Specifies verbose mode.
Example
To remove the cluster configuration, enter the following command:
rmcluster -n mycluster
Syntax
chcluster [ -n name ] [ { -d | -m } [+|-]name[,...] ] ... [ -q ] [ -f ] [ -v ]
Description
The chcluster command changes the cluster configuration. With this command, SAN shared
disks and nodes can be added and removed from the cluster configuration.
Examples
To add shared disks to the cluster configuration, enter the following command:
chcluster -n mycluster -d +hdisk20,+hdisk21
To remove shared disks from the cluster configuration, enter the following command:
chcluster -n mycluster -d -hdisk20,-hdisk21
To add nodes to the cluster configuration, enter the following command:
chcluster -n mycluster -m +nodeD,+nodeE
To remove nodes from the cluster configuration, enter the following command:
chcluster -n mycluster -m -nodeD,-nodeE
Syntax
clusterconf [ -u [-f ] | -s | -r hdiskN ] [-v ]
Description
The clusterconf command allows administration of the cluster configuration. A node in a
cluster configuration might show a status of DOWN (viewable by issuing the lscluster -m
command). Alternatively, a node might not be displayed in the cluster configuration at all,
even though you know it is part of the cluster (as viewed from another node in the cluster by
using the lscluster -m command). In these cases, the following flags allow the node to
search and read the repository disk and take self-correcting actions.
Do not use the clusterconf command to remove a cluster configuration. Instead, use the
rmcluster command for normal removal of the cluster configuration.
The clusterconf command runs as a normal cluster service and is handled automatically
during normal operation. The following flags are possible for this command:
-r hdiskN Has the cluster subsystem read the repository device if you know where the
repository disk is (lspv and look for cvg). It causes the node to join the cluster if
the node is configured in the repository disk.
-s Performs an exhaustive search for a cluster repository disk on all configured
hdisk devices. It stops when a cluster repository disk is found. This option
searches all disks that are looking for the signature of a repository device. If a
disk is found with the signature identifying it as the cluster repository, the search
is stopped. If the node finds itself in the cluster configuration on the disk, the
node joins the cluster. If the storage network is dirty and multiple repositories
are in the storage network (not supported), it stops at the first repository disk. If
the node is not in that repository configuration, it does not join the cluster.
Use the -v flag to see which disk was found. Then use the other options on the
clusterconf command to clean up the storage network until the desired results
are achieved.
-u Performs the unconfigure operation for the local node. If the node is still in the
cluster repository configuration on the shared disk to which the other nodes
have access, the other nodes in the cluster request this node to rejoin the
cluster. Use the -u option when cleanup must be performed on the local node,
for example, when the node was removed from the cluster configuration while it
was down or unreachable from the network and therefore missed the normal
removal processing (such as when the chcluster -m -nodeA command was
run). The unconfigure operation performs the updates that clean up the
environment on the local node.
-f The force option, which performs the unconfigure operation and ignores errors.
-v Verbose mode.
Examples
To clean up the local node's environment, enter the following command:
clusterconf -fu
To recover the cluster configuration and start cluster services, enter the following
command:
clusterconf -r hdisk1
To search for the cluster repository device and join the cluster, enter the following
command:
clusterconf -s
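As the description notes, a node reported as DOWN by lscluster -m is a candidate for the recovery flags above. The sketch below only assumes that the output contains a per-node "State of node" field; the sample text is invented for illustration, and the real output format may differ.

```shell
# Invented sample of lscluster -m output; only the "State of node"
# field is assumed here, and the real format may differ.
lscluster_m='Node name: nodeA  State of node: UP
Node name: nodeB  State of node: DOWN'

# If any node is DOWN, point the operator at the recovery flags.
if printf '%s\n' "$lscluster_m" | grep -q 'State of node: DOWN'; then
    echo "a node is DOWN: on that node, try clusterconf -r <reposdisk>, then clusterconf -s"
fi
```

The suggested order mirrors the flag descriptions: use -r when you already know the repository disk, and fall back to the exhaustive -s search when you do not.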
Note the following explanation to help you understand how to read the tree:
The number of right-pointing double angle quotation marks (») indicates the number of
screens that you must page down in the PowerHA SMIT tree. For example, » » » means
that you must page down three screens.
The double hyphens (--) are used as a separator between the SMIT text and the SMIT
fast path.
The parentheses (()) indicate the fast path.
Because PowerHA 7.1 is not supported on any AIX level before 6.1.6, hardware that is not
supported on AIX 6.1.6 is, by definition, not supported by PowerHA 7.1 either. Also, if the
hardware manufacturer has not made a statement of support for AIX 7.1, that combination is
not supported until such support is stated. This is true even if the tables in this appendix
suggest that PowerHA supports it.
This appendix contains information about IBM Power Systems, IBM storage, adapters, and
AIX levels supported by current versions of High-Availability Cluster Multi-Processing
(HACMP) 5.4.1 through PowerHA 7.1. It focuses on hardware support from around the last
five years and consists mainly of IBM POWER5 systems and later. At the time of writing, the
information was current and complete.
All POWER5 and later systems are supported on AIX 7.1 and HACMP 5.4.1 and later. AIX 7.1
support has the following specific requirements for HACMP and PowerHA:
HACMP 5.4.1, SP10
PowerHA 5.5, SP7
PowerHA 6.1, SP3
PowerHA 7.1
Full software support details are in the official support flash. The information in this appendix
is available and maintained in the “PowerHA hardware support matrix” at:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105638
Most of the devices in the online documentation are linked to their corresponding support flash.
Table C-1 POWER5 System p model support for HACMP and PowerHA
System p HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
models
7037-A50 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9110-510 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9110-51A AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9111-285 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9111-520 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9113-550 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9115-505 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9116-561+ AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9117-570 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9118-575 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9119-590 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9119-595 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9131-52A AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9133-55A AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
Table C-2 POWER5 System i model support for HACMP and PowerHA
System i models HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
9406-520 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9406-550 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9406-570 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9406-590 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9406-595 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
Table C-3 POWER6 model support for HACMP and PowerHA
Models HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
8203-E4A AIX 5.3 TL7 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
8203-E8A AIX 5.3 TL7 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
8234-EMA AIX 5.3 TL8 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP5 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9117-MMA AIX 5.3 TL6 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9119-FHA AIX 5.3 TL8 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 SP1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9125-F2A AIX 5.3 TL8 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 SP1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
Built-in serial ports: Built-in serial ports in POWER6 servers are not available for
PowerHA use. Instead, use disk heartbeating. However, note that the built-in Ethernet
(IVE) adapters are supported for PowerHA use.
Table C-4 POWER7 model support for HACMP and PowerHA
Models HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
8202-E4B/720 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1
8205-E6B/740 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1
8231-E2B/710 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1
8231-E2B/730 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1
8233-E8B/750 AIX 5.3 TL11 SP1 AIX 5.3 TL11 SP1 AIX 5.3 TL11 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL4 SP3 AIX 6.1 TL4 SP3 AIX 7.1
9117-MMB/770 AIX 5.3 TL11 SP1 AIX 5.3 TL11 AIX 5.3 TL11 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL4 SP3 AIX 6.1 TL4 SP3 AIX 7.1
9119-FHB/795 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1
9179-FHB/780 AIX 5.3 TL11 SP1 AIX 5.3 TL11 AIX 5.3 TL11 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL4 SP3 AIX 6.1 TL4 SP3 AIX 7.1
Built-in serial ports: Built-in serial ports in POWER7 Servers are not available for
PowerHA use. Instead, use disk heartbeating. However, note that the built-in Ethernet
(IVE) adapters are supported for PowerHA use.
Table C-5 IBM POWER Blade support for HACMP and PowerHA
System p HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
models
7778-23X/JS23 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2
7778-43X/JS43 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2
7998-60X/JS12 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP AIX 6.1 TL2 SP1 AIX 7.1
7998-61X/JS22 HACMP SP2 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL6 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
8406-70Y/PS700 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1
8406-71Y/PS701 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
PS702 AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1
8844-31U/JS21 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
8844-51U/JS21 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
Blade support includes support for IVM and IVE on both POWER6 and POWER7 blades. The
following adapter cards are supported in the POWER6 and POWER7 blades:
8240 Emulex 8Gb FC Expansion Card (CIOv)
8241 QLogic 4Gb FC Expansion Card (CIOv)
8242 QLogic 8Gb Fibre Channel Expansion Card (CIOv)
8246 SAS Connectivity Card (CIOv)
8251 Emulex 4Gb FC Expansion Card (CFFv)
8252 QLogic combo Ethernet and 4 Gb Fibre Channel Expansion Card (CFFh)
8271 QLogic Ethernet/8Gb FC Expansion Card (CFFh)
IBM storage
It is common to use multipathing drivers with storage. If you use MPIO, SDD, or SDDPCM (or
a combination of them) on any PowerHA-controlled storage, you are required to use
enhanced concurrent volume groups (ECVGs). This requirement also applies to vSCSI and
NPIV devices.
DS storage units
Table C-6 lists the DS storage unit support for HACMP and PowerHA with AIX.
DS3400 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL8 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2
DS3500 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL8 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2
DS4100 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS4200 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS4300 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS4400 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS4500 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS4700 AIX 5.3 TL5 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS4800 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS5020 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2
DS6000 AIX 5.3 TL5 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
DS6800 AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS5100 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2
DS5300 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2
DS8000 AIX 5.3 TL5 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
931,932,9B2 AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS8700 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL8 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2
IBM XIV
Table C-7 lists the software versions for HACMP and PowerHA with AIX supported on XIV
storage. PowerHA requires XIV microcode level 10.0.1 or later.
Table C-7 IBM XIV support for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
XIV HACMP SP4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
2810-A14 AIX 5.3 TL7 SP6 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL0 SP2
Table C-8 SVC supported models for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
2145-4F2 HACMP SP8 PowerHA SP6 PowerHA SP1 AIX 6.1 TL6
AIX 5.3 TL9 AIX 5.3 TL9 AIX 5.3 TL9 AIX 7.1
AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3
2145-8F2 HACMP SP8 PowerHA SP8 PowerHA SP1 AIX 6.1 TL6
AIX 5.3 TL9 AIX 5.3 TL9 AIX 5.3 TL9 AIX 7.1
AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3
Network-attached storage
Table C-9 shows the software versions for PowerHA and AIX supported on network-attached
storage (NAS).
Table C-9 NAS supported models for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
N3700 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N5200 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N5200 (G20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N5300 HACMP SP3 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL0 SP2
N5500 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N5500 (G20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N5600 HACMP SP3 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL0 SP2
N6040 AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N6060 AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N6070 AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7600 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7600 (G20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7700 (A21) AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7700 (G21) AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7800 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7800 (G20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7900 (A21) AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7900 (G21) AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
Serial-attached SCSI
Table C-10 lists the software versions for PowerHA and AIX supported on the serial-attached
SCSI (SAS) model.
Table C-10 SAS supported model for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
5886 EXP12S HACMP SP5 HACMP SP2 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3
SCSI
Table C-11 shows the software versions for PowerHA and AIX supported on the SCSI model.
Table C-11 SCSI supported model for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
7031-D24 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
Adapters
The following sections contain information about the supported adapters for PowerHA.
#5273/#5735 PCI-Express Dual Port Fibre Channel Adapter: The 5273/5735 minimum
requirements are PowerHA 5.4.1 SP2 or 5.5 SP1.
SAS
The following SAS adapters are supported:
#5278 LP 2x4port PCI-Express SAS Adapter 3 Gb
#5901 PCI-Express SAS Adapter
#5902 PCI-X DDR Dual-x4 Port SAS RAID Adapter
#5903 PCI-Express SAS Adapter
#5912 PCI-X DDR External Dual-x4 Port SAS Adapter
Table C-12 SAS software support for HACMP and PowerHA with AIX
HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
HACMP SP5 HACMP SP2 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3
Ethernet
The following Ethernet adapters are supported with PowerHA:
#1954 4-Port 10/100/1000 Base-TX PCI-X Adapter
#1959 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
#1978 IBM Gigabit Ethernet-SX PCI-X Adapter
#1979 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
#1981 IBM 10 Gigabit Ethernet-SR PCI-X Adapter
#1982 IBM 10 Gigabit Ethernet-LR PCI-X Adapter
#1983 IBM 2-port 10/100/1000 Base-TX Ethernet PCI-X
#1984 IBM Dual Port Gigabit Ethernet-SX PCI-X Adapter
#1990 IBM 2-port 10/100/1000 Base-TX Ethernet PCI-X
#4961 IBM Universal 4-Port 10/100 Ethernet Adapter
#4962 IBM 10/100 Mbps Ethernet PCI Adapter II
#5271 LP 4-Port Ethernet 10/100/1000 Base-TX PCI-X Adapter
#5274 LP 2-Port Gigabit Ethernet-SX PCI Express
#5700 IBM Gigabit Ethernet-SX PCI-X Adapter
#5701 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
#5706 IBM 2-Port 10/100/1000 Base-TX Ethernet PCI-X Adapter
#5707 IBM 2-Port Gigabit Ethernet-SX PCI-X Adapter
#5717 IBM 4-Port Ethernet 10/100/1000 Base-TX PCI-X Adapter
InfiniBand
The following InfiniBand adapters are supported with PowerHA:
#1809 IBM GX Dual-port 4x IB HCA
#1810 IBM GX Dual-port 4x IB HCA
#1811 IBM GX Dual-port 4x IB HCA
#1812 IBM GX Dual-port 4x IB HCA
#1820 IBM GX Dual-port 12x IB HCA
Purpose
=======
clmgr: Provides a consistent, reliable interface for performing IBM PowerHA
SystemMirror cluster operations via a terminal or script. All clmgr
operations are logged in the "clutils.log" file, including the
command that was executed, its start/stop time, and what user
initiated the command.
This consistency helps make clmgr easier to learn and use. Further
help is also available at each part of clmgr's command line. For
example, just executing "clmgr" by itself will result in a list of
the available ACTIONs supported by clmgr. Executing "clmgr ACTION"
with no CLASS provided will result in a list of all the available
CLASSes for the specified ACTION. Executing "clmgr ACTION CLASS"
with no NAME or ATTRIBUTES provided is slightly different, though,
since for some ACTION+CLASS combinations, that may be a valid
command format. So to get help in this scenario, it is necessary
to explicitly request it by appending the "-h" flag. So executing
"clmgr ACTION CLASS -h" will result in a listing of all known
attributes for that ACTION+CLASS combination being displayed.
That is where clmgr's ability to help ends, however; it cannot
help with each individual attribute. If there is a question about
what a particular attribute is for, or when to use it, consult the
product documentation.
Synopsis
========
clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>]
[-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}] <ACTION> <CLASS> [<NAME>]
[-h | <ATTR#1>=<VALUE#1> <ATTR#2>=<VALUE#2> <ATTR#n>=<VALUE#n>]
ACTION={add|modify|delete|query|online|offline|...}
CLASS={cluster|site|node|network|resource_group|...}
add (Aliases: a)
query (Aliases: q, ls, get)
modify (Aliases: mod, ch, set)
delete (Aliases: de, rm, er)
Cluster, Method:
verify (Aliases: ve)
CLASS the type of object upon which the ACTION will be performed.
The complete list of supported CLASSes is:
NAME the specific object, of type "CLASS", upon which the ACTION
is to be performed.
NOTE: an ATTR may not always need to be fully typed. Only enough
leading characters to uniquely identify the attribute among the
set of attributes available for the specified operation need to
be provided. So instead of "FC_SYNC_INTERVAL", for the
"add/modify cluster" operation, "FC" could be used, and would
have the same result.
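The prefix-matching rule described in that note can be modeled with a small shell function. This is a behavioral sketch, not the product code: an abbreviation resolves only when it matches exactly one attribute for the operation.

```shell
# resolve_attr PREFIX ATTR... : print the unique attribute starting with
# PREFIX, or fail if the prefix is ambiguous or matches nothing.
resolve_attr() {
    prefix=$1; shift
    match=""; count=0
    for attr in "$@"; do
        case $attr in
            "$prefix"*) match=$attr; count=$((count + 1)) ;;
        esac
    done
    if [ "$count" -eq 1 ]; then
        echo "$match"
        return 0
    fi
    return 1    # ambiguous or unknown prefix
}
```

For example, given the attributes FC_SYNC_INTERVAL, CLUSTER_IP, and NODES, the prefix "FC" resolves uniquely, while "N" would fail against NODES and NETWORK together.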
Operations
==========
CLUSTER:
clmgr add cluster \
[ <cluster_label> ] \
REPOSITORY=<hdisk#> \
SITE:
clmgr add site <sitename> \
[ NODES=<node>[,<node#2>,<node#n>,...] ]
clmgr modify site <sitename> \
[ NEWNAME=<new_site_label> ] \
[ {ADD|REPLACE}={ALL|<node>[,<node#2>,<node#n>,...]} ]
At least one modification option must be specified.
ADD attempts to append the specified nodes to the site.
REPLACE attempts to replace the site's current nodes with
the specified nodes.
clmgr query site [ <sitename>[,<sitename#2>,<sitename#n>,...] ]
clmgr delete site {<sitename>[,<sitename#2>,<sitename#n>,...] | ALL}
clmgr recover site <sitename>
clmgr offline site <sitename> \
[ WHEN={now|restart|both} ] \
NODE:
clmgr add node <node> \
[ COMMPATH=<ip_address_or_network-resolvable_name> ] \
[ RUN_DISCOVERY={true|false} ] \
[ PERSISTENT_IP=<IP> NETWORK=<network>
{NETMASK=<###.###.###.###> | PREFIX=1..128} ] \
[ START_ON_BOOT={false|true} ] \
[ BROADCAST_ON_START={true|false} ] \
[ CLINFO_ON_START={false|true|consistent} ] \
[ VERIFY_ON_START={true|false} ]
clmgr modify node <node> \
[ NEWNAME=<new_node_label> ] \
[ COMMPATH=<new_commpath> ] \
[ PERSISTENT_IP=<IP> NETWORK=<network>
{NETMASK=<###.###.###.###> | PREFIX=1..128} ] \
[ START_ON_BOOT={false|true} ] \
[ BROADCAST_ON_START={true|false} ] \
[ CLINFO_ON_START={false|true|consistent} ] \
[ VERIFY_ON_START={true|false} ]
clmgr query node [ {<node>|LOCAL}[,<node#2>,<node#n>,...] ]
clmgr delete node {<node>[,<node#2>,<node#n>,...] | ALL}
clmgr manage node undo_changes
clmgr recover node <node>[,<node#2>,<node#n>,...]
clmgr online node <node>[,<node#2>,<node#n>,...] \
[ WHEN={now|restart|both} ] \
[ MANAGE={auto|manual} ] \
[ BROADCAST={false|true} ] \
[ CLINFO={false|true|consistent} ] \
[ FORCE={false|true} ] \
[ FIX={no|yes|interactively} ]
[ TIMEOUT=<seconds_to_wait_for_completion> ]
clmgr offline node <node>[,<node#2>,<node#n>,...] \
[ WHEN={now|restart|both} ] \
[ MANAGE={offline|move|unmanage} ] \
[ BROADCAST={true|false} ] \
[ TIMEOUT=<seconds_to_wait_for_completion> ]
NETWORK:
INTERFACE:
clmgr add interface <interface> \
NETWORK=<network> \
[ NODE=<node> ] \
[ TYPE={ether|infiniband} ] \
[ INTERFACE=<network_interface> ]
clmgr modify interface <interface> \
NETWORK=<network>
clmgr query interface [ <interface>[,<if#2>,<if#n>,...] ]
clmgr delete interface {<interface>[,<if#2>,<if#n>,...] | ALL}
RESOURCE GROUP:
clmgr add resource_group <resource_group> \
NODES=nodeA1,nodeA2,... \
[ SECONDARYNODES=nodeB2,nodeB1,... ] \
[ STARTUP={OHN|OFAN|OAAN|OUDP} ] \
[ FALLOVER={FNPN|FUDNP|BO} ] \
[ FALLBACK={NFB|FBHPN} ] \
[ NODE_PRIORITY_POLICY={default|mem|cpu| \
disk|least|most} ] \
[ NODE_PRIORITY_POLICY_SCRIPT=</path/to/script> ] \
[ NODE_PRIORITY_POLICY_TIMEOUT=### ] \
[ SITE_POLICY={ignore|primary|either|both} ] \
[ SERVICE_LABEL=service_ip#1[,service_ip#2,...] ] \
[ APPLICATIONS=appctlr#1[,appctlr#2,...] ] \
STARTUP:
OHN ----- Online Home Node (default value)
OFAN ---- Online on First Available Node
OAAN ---- Online on All Available Nodes (concurrent)
OUDP ---- Online Using Node Distribution Policy
FALLOVER:
FNPN ---- Fallover to Next Priority Node (default value)
FUDNP --- Fallover Using Dynamic Node Priority
BO ------ Bring Offline (On Error Node Only)
FALLBACK:
NFB ----- Never Fallback
FBHPN --- Fallback to Higher Priority Node (default value)
NODE_PRIORITY_POLICY:
NOTE: this policy may only be established if the FALLOVER
policy has been set to "FUDNP".
default - next node in the NODES list
mem ----- node with most available memory
disk ---- node with least disk activity
cpu ----- node with most available CPU cycles
least --- node where the dynamic node priority script
returns the lowest value
most ---- node where the dynamic node priority script
returns the highest value
SITE_POLICY:
ignore -- Ignore
primary - Prefer Primary Site
either -- Online On Either Site
both ---- Online On Both Sites
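With FUDNP, the "least" and "most" policies compare the values returned by the dynamic node priority script on each candidate node, and the selection reduces to picking the extreme value. The sketch below illustrates that decision; the "node value" input format is an illustration, not PowerHA's internal representation.

```shell
# pick_most: read "node value" pairs on stdin and print the node whose
# priority script returned the highest value. A "least"-style selection
# would use "head -n 1" in place of "tail -n 1".
pick_most() {
    sort -k2 -n | tail -n 1 | cut -d' ' -f1
}
```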
FALLBACK TIMER:
clmgr add fallback_timer <timer> \
[ YEAR=<####> ] \
[ MONTH=<{1..12 | Jan..Dec}> ] \
[ DAY_OF_MONTH=<{1..31}> ] \
[ DAY_OF_WEEK=<{0..6 | Sun..Sat}> ] \
HOUR=<{0..23}> \
MINUTE=<{0..59}>
clmgr modify fallback_timer <timer> \
[ YEAR=<{####}> ] \
[ MONTH=<{1..12 | Jan..Dec}> ] \
PERSISTENT IP/LABEL:
clmgr add persistent_ip <persistent_IP> \
NETWORK=<network> \
[ NODE=<node> ]
clmgr modify persistent_ip <persistent_label> \
[ NEWNAME=<new_persistent_label> ] \
[ NETWORK=<new_network> ] \
[ PREFIX=<new_prefix_length> ]
clmgr query persistent_ip [ <persistent_IP>[,<pIP#2>,<pIP#n>,...] ]
clmgr delete persistent_ip {<persistent_IP>[,<pIP#2>,<pIP#n>,...] |
ALL}
clmgr move persistent_ip <persistent_IP> \
INTERFACE=<new_interface>
SERVICE IP/LABEL:
clmgr add service_ip <service_ip> \
NETWORK=<network> \
[ {NETMASK=<###.###.###.###> | PREFIX=1..128} ] \
[ HWADDR=<new_hardware_address> ] \
[ SITE=<new_site> ]
clmgr modify service_ip <service_ip> \
[ NEWNAME=<new_service_ip> ] \
[ NETWORK=<new_network> ] \
[ {NETMASK=<###.###.###.###> | PREFIX=1..128} ] \
[ HWADDR=<new_hardware_address> ] \
[ SITE=<new_site> ]
clmgr query service_ip [ <service_ip>[,<service_ip#2>,...] ]
clmgr delete service_ip {<service_ip>[,<service_ip#2>,...] | ALL}
clmgr move service_ip <service_ip> \
INTERFACE=<new_interface>
APPLICATION CONTROLLER:
clmgr add application_controller <application_controller> \
STARTSCRIPT="/path/to/start/script" \
STOPSCRIPT="/path/to/stop/script" \
[ MONITORS=<monitor>[,<monitor#2>,<monitor#n>,...] ]
APPLICATION MONITOR:
clmgr add application_monitor <monitor> \
TYPE={Process|Custom} \
APPLICATIONS=<appctlr#1>[,<appctlr#2>,<appctlr#n>,...] \
MODE={continuous|startup|both} \
[ STABILIZATION="1 .. 3600" ] \
[ RESTARTCOUNT="0 .. 100" ] \
[ FAILUREACTION={notify|fallover} ] \
Process Arguments:
PROCESSES="pmon1,dbmon,..." \
OWNER="<processes_owner_name>" \
[ INSTANCECOUNT="1 .. 1024" ] \
[ RESTARTINTERVAL="1 .. 3600" ] \
[ NOTIFYMETHOD="</script/to/notify>" ] \
[ CLEANUPMETHOD="</script/to/cleanup>" ] \
[ RESTARTMETHOD="</script/to/restart>" ]
Custom Arguments:
MONITORMETHOD="/script/to/monitor" \
[ MONITORINTERVAL="1 .. 1024" ] \
[ HUNGSIGNAL="1 .. 63" ] \
[ RESTARTINTERVAL="1 .. 3600" ] \
[ NOTIFYMETHOD="</script/to/notify>" ] \
[ CLEANUPMETHOD="</script/to/cleanup>" ] \
[ RESTARTMETHOD="</script/to/restart>" ]
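A custom monitor's MONITORMETHOD is simply an executable whose exit status tells PowerHA whether the application is healthy (0) or failed (nonzero); RESTARTMETHOD, CLEANUPMETHOD, and NOTIFYMETHOD are then driven by the configured policies. A minimal sketch, assuming the application can be identified by its process name (pgrep is used here for illustration; an AIX monitor script might inspect ps output instead):

```shell
# check_app NAME: succeed (exit 0) if a process named NAME is running.
# A MONITORMETHOD script would call this and exit with its status;
# PowerHA then restarts, notifies, or falls over per the monitor policy.
check_app() {
    pgrep -x "$1" >/dev/null 2>&1
}
```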
DEPENDENCY:
# Acquisition/Release Order
clmgr add dependency \
TYPE={ACQUIRE|RELEASE} \
{ SERIAL="{<rg1>,<rg2>,...|ALL}" |
PARALLEL="{<rg1>,<rg2>,...|ALL}" }
clmgr modify dependency \
TYPE={ACQUIRE|RELEASE} \
{ SERIAL="{<rg1>,<rg2>,...|ALL}" |
PARALLEL="{<rg1>,<rg2>,...|ALL}" }
FILE COLLECTION:
clmgr add file_collection <file_collection> \
FILES="/path/to/file1,/path/to/file2,..." \
[ SYNC_WITH_CLUSTER={no|yes} ] \
[ SYNC_WHEN_CHANGED={no|yes} ] \
[ DESCRIPTION="<file_collection_description>" ]
clmgr modify file_collection <file_collection> \
[ NEWNAME="<new_file_collection_label>" ] \
[ ADD="/path/to/file1,/path/to/file2,..." ] \
[ DELETE={"/path/to/file1,/path/to/file2,..."|ALL} ] \
[ REPLACE={"/path/to/file1,/path/to/file2,..."|""} ] \
[ SYNC_WITH_CLUSTER={no|yes} ] \
[ SYNC_WHEN_CHANGED={no|yes} ] \
[ DESCRIPTION="<file_collection_description>" ]
clmgr query file_collection [ <file_collection>[,<fc#2>,<fc#n>,...]]
clmgr delete file_collection {<file_collection>[,<fc#2>,<fc#n>,...]|
ALL}
clmgr sync file_collection <file_collection>
SNAPSHOT:
clmgr add snapshot <snapshot> \
DESCRIPTION="<snapshot_description>" \
[ METHODS="method1,method2,..." ] \
[ SAVE_LOGS={false|true} ]
clmgr modify snapshot <snapshot> \
[ NEWNAME="<new_snapshot_label>" ] \
[ DESCRIPTION="<snapshot_description>" ]
clmgr query snapshot [ <snapshot>[,<snapshot#2>,<snapshot#n>,...] ]
clmgr view snapshot <snapshot> \
[ TAIL=<number_of_trailing_lines> ] \
METHOD:
clmgr add method <method_label> \
TYPE={snapshot|verify} \
FILE=<executable_file> \
[ DESCRIPTION=<description> ]
clmgr modify method <method_label> \
TYPE={snapshot|verify} \
[ NEWNAME=<new_method_label> ] \
[ DESCRIPTION=<new_description> ] \
[ FILE=<new_executable_file> ]
LOG:
clmgr modify logs ALL DIRECTORY="<new_logs_directory>"
clmgr modify log {<log>|ALL} \
[ DIRECTORY={"<new_log_directory>"|DEFAULT} ] \
[ FORMATTING={none|standard|low|high} ] \
[ TRACE_LEVEL={low|high} ] \
[ REMOTE_FS={true|false} ]
clmgr query log [ <log>[,<log#2>,<log#n>,...] ]
clmgr view log [ {<log>|EVENTS} ] \
[ TAIL=<number_of_trailing_lines> ] \
[ HEAD=<number_of_leading_lines> ] \
[ FILTER=<pattern>[,<pattern#2>,<pattern#n>,...] ] \
[ DELIMITER=<alternate_pattern_delimiter> ] \
[ CASE={insensitive|no|off|false} ]
clmgr manage logs collect \
[ DIRECTORY="<directory_for_collection>" ] \
[ NODES=<node>[,<node#2>,<node#n>,...] ] \
[ RSCT_LOGS={yes|no} ] \
VOLUME GROUP:
clmgr query volume_group
LOGICAL VOLUME:
clmgr query logical_volume
FILE_SYSTEM:
clmgr query file_system
PHYSICAL VOLUME:
clmgr query physical_volume \
[ <disk>[,<disk#2>,<disk#n>,...] ] \
[ NODES=<node>,<node#2>[,<node#n>,...] ] \
[ ALL={no|yes} ]
REPORT:
clmgr view report [<report>] \
[ FILE=<PATH_TO_NEW_FILE> ] \
[ TYPE={text|html} ]
Usage Examples
==============
clmgr query cluster
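The -c flag shown in the synopsis produces colon-delimited output, which makes query results easy to consume from scripts. The record below is illustrative sample text, not output captured from a real cluster; real field names and ordering come from the actual "clmgr -c query cluster" output.

```shell
# Parse a colon-delimited clmgr-style record: first line is the header,
# second line the values (sample data for illustration only).
sample='CLUSTER_NAME:STATE:VERSION
mycluster:STABLE:7.1.0.0'
cluster_name=$(printf '%s\n' "$sample" | awk -F: 'NR==2 {print $1}')
cluster_state=$(printf '%s\n' "$sample" | awk -F: 'NR==2 {print $2}')
```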
Suggested Reading
=================
IBM PowerHA SystemMirror for AIX Troubleshooting Guide
IBM PowerHA SystemMirror for AIX Planning Guide
IBM PowerHA SystemMirror for AIX Installation Guide
Prerequisite Information
========================
IBM PowerHA SystemMirror for AIX Concepts and Facilities Guide
Related Information
===================
IBM PowerHA SystemMirror for AIX Administration Guide
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the
topics in this document. Note that some might be available in softcopy only.
Best Practices for DB2 on AIX 6.1 for POWER Systems, SG24-7821
DS8000 Performance Monitoring and Tuning, SG24-7146
IBM AIX Version 7.1 Differences Guide, SG24-7910
IBM System Storage DS8700 Architecture and Implementation, SG24-8786
Implementing IBM Systems Director 6.1, SG24-7694
Personal Communications Version 4.3 for Windows 95, 98 and NT, SG24-4689
PowerHA for AIX Cookbook, SG24-7739
You can search for, view, download, or order these documents and other Redbooks,
Redpapers, web docs, drafts, and additional materials at the following website:
ibm.com/redbooks
Other publications
These publications are also relevant as further information sources:
Cluster Management, SC23-6779
PowerHA SystemMirror for IBM Systems Director, SC23-6763
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Administration Guide,
SC23-6750
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Concepts and Facilities
Guide, SC23-6751
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Installation Guide,
SC23-6755
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Master Glossary, SC23-6757
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Planning Guide,
SC23-6758-01
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Programming Client
Applications, SC23-6759
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist Developer’s
Guide, SC23-6753
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist for DB2 user’s
Guide, SC23-6752
Online resources
These websites are also relevant as further information sources:
IBM PowerHA SystemMirror for AIX
http://www.ibm.com/systems/power/software/availability/aix/index.html
PowerHA hardware support matrix
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105638
IBM PowerHA High Availability wiki
http://www.ibm.com/developerworks/wikis/display/WikiPtype/High%20Availability
Implementation Services for Power Systems for PowerHA for AIX
http://www.ibm.com/services/us/index.wss/offering/its/a1000032
IBM training classes for PowerHA SystemMirror for AIX
http://www.ibm.com/training
IBM PowerHA SystemMirror 7.1 for AIX

Learn how to plan for, install, and configure PowerHA with the Cluster Aware AIX component

See how to migrate to, monitor, test, and troubleshoot PowerHA 7.1

Explore the IBM Systems Director plug-in and disaster recovery

IBM PowerHA SystemMirror 7.1 for AIX is a major product announcement
for IBM in the high availability space for IBM Power Systems Servers. This
release now has a deeper integration between the IBM high availability
solution and IBM AIX. It features integration with the IBM Systems
Director, SAP Smart Assist and cache support, the IBM System Storage
DS8000 Global Mirror support, and support for Hitachi Storage.

This IBM Redbooks publication contains information about the IBM
PowerHA SystemMirror 7.1 release for AIX. This release includes
fundamental changes, in particular departures from how the product has
been managed in the past, which has necessitated this Redbooks
publication.

This Redbooks publication highlights the latest features of PowerHA
SystemMirror 7.1 and explains how to plan for, install, and configure
PowerHA with the Cluster Aware AIX component. It also introduces you to
PowerHA SystemMirror Smart Assist for DB2. This book guides you
through migration scenarios and demonstrates how to monitor, test, and
troubleshoot PowerHA 7.1. In addition, it shows how to use IBM Systems
Director for PowerHA 7.1 and how to install the IBM Systems Director
Server and PowerHA SystemMirror plug-in. Plus, it explains how to
perform disaster recovery using IBM DS8700 Global Mirror and Hitachi
TrueCopy and Universal Replicator.

This publication targets all technical professionals (consultants, IT
architects, support staff, and IT specialists) who are responsible for
delivering and implementing high availability solutions for their enterprise.

IBM Redbooks are developed by the IBM International Technical Support
Organization. Experts from IBM, Customers and Partners from around the
world create timely technical information based on realistic scenarios.
Specific recommendations are provided to help you implement IT solutions
more effectively in your environment.