SG 247845
IBM PowerHA
SystemMirror 7.1
for AIX
Learn how to plan for, install, and configure
PowerHA with the Cluster Aware AIX component
Dino Quintero
Shawn Bodily
Brandon Boles
Bernhard Buehler
Rajesh Jeyapaul
SangHee Park
Minh Pham
Matthew Radford
Gus Schlachter
Stefan Velica
Fabiano Zimmermann
ibm.com/redbooks
International Technical Support Organization
March 2011
SG24-7845-00
Note: Before using this information and the product it supports, read the information in “Notices” on
page ix.
This edition applies to IBM PowerHA SystemMirror Version 7.1 on IBM AIX Version 6.1 TL6 and AIX Version 7.1.
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .x
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
The team who wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Contents v
9.4.1 The rootvg system event. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
9.4.2 Testing the loss of the rootvg volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
9.4.3 Loss of rootvg: What PowerHA logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
9.5 Simulation of a crash in the node with an active resource group . . . . . . . . . . . . . . . . 289
9.6 Simulations of CPU starvation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
9.7 Simulation of a Group Services failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
9.8 Testing a Start After resource group dependency . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
9.8.1 Testing the standard configuration of a Start After resource group dependency 298
9.8.2 Testing application startup with Startup Monitoring configured. . . . . . . . . . . . . . 298
9.9 Testing dynamic node priority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
Chapter 11. Installing IBM Systems Director and the PowerHA SystemMirror plug-in . . . 325
11.1 Installing IBM Systems Director Version 6.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.1.1 Hardware requirements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.1.2 Installing IBM Systems Director on AIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
11.1.3 Configuring and activating IBM Systems Director. . . . . . . . . . . . . . . . . . . . . . . 328
11.2 Installing the SystemMirror plug-in . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
11.2.1 Installing the SystemMirror server plug-in. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
11.2.2 Installing the SystemMirror agent plug-in in the cluster nodes . . . . . . . . . . . . . 330
11.3 Installing the clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
11.3.1 Installing the common agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
11.3.2 Installing the PowerHA SystemMirror agent . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
Chapter 12. Creating and managing a cluster using IBM Systems Director . . . . . . . 333
12.1 Creating a cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
12.1.1 Creating a cluster with the SystemMirror plug-in wizard . . . . . . . . . . . . . . . . . . 334
12.1.2 Creating a cluster with the SystemMirror plug-in CLI . . . . . . . . . . . . . . . . . . . . 339
12.2 Performing cluster management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
12.2.1 Performing cluster management with the SystemMirror plug-in GUI wizard. . . 341
12.2.2 Performing cluster management with the SystemMirror plug-in CLI. . . . . . . . . 347
12.3 Creating a resource group with the SystemMirror plug-in GUI wizard . . . . . . . . . . . 349
Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator . . 419
14.1 Planning for TrueCopy/HUR management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
14.1.1 Software prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
14.1.2 Minimum connectivity requirements for TrueCopy/HUR . . . . . . . . . . . . . . . . . . 420
14.1.3 Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
14.2 Overview of TrueCopy/HUR management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
14.2.1 Installing the Hitachi CCI software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
14.2.2 Overview of the CCI instance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
14.2.3 Creating and editing the horcm.conf files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
14.3 Scenario description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
14.4 Configuring the TrueCopy/HUR resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
14.4.1 Assigning LUNs to the hosts (host groups). . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
14.4.2 Creating replicated pairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
14.4.3 Configuring an AIX disk and dev_group association. . . . . . . . . . . . . . . . . . . . . 443
14.4.4 Defining TrueCopy/HUR managed replicated resource to PowerHA . . . . . . . . 451
14.5 Failover testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
14.5.1 Graceful site failover for the Austin site. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
14.5.2 Rolling site failure of the Austin site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
14.5.3 Site re-integration for the Austin site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
14.5.4 Graceful site failover for the Miami site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
14.5.5 Rolling site failure of the Miami site. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
14.5.6 Site re-integration for the Miami site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
14.6 LVM administration of TrueCopy/HUR replicated pairs. . . . . . . . . . . . . . . . . . . . . . . 463
14.6.1 Adding LUN pairs to an existing volume group . . . . . . . . . . . . . . . . . . . . . . . . . 463
14.6.2 Adding a new logical volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
14.6.3 Increasing the size of an existing file system . . . . . . . . . . . . . . . . . . . . . . . . . . 468
14.6.4 Adding a LUN pair to a new volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
AIX®, DB2®, Domino®, DS4000®, DS6000™, DS8000®, Enterprise Storage Server®, FileNet®,
FlashCopy®, HACMP™, IBM®, Lotus®, Power Systems™, POWER®, POWER5™, POWER6®, POWER7™,
POWER7 Systems™, PowerHA™, PowerVM™, pureScale™, Redbooks®, Redbooks (logo)®, Redpaper™,
solidDB®, System i®, System p®, System Storage®, Tivoli®, TotalStorage®, WebSphere®, XIV®
Snapshot, NetApp, and the NetApp logo are trademarks or registered trademarks of NetApp, Inc. in the U.S.
and other countries.
Java, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other
countries, or both.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
IBM® PowerHA™ SystemMirror 7.1 for AIX® is a major product announcement for IBM in the
high availability space for IBM Power Systems™ Servers. This release now has a deeper
integration between the IBM high availability solution and IBM AIX. It features integration with
the IBM Systems Director, SAP Smart Assist and cache support, the IBM System Storage®
DS8000® Global Mirror support, and support for Hitachi storage.
This IBM Redbooks® publication contains information about the IBM PowerHA SystemMirror
7.1 release for AIX. This release includes fundamental changes, in particular departures from
how the product was managed in the past, which prompted this Redbooks publication.
This Redbooks publication highlights the latest features of PowerHA SystemMirror 7.1 and
explains how to plan for, install, and configure PowerHA with the Cluster Aware AIX
component. It also introduces you to PowerHA SystemMirror Smart Assist for DB2®. This
book guides you through migration scenarios and demonstrates how to monitor, test, and
troubleshoot PowerHA 7.1. In addition, it shows how to use IBM Systems Director for
PowerHA 7.1 and how to install the IBM Systems Director Server and PowerHA SystemMirror
plug-in. Finally, it explains how to perform disaster recovery using DS8700 Global Mirror and
Hitachi TrueCopy and Universal Replicator.
This publication targets all technical professionals (consultants, IT architects, support staff,
and IT specialists) who are responsible for delivering and implementing high availability
solutions for their enterprise.
Dino Quintero is a Project Leader and IT generalist with the ITSO in Poughkeepsie, NY. His
areas of expertise include enterprise continuous availability planning and implementation,
enterprise systems management, virtualization, and clustering solutions. He is currently an
Open Group Master Certified IT Specialist - Server Systems. Dino holds a Master of
Computing Information Systems degree and a Bachelor of Science degree in Computer
Science from Marist College.
Rajesh Jeyapaul is the technical lead for IBM Systems Director Power Server management.
His focus is on improving PowerHA SystemMirror, DB2 pureScale™, and the AIX Runtime
Expert plug-in for Systems Director. He has worked extensively with customers and
specialized in performance analysis in the IBM System p® and AIX environment. His
areas of expertise include IBM POWER® Virtualization, high availability, and system
management. He has coauthored DS8000 Performance Monitoring and Tuning, SG24-7146,
and Best Practices for DB2 on AIX 6.1 for POWER Systems, SG24-7821. Rajesh holds a
Master in Software Systems degree from the University of BITS, India, and a Master of
Business Administration (MBA) degree from the University of MKU, India.
SangHee Park is a Certified IT Specialist in IBM Korea. He is currently working for IBM
Global Technology Services in Maintenance and Technical Support. He has 5 years of
experience in Power Systems. His areas of expertise include AIX, PowerHA SystemMirror,
and PowerVM™ Virtualization. SangHee holds a bachelor's degree in aerospace and
mechanical engineering from Korea Aerospace University.
Minh Pham is currently a Development Support Specialist for PowerHA and HACMP in
Austin, Texas. She has worked for IBM for 10 years, including 6 years in System p
microprocessor development and 4 years in AIX development support. Her areas of expertise
include core and chip logic design for System p and AIX with PowerHA. Minh holds a
Bachelor of Science degree in Electrical Engineering from the University of Texas at Austin.
Matthew Radford is a Certified AIX Support Specialist in IBM UK. He is currently working for
IBM Global Technology Services in Maintenance and Technical Support. He has worked at
IBM for 13 years and is a member of the UKI Technical Council. His areas of expertise include
AIX and PowerHA. Matthew coauthored Personal Communications Version 4.3 for Windows
95, 98 and NT, SG24-4689. Matthew holds a Bachelor of Science degree in Information
Technology from the University of Glamorgan.
Gus Schlachter is a Development Support Specialist for PowerHA in Austin, TX. He has
worked with HACMP for over 15 years in support, development, and testing. Gus formerly
worked for CLAM/Availant and is an IBM-certified Instructor for HACMP.
Stefan Velica is an IT Specialist who is currently working for IBM Global Technology
Services in Romania. He has five years of experience in Power Systems. He is a Certified
Specialist for IBM System p Administration, HACMP for AIX, High-end and Entry/Midrange
DS Series, and Storage Networking Solutions. His areas of expertise include IBM System
Storage, PowerVM, AIX, and PowerHA. Stefan holds a bachelor's degree in electronics and
telecommunications engineering from the Politechnical Institute of Bucharest.
Bob Allison
Catherine Anderson
Chuck Coleman
Bill Martin
Darin Meyer
Keith O'Toole
Ashutosh Rai
Hitachi Data Systems
David Bennin
Ella Buslovich
Richard Conway
Octavian Lascu
ITSO, Poughkeepsie Center
Patrick Buah
Michael Coffey
Mark Gurevich
Felipe Knop
Paul Moyer
Skip Russell
Stephen Tovcimak
IBM Poughkeepsie
Eric Fried
Frank Garcia
Kam Lee
Gary Lowther
Deb McLemore
Ravi A. Shankar
Preface xiii
Stephen Tee
Tom Weaver
David Zysk
IBM Austin
Nick Fernholz
Steven Finnes
Susan Jasinski
Robert G. Kovacs
William E. (Bill) Miller
Rohit Krishna Prasad
Ted Sullivan
IBM USA
Philippe Hermes
IBM France
Manohar R Bodke
Jes Kiran
Anantoju Srinivas
IBM India
Claudio Marcantoni
IBM Italy
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
redbooks@us.ibm.com
For an introduction to high availability and IBM PowerHA SystemMirror 7.1, see the “IBM
PowerHA SystemMirror for AIX” page at:
http://www.ibm.com/systems/power/software/availability/aix/index.html
This section provides an overview of RSCT, its components, and the communication paths
between these components. Several helpful IBM manuals, white papers, and Redbooks
publications are available about RSCT. This section focuses on the components that affect
PowerHA SystemMirror.
To find the most current documentation for RSCT, see the RSCT library in the IBM Cluster
Information Center at:
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.rsct.doc%2Frsctbooks.html
For a more detailed description of the RSCT components, see the IBM Reliable Scalable
Cluster Technology: Administration Guide, SA22-7889, at the following web address:
http://publibfp.boulder.ibm.com/epubs/pdf/22788919.pdf
As shown in Figure 1-1 on page 3, RSCT 3.1 can operate without CAA in “non-CAA” mode.
You use the non-CAA mode if you use one of the following products:
PowerHA versions before PowerHA 7.1
A mixed cluster with PowerHA 7.1 and older PowerHA versions
Existing RSCT Peer Domains (RPD) that were created before RSCT 3.1 was installed
A new RPD, when you specify during creation that the system must not use or create a
CAA cluster
Figure 1-1 shows both modes in which RSCT 3.1 can be used (with or without CAA). The left
part shows the non-CAA mode, which is equal to the older RSCT versions. The right part
shows the CAA-based mode. The difference between these modes is that Topology Services
has been replaced with CAA.
Figure 1-1 RSCT 3.1 operating modes on AIX: non-CAA (with Topology Services) and CAA-based
RSCT 3.1 is available for both AIX 6.1 and AIX 7.1. To use CAA, for RSCT 3.1 on AIX 6.1, you
must have TL 6 or later installed.
CAA on AIX 6.1 TL 6: The use of CAA on AIX 6.1 TL 6 is enabled only for PowerHA 7.1.
The main communication goes from PowerHA to Group Services (grpsvcs), then to Topology
Services (topsvcs), and back to PowerHA. The communication path from PowerHA to RMC is
used for PowerHA Process Application Monitors. Another case where PowerHA uses RMC is
when a resource group is configured with the Dynamic Node Priority policy.
Example 1-1 lists the cluster processes on a running PowerHA 7.1 cluster.
Group Services subsystem name: Group Services now uses the subsystem name
cthags, which replaces grpsvcs. It is started by a new control script of the same
name and therefore runs under the new subsystem name cthags.
File sets: CAA is provided by the non-PowerHA file sets bos.cluster.rte, bos.ahafs, and
bos.cluster.solid. The file sets are on the AIX installation media or in TL6 of AIX 6.1.
More information: For more information about CAA, see Cluster Management,
SC23-6779, and the IBM AIX Version 7.1 Differences Guide, SG24-7910.
CAA provides a set of tools and APIs to enable clustering on the AIX operating system. CAA
does not provide the application monitoring and resource failover capabilities that PowerHA
provides. PowerHA uses the CAA capabilities. Other applications and software programs can
use the APIs and command-line interfaces (CLIs) that CAA provides to make their
applications and services “Cluster Aware” on the AIX operating system.
Figure 1-2 on page 4 illustrates how applications can use CAA. The following products and
parties can use CAA technology:
RSCT (3.1 and later)
PowerHA (7.1 and later)
VIOS (CAA support in a future release)
Third-party ISVs, service providers, and software products
The following sections explain the concepts of the CAA central repository, RSCT changes,
and how PowerHA 7.1 uses CAA.
In a three-node cluster configuration, the third node acts as a standby for the
other two nodes. The solid subsystem (solid and solidHAC) is not running,
and the file systems (/clrepos_private1 and /clrepos_private2) are not
mounted.
If a failure occurs on the primary or secondary nodes of the cluster, the third
node activates the solid subsystem. It mounts either the primary or secondary
file system, depending on the node that has failed. See 1.2.3, “The central
repository” on page 9, for information about file systems.
clconfd The clconfd subsystem runs on each node of the cluster. The clconfd daemon
wakes up every 10 minutes to synchronize any necessary cluster changes.
For information about the RSCT changes, see 1.1.2, “Architecture changes for RSCT 3.1” on
page 3.
CAA repository disk: The CAA repository disk is reserved for use by CAA only. Do not
attempt to change any of it. The information in this chapter is provided only to help you
understand the purpose of the new disk and file system structure.
Figure 1-6 shows an overview of the CAA repository disk and its structure.
If you installed and configured PowerHA 7.1, your cluster repository disk is displayed as
varied on (active) in lspv output as shown in Figure 1-7 on page 10. In this figure, the disk
label has changed to caa_private0 to remind you that this disk is for private use by CAA only.
Figure 1-7 on page 10 also shows a volume group, called caavg_private, which must always
be varied on (active) when CAA is running. CAA is activated when PowerHA 7.1 is installed.
If you have a configured cluster and find that caavg_private is not varied on (active), your
CAA cluster has a potential problem. See Chapter 10, “Troubleshooting PowerHA 7.1” on
page 305, for guidance about recovery in this situation.
chile:/ # lspv
hdisk1 000fe4114cf8d1ce None
caa_private0 000fe40163c54011 caavg_private active
hdisk3 000fe4114cf8d2ec None
hdisk4 000fe4114cf8d3a1 diskhb
hdisk5 000fe4114cf8d441 None
hdisk6 000fe4114cf8d4d5 None
hdisk7 000fe4114cf8d579 None
hdisk8 000fe4114cf8d608 ny_datavg
hdisk0 000fe40140a5516a rootvg active
Figure 1-7 lspv command showing the caa_private repository disk
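The caavg_private health check described above can be scripted. The sketch below embeds sample lspv lines modeled on Figure 1-7 so that it runs stand-alone; on a live node you would pipe lspv output directly into the function. The function name is ours, not a PowerHA command.

```shell
# Check whether the CAA repository volume group is varied on (active).
# On a live node:  lspv | check_caavg
check_caavg() {
    awk '$3 == "caavg_private" {
             found = 1
             if ($4 == "active") print "OK"; else print "NOT ACTIVE"
         }
         END { if (!found) print "MISSING" }'
}

# Sample lines modeled on the lspv output in Figure 1-7:
cat <<'EOF' | check_caavg
hdisk1        000fe4114cf8d1ce    None
caa_private0  000fe40163c54011    caavg_private   active
hdisk0        000fe40140a5516a    rootvg          active
EOF
# -> OK
```

A "NOT ACTIVE" or "MISSING" result corresponds to the problem case discussed above, where Chapter 10 guidance applies.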
You can view the structure of caavg_private from the standpoint of the Logical Volume
Manager (LVM), as shown in Figure 1-8. The lsvg command shows the structure of the
volume group.
This file system has a special reserved structure. CAA mounts some file systems for its own
use as shown in Figure 1-9 on page 11. The fslv00 file system contains the solidDB
database mounted as /clrepos_private1 because the node is the primary node of the
cluster. If you look at the output for the second node, you might have /clrepos_private2
mounted instead of /clrepos_private1. See 1.2.1, “CAA daemons” on page 8, for an
explanation of the solid subsystem.
Important: CAA creates the solidDB file systems on default logical volume names (fslv00,
fslv01). If existing file systems outside of CAA already use default lv names, ensure that
both nodes use the same names. For example, if node A has the names fslv00, fslv01,
and fslv02, node B must have the same names. Ideally, avoid default lv names on the
cluster nodes altogether so that fslv00 and fslv01 remain free for the solidDB.
Also, /aha, a special pseudo file system, is mounted in memory and used by
AHAFS. See “Autonomic Health Advisor File System” on page 11 for more information.
For more information about CAA, see Cluster Management, SC23-6779, at the following web
address:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.clusteraware/clusteraware_pdf.pdf
chile:/ # mount
node mounted mounted over vfs date options
-------- --------------- --------------- ------ ------------ ---------------
/dev/hd4 / jfs2 Sep 30 13:37 rw,log=/dev/hd8
/dev/hd2 /usr jfs2 Sep 30 13:37 rw,log=/dev/hd8
/dev/hd9var /var jfs2 Sep 30 13:37 rw,log=/dev/hd8
/dev/hd3 /tmp jfs2 Sep 30 13:37 rw,log=/dev/hd8
/dev/hd1 /home jfs2 Sep 30 13:38 rw,log=/dev/hd8
/dev/hd11admin /admin jfs2 Sep 30 13:38 rw,log=/dev/hd8
/proc /proc procfs Sep 30 13:38 rw
/dev/hd10opt /opt jfs2 Sep 30 13:38 rw,log=/dev/hd8
/dev/livedump /var/adm/ras/livedump jfs2 Sep 30 13:38 rw,log=/dev/hd8
/aha /aha ahafs Sep 30 13:46 rw
/dev/fslv00 /clrepos_private1 jfs2 Sep 30 13:52 rw,dio,log=INLINE
Figure 1-9 AHAFS file system mounted
The event information is retrieved from CAA, and any changes are communicated by using
AHAFS events. RSCT Group Services uses the AHAFS services to obtain events on the
Gossip protocol: The gossip protocol determines the node configuration and then
transmits the gossip packets over all available networking and storage communication
interfaces. If no storage communication interfaces are configured, only the traditional
networking interfaces are used. For more information, see “Cluster Aware concepts” at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.clusteraware/claware_concepts.htm
IP network interfaces
IBM PowerHA communicates over available IP interfaces by using a multicast address.
PowerHA uses all IP interfaces that are configured with an address and are in an UP state,
as long as they are reachable across the cluster.
Cluster communication requires the use of a multicast IP address. You can specify this
address when you create the cluster, or you can have one generated automatically when you
synchronize the initial cluster configuration.
An overlap of the multicast addresses might be generated by default in the case of two
clusters with interfaces in the same virtual LAN (VLAN). This occurs when their IP
addresses are similar to the following example:
x1.y.z.t
x2.y.z.t
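The overlap can be seen by deriving the default address by hand. The sketch assumes the generation rule in which the first octet of a base IP address is replaced with 228; treat this rule as illustrative and verify it against your release before relying on it.

```shell
# Hedged sketch: derive a default cluster multicast address from a node
# base IP address (assumption: first octet replaced with 228).
derive_mcast() {
    echo "$1" | awk -F. '{ printf "228.%s.%s.%s\n", $2, $3, $4 }'
}

# Two clusters whose node addresses differ only in the first octet
# (x1.y.z.t and x2.y.z.t) derive the same multicast address:
derive_mcast 10.20.30.40    # -> 228.20.30.40
derive_mcast 192.20.30.40   # -> 228.20.30.40
```

This is exactly the collision case described above, which is why you should specify the multicast address explicitly when two clusters share a VLAN.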
The netmon.cf configuration file is not required with CAA and PowerHA 7.1.
The range 224.0.0.0–224.0.0.255 is reserved for local purposes, such as administrative and
maintenance tasks; datagrams sent to these addresses are never forwarded by multicast
routers. Similarly, the range 239.0.0.0–239.255.255.255 is reserved for administrative
scoping. These special multicast groups are regularly published in the assigned numbers RFC.1
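A candidate address can be screened against the two reserved ranges named above. This is a sketch that checks only those two ranges, not a complete IANA multicast registry check; the function name is ours.

```shell
# Classify a candidate multicast address against the reserved ranges:
# 224.0.0.0-224.0.0.255 (local use, never forwarded by multicast routers)
# 239.0.0.0-239.255.255.255 (administrative scoping)
classify_mcast() {
    echo "$1" | awk -F. '
        $1 == 224 && $2 == 0 && $3 == 0 { print "reserved-local"; next }
        $1 == 239                       { print "admin-scoped"; next }
                                        { print "usable" }'
}

classify_mcast 224.0.0.5     # -> reserved-local
classify_mcast 239.1.1.1     # -> admin-scoped
classify_mcast 228.20.30.40  # -> usable
```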
If multicast traffic is present in the adjacent network, you must ask the network administrator
for multicast IP address allocation for your cluster. Also, ensure that the multicast traffic
generated by any of the cluster nodes is properly forwarded by the network infrastructure
toward the other cluster nodes. The Internet Group Management Protocol (IGMP) must be
enabled.
Interface states
Network interfaces can have any of the following common states. You can see the interface
state in the output of the lscluster -i command, as shown in Example 1-2 on page 16.
UP The interface is up and active.
STALE The interface configuration data is stale, which happens when
communication has been lost, but was previously up at some point.
DOWN SOURCE HARDWARE RECEIVE / SOURCE HARDWARE TRANSMIT
The interface is down because of a failure to receive or transmit, which
can happen in the event of a cabling problem.
DOWN SOURCE SOFTWARE
The interface is down in AIX software only.
Enabling SAN fiber communication: To enable SAN fiber communication for cluster
communication, you must configure the Target Mode Enable attribute for FC adapters. See
Example 4-4 on page 57 for details.
The Virtual SCSI (VSCSI) SAN heartbeat depends on VIOS 2.2.0.11-FP24 SP01.
Interface state
The SAN-based communication (SFWCOM) interface has one state available, the UP state.
The UP state indicates that the SFWCOM interface is active. You can see the interface state
in the output of the lscluster -i command as shown in Example 1-2 on page 16.
When the underlying hardware infrastructure is available, you can proceed with the PowerHA
cluster topology configuration. The heartbeat starts right after the first successful “Verify and
Synchronize” operation, when the CAA cluster is created and activated by PowerHA.
Interface states
The Central cluster repository-based communication (DPCOM) interface has the following
available states. You can see the interface state in the output of the lscluster -i command,
which is shown in Example 1-2.
UP AIX_CONTROLLED
Indicates that the interface is UP, but under AIX control. The user
cannot change the status of this interface.
UP RESTRICTED AIX_CONTROLLED
Indicates that the interface is UP and under AIX system control, but is
RESTRICTED from monitoring mode.
STALE The interface configuration data is stale. This state occurs when
communication is lost, but was up previously at some point.
When the system determines that the node has lost the normal network or storage interfaces,
the system activates (unrestricts) the cluster repository disk interface (dpcom) and begins
using it for communications. At this point, the interface state changes to UP AIX_CONTROLLED
(unrestricted, but still system controlled).
Point of contact
The output of the lscluster -m command shows a reference to a point of contact as shown in
Example 1-3 on page 19. The local node is displayed as N/A, and the remote node is
displayed as en0 UP. CAA monitors the state and points of contact between the nodes for both
communication interfaces.
A point of contact indicates that a node has received a packet from the other node over the
interface. The point-of-contact status UP indicates that the packet flow is continuing. The
point-of-contact monitor tracks the number of UP points of contact for each communication
interface on the node. If this count reaches zero, the node is marked as reachable through
the cluster repository disk only.
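The point-of-contact bookkeeping can be pictured as a simple count over per-interface entries. The input lines below are shaped like lscluster -m point-of-contact entries but are illustrative, not verbatim command output.

```shell
# Count how many points of contact to a remote node are currently UP.
# A count of zero means the node is reachable only through the cluster
# repository disk. (Input lines are illustrative, not real lscluster -m output.)
count_up_contacts() {
    awk '$2 == "UP" { n++ } END { print n + 0 }'
}

printf 'en0 UP\nen1 STALE\nsfwcom UP\n' | count_up_contacts   # -> 2
printf 'en0 STALE\n' | count_up_contacts                      # -> 0
```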
Heartbeat monitoring is performed by sending and receiving gossip packets across the
network with the multicast protocol. CAA uses heartbeat monitoring to determine
communication problems that need to be reflected in the cluster information.
The round-trip time value is shown in the output of the lscluster -i and lscluster -m
commands. The mean deviation in network rtt value is the mean deviation of that round-trip
time; both values are managed automatically by CAA. Unlike previous versions of PowerHA and
HACMP, no heartbeat tuning is necessary. See Example 1-2 on page 16 and Figure 1-11 for more
information.
Statistical projections are directly employed to compute node-down events. By using normal
network dropped-packet rates and the projected round-trip times with mean deviations, the
cluster can determine when a packet was lost or never sent. Each node monitors the time when
a response is due from the other nodes in the cluster. If a node finds that a response from
another node is overdue, a node-down protocol is initiated in the cluster to determine
whether the node is down or whether network isolation has occurred.
This algorithm is self-adjusting to load and network conditions, providing a highly reliable and
scalable cluster. Expected round-trip times and variances rise quickly when load conditions
cause delays. These delays cause the system to wait longer before declaring a node down,
which provides a high probability of valid state information. (Quantitative probabilities of
errors can be computed.) Conversely, expected round-trip times and variances fall quickly
when delays return to normal.
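A minimal sketch of this self-adjusting idea, using assumed smoothing constants in the style of TCP's round-trip estimator (this is not the actual CAA algorithm): the smoothed round-trip time and its mean deviation grow quickly when sample RTTs spike, lengthening the wait before a node is declared overdue, and shrink again as delays return to normal.

```shell
# Self-adjusting overdue timeout from RTT samples (illustrative only;
# the smoothing constants 1/8 and 1/4 and the sample RTTs are assumed).
awk 'BEGIN {
    srtt = 10; dev = 2                     # initial estimates in ms
    n = split("10 11 10 60 80 70 12 11 10", rtt, " ")
    for (i = 1; i <= n; i++) {
        err = rtt[i] - srtt
        srtt += err / 8                    # smoothed round-trip time
        aerr = (err < 0) ? -err : err
        dev += (aerr - dev) / 4            # mean deviation of the rtt
        # wait srtt + 4*dev before starting the node-down protocol
        printf "%d %.1f %.1f %.1f\n", rtt[i], srtt, dev, srtt + 4 * dev
    }
}' > /tmp/rtt_demo.txt
cat /tmp/rtt_demo.txt
```

Running the sketch shows the computed timeout rising during the burst of 60-80 ms samples and falling once the samples return to around 10 ms.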
A key feature of IBM Systems Director is a consistent user interface with a focus on driving
common management tasks. IBM Systems Director provides a unified view of the total IT
environment, including servers, storage, and network. With this view, users can perform tasks
with a single tool, IBM Systems Director.
To learn more about the advantages of IBM Systems Director, see the PowerHA 7.1
presentation by Peter Schenke at:
http://www-05.ibm.com/ch/events/systems/pdf/6_PowerHA_7_1_News.pdf
The accompanying figure shows this architecture: the Director Agent, which includes the
PowerHA agent, runs on AIX and is automatically installed on AIX 7.1 and AIX V6.1 TL06. It
communicates securely with the Director Server, the central point of control, which is
supported on AIX, Linux, and Windows and provides discovery of clusters and resources, an
agent manager, a web-based user interface, and a command-line interface.
IP address takeover via IP aliasing is now the only supported IPAT option. Repository disk
heartbeat, provided by the CAA repository disk, and SAN-based heartbeat, as described in the
following section, have replaced all point-to-point (non-IP) network types.
In PowerHA SystemMirror 7.1, the SMIT panel has the following key changes:
Separation of menus by function
Addition of the Custom Cluster Configuration menu
Removal of Extended Distance menus from the base product
Removal of unsupported dialogs or menus
Changes to some terminology
New dialog for specifying repository and cluster IP address
Many changes in topology and resource menus
Figure 2-1 The screens shown after running the smitty hacmp command
In PowerHA SystemMirror 7.1, the smitty sysmirror (or smit sysmirror) command provides
a new fast path to the PowerHA start menu in SMIT. The old fast path (smitty hacmp) is still
valid.
The “Initialization and Standard Configuration” path has been split into two paths: Cluster
Nodes and Networks and Cluster Applications and Resources. For more details about these
paths, see 2.3.4, “Cluster Standard Configuration menu” on page 29. Some features for the
Extended Configuration menu have moved to the Custom Cluster Configuration menu. For
more details about custom configuration, see 2.3.5, “Custom Cluster Configuration menu” on
page 30.
smitty sysmirror
Figure 2-2 PowerHA SMIT start panel
Although the SMIT path did not change, some of the wording has changed. For example, the
word “HACMP” was replaced with “Cluster Services.” The path with the new wording is smitty
hacmp → System Management (C-SPOC) → PowerHA SystemMirror Services, and then
you select either the “Start Cluster Services” or “Stop Cluster Services” menu.
Figure 2-3 The screens that are shown when running the smitty clstart command
This version has a more logical flow. The topology configuration and management part is in
the “Cluster Nodes and Networks” menu. The resources configuration and management part
is in the “Cluster Applications and Resources” menu.
Figure 2-4 shows some tasks and where they have moved to. The dotted line shows where
Smart Assist was relocated. The Two-Node Cluster Configuration Assistant no longer exists.
These policies are insufficient for supporting some complex applications. For example, the
FileNet application server must be started only after its associated database is started. It
does not need to be stopped if the database is brought down for some time and then started.
The Start After and Stop After dependencies use source and target resource group
terminology. The source resource group depends on the target resource group as shown in
Figure 2-8.
Figure 2-8 Start After resource group dependency: the source resource group (app_rg) starts
after the target resource group (db_rg)
Similarly, for Stop After dependency, the target resource group must be offline on any node in
the cluster before a source (dependent) resource group can be brought offline on a node.
By default, resource groups are acquired in parallel, without any dependency.
A resource group can serve as both a target and a source resource group, depending on
which end of a given dependency link it is placed. You can specify three levels of
dependencies for resource groups. You cannot specify circular dependencies between
resource groups.
A Start After dependency applies only at the time of resource group acquisition. During a
resource group release, these resource groups do not have any dependencies. A Start After
source resource group cannot be acquired on a node until its target resource group is fully
functional. If the target resource group does not become fully functional, the source resource
group goes into an OFFLINE DUE TO TARGET OFFLINE state. If you notice that a resource group
is in this state, you might need to troubleshoot which resources need to be brought online
manually to resolve the resource group dependency.
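To see which resource groups are blocked this way, check the resource group state listing. The clRGinfo command reports resource group states; the listing below is a fabricated sample in its style (exact column layout varies by PowerHA level), and the awk filter is our own illustration.

```shell
# Find source resource groups stuck waiting on an offline Start After
# target. The listing is a fabricated clRGinfo-style sample; on a
# cluster node you would pipe real "clRGinfo" output instead.
cat > /tmp/rginfo.txt <<'EOF'
Group Name     State                           Node
db_rg          OFFLINE                         node1
app_rg         OFFLINE DUE TO TARGET OFFLINE   node1
EOF
awk '/OFFLINE DUE TO TARGET OFFLINE/ { print $1 }' /tmp/rginfo.txt \
    > /tmp/blocked_rgs.txt
cat /tmp/blocked_rgs.txt
```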
When a resource group in a Start After target role falls over from one node to another, the
resource groups that depend on it are unaffected.
After the Start After source resource group is online, any operation (such as bring offline or
move resource group) on the target resource group does not affect the source resource
group. A manual resource group move or bring resource group online on the source resource
group is not allowed if the target resource group is offline.
A Stop After dependency applies only at the time of a resource group release. During
resource group acquisition, these resource groups have no dependency between them. A
Stop After source resource group cannot be released on a node until its target resource group
is offline.
When a resource group in a Stop After source role falls over from one node to another, its
related target resource group is released as a first step. Then the source (dependent)
resource group is released. Next, both resource groups are acquired in parallel, assuming
that no Start After or parent-child dependency exists between these resource groups.
A manual resource group move or bring resource group offline on the Stop After source
resource group is not allowed if the target resource group is online.
Summary: Start After and Stop After dependencies between a source and a target resource
group work as follows:
Source Start After target: The source is brought online after the target resource group.
Source Stop After target: The source is brought offline after the target resource group.
Figure 2-9 Comparing Start After, Stop After, and parent-child resource group (rg) dependencies
If you configure a Start After dependency between two resource groups in your cluster, the
applications in these resource groups are started in the configured sequence. To ensure that
this process goes smoothly, configure application monitors and use a Startup Monitoring
mode for the application included in the target resource group.
For a configuration example, see 5.1.6, “Configuring Start After and Stop After resource
group dependencies” on page 96.
A user-defined resource type is one that you can define for a customized resource that you
can add to a resource group. A user-defined resource type contains several attributes that
describe the properties of the instances of the resource type.
When you create a user-defined resource type, you must choose its processing order among the
existing resource types. If you choose the FIRST value, PowerHA SystemMirror processes the
user-defined resources at the beginning of the resource acquisition order. If you choose any
other value, for example, VOLUME_GROUP, the user-defined resources are acquired after the
volume groups are varied on, and released before the volume groups are varied off. The other
values correspond to existing resource types, which you can choose from a pick list in the
SMIT menu.
The accompanying figure shows the resulting acquisition order: DISK, then the user-defined
resource (with the VOLUME_GROUP value chosen), then FILE SYSTEM, SERVICE IP, and
APPLICATION.
The cluster manager queries the Resource Monitoring and Control (RMC) subsystem every
3 minutes to obtain the current value of these attributes on each node. Then the cluster
manager distributes them cluster-wide. For an architecture overview of PowerHA and RSCT,
see 1.1.3, “PowerHA and RSCT” on page 5.
The return code of a user-defined script is used to determine the destination node.
When you select one of these criteria, you must also provide values for the DNP script path
and DNP timeout attributes for the resource group. PowerHA runs the supplied script and
collects the return codes from all nodes. If you choose the cl_highest_udscript_rc policy,
the collected values are sorted, and the node that returned the highest value is selected as
the candidate node for fallover. Similarly, if you choose the cl_lowest_nonzero_udscript_rc
policy, the node that returned the lowest nonzero positive value is selected as the candidate
takeover node. If the return value of the script is zero or the same on all nodes, the
default node priority is used. PowerHA verifies the existence of the script and its execution
permissions during verification.
Time-out value: When you select a time-out value, ensure that it is within the time period
for running and completing a script. If you do not specify a time-out value, a default value
equal to the config_too_long time is specified.
For information about configuring the dynamic node priority, see 5.1.8, “Configuring the
dynamic node priority (adaptive failover)” on page 102.
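The selection rule of the two policies can be sketched as follows. The node names and return codes are invented for the example; this only illustrates the sorting described above and is not PowerHA code.

```shell
# Simulated DNP script return codes collected from three nodes
cat > /tmp/dnp_rc.txt <<'EOF'
node1 30
node2 75
node3 0
EOF
# cl_highest_udscript_rc: the node with the highest return code wins
sort -k2,2 -rn /tmp/dnp_rc.txt | head -1 | cut -d' ' -f1 > /tmp/dnp_high.txt
# cl_lowest_nonzero_udscript_rc: lowest nonzero positive value wins
awk '$2 > 0' /tmp/dnp_rc.txt | sort -k2,2 -n | head -1 | cut -d' ' -f1 \
    > /tmp/dnp_low.txt
echo "highest: $(cat /tmp/dnp_high.txt)  lowest nonzero: $(cat /tmp/dnp_low.txt)"
```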
To prevent users from running these commands directly from the command line, change the
default value from yes to no:
1. Locate the following line in the /etc/environment file:
CLUSTER_OVERRIDE=yes
2. Change the line to the following line:
CLUSTER_OVERRIDE=no
If the CLUSTER_OVERRIDE variable has the value no, you see an error message similar to the
one shown in Example 2-1.
In this case, use the equivalent C-SPOC CLI called cli_chfs. See the C-SPOC man page for
more details.
Deleting the CLUSTER_OVERRIDE variable: You also see the message shown in
Example 2-1 if you delete the CLUSTER_OVERRIDE variable in your /etc/environment file.
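The change can be scripted. The sketch below applies the substitution to a scratch copy so that it is safe to run anywhere; on a cluster node, the file to edit is /etc/environment itself (root access required).

```shell
# Demonstrate flipping CLUSTER_OVERRIDE from yes to no on a scratch
# copy of the file; on a real node, edit /etc/environment instead.
cat > /tmp/environment.demo <<'EOF'
CLUSTER_OVERRIDE=yes
EOF
sed 's/^CLUSTER_OVERRIDE=yes$/CLUSTER_OVERRIDE=no/' /tmp/environment.demo \
    > /tmp/environment.new
cat /tmp/environment.new
```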
The passive state allows only read access to a volume group special file and the first 4 KB of
a logical volume. Write access through standard LVM is not allowed. However, low-level
commands, such as dd, can bypass LVM and write directly to the disk.
The new CAA disk fencing feature prevents writes to the disk device from any other node,
eliminating the possibility that a lower-level operation, such as dd, succeeds. However, a
system that has access to that disk might not be a member of the CAA cluster. Therefore, it
is still important to zone the storage appropriately so that only cluster nodes have the
disks configured.
The PowerHA SystemMirror 7.1 announcement letter explains this fencing feature as a
storage framework that is embedded in the operating system to aid in storage device
management. As part of the framework, fencing disks or disk groups are supported. Fencing
shuts off write access to the shared disks from any entity on the node (irrespective of the
privileges associated with the entity trying to access the disk). Fencing is exploited by
PowerHA SystemMirror to implement strict controls in regard to shared disks and their access
solely from one of the nodes that share the disk. Fencing ensures that, when the workload
moves to another node for continuing operations, access to the disks on the departing node is
turned off for write operations.
When cluster services start on a node, the clstrmgrES Event Manager runs the following
sequence:
1) rg_move_acquire
   process_resources (NONE)
   process_resources (SERVICE_LABELS)
     acquire_service_addr
     acquire_aconn_service en0 net_ether_01
   process_resources (DISKS)
   process_resources (VGS)
   process_resources (LOGREDO)
   process_resources (FILESYSTEMS)
   process_resources (SYNC_VGS)
   process_resources (TELINIT)
   process_resources (NONE)
   < Event Summary >
2) rg_move_complete
   for each RG: process_resources (APPLICATIONS)
     start_server app01
   process_resources (ONLINE)
   process_resources (NONE)
   < Event Summary >
The following section explains what happens when a subsequent node joins the cluster.
Example 2-3 Debug file showing the process of another node joining the cluster
[TE_JOIN_NODE_DEP]
[TE_RG_MOVE_ACQUIRE]
[TE_JOIN_NODE_DEP_COMPLETE]
When another node starts cluster services, both cluster managers exchange event messages and
run the following sequence:
1) rg_move_release: run on the existing node only if a resource group falls back to the
   joining, higher-priority node; if no fallback occurs, rg_move_release does nothing.
2) rg_move_acquire: the same sequence as when the first node came up.
3) rg_move_complete: for each RG: process_resources (APPLICATIONS), start_server app02,
   process_resources (ONLINE), process_resources (NONE), < Event Summary >
Figure 2-12 Another node joining the cluster
The next section explains what happens when a node leaves the cluster voluntarily.
Node failure
The situation is slightly different if the node on the right fails suddenly. Because a failed
node is not in a position to run any events, the calls to process_resources listed under the
right node are not run, as shown in Figure 2-13.
The flow in Figure 2-13 follows the same release-and-acquire pattern (the process_resources
calls under the failed node do not run):
1) rg_move_release
   for each RG:
   process_resources (RELEASE)
   process_resources (APPLICATIONS)
     stop_server app02
   process_resources (FILESYSTEMS)
   process_resources (VGS)
2) rg_move_acquire
   process_resources (SERVICE_LABELS)
Example 2-4 shows details about the process flow from the clstrmgr.debug file.
cluster.log node2
Nov 23 06:24:21 AIX: EVENT START: rg_move_release node1 1
Nov 23 06:24:21 AIX: EVENT START: rg_move node1 1 RELEASE
Nov 23 06:24:21 AIX: EVENT START: stop_server appActrl
Nov 23 06:24:21 AIX: EVENT START: stop_server appBctrl
Nov 23 06:24:22 AIX: EVENT COMPLETED: stop_server appBctrl 0
Nov 23 06:24:24 AIX: EVENT COMPLETED: stop_server appActrl 0
Nov 23 06:24:27 AIX: EVENT START: release_service_addr
Nov 23 06:24:28 AIX: EVENT COMPLETED: release_service_addr 0
Nov 23 06:24:29 AIX: EVENT START: release_takeover_addr
Nov 23 06:24:30 AIX: EVENT COMPLETED: release_takeover_addr 0
Nov 23 06:24:30 AIX: EVENT COMPLETED: rg_move node1 1 RELEASE 0
Nov 23 06:24:30 AIX: EVENT COMPLETED: rg_move_release node1 1 0
Nov 23 06:24:32 AIX: EVENT START: rg_move_fence node1 1
Nov 23 06:24:32 AIX: EVENT COMPLETED: rg_move_fence node1 1 0
Nov 23 06:24:34 AIX: EVENT START: rg_move_fence node1 2
Nov 23 06:24:35 AIX: EVENT COMPLETED: rg_move_fence node1 2 0
Nov 23 06:24:35 AIX: EVENT START: rg_move_acquire node1 2
Nov 23 06:24:35 AIX: EVENT START: rg_move node1 2 ACQUIRE
Nov 23 06:24:35 AIX: EVENT COMPLETED: rg_move node1 2 ACQUIRE 0
Nov 23 06:24:35 AIX: EVENT COMPLETED: rg_move_acquire node1 2 0
Nov 23 06:24:41 AIX: EVENT START: rg_move_complete node1 2
Nov 23 06:24:41 AIX: EVENT COMPLETED: rg_move_complete node1 2 0
Nov 23 06:24:51 AIX: EVENT START: node_down_complete node2
Nov 23 06:24:52 AIX: EVENT COMPLETED: node_down_complete node2 0
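A quick way to scan such a log for events that started but never completed is to pair the EVENT START and EVENT COMPLETED lines. The awk pairing below is our own illustration (not a PowerHA tool); the sample follows the cluster.log format shown above, with one unfinished event added deliberately.

```shell
# Pair EVENT START/COMPLETED lines and report unfinished events.
# Sample excerpt in cluster.log format; rg_move_fence is deliberately
# left without a COMPLETED line for the demonstration.
cat > /tmp/cluster.log.sample <<'EOF'
Nov 23 06:24:21 AIX: EVENT START: rg_move_release node1 1
Nov 23 06:24:21 AIX: EVENT START: stop_server appActrl
Nov 23 06:24:24 AIX: EVENT COMPLETED: stop_server appActrl 0
Nov 23 06:24:30 AIX: EVENT COMPLETED: rg_move_release node1 1 0
Nov 23 06:24:32 AIX: EVENT START: rg_move_fence node1 1
EOF
awk '/EVENT START:/     { sub(/.*EVENT START: /, "");     open[$1]++ }
     /EVENT COMPLETED:/ { sub(/.*EVENT COMPLETED: /, ""); open[$1]-- }
     END { for (e in open) if (open[e] > 0) print e }' \
    /tmp/cluster.log.sample > /tmp/unfinished.txt
cat /tmp/unfinished.txt
```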
CAA cluster: PowerHA SystemMirror creates the CAA cluster automatically. You do not
manage the CAA configuration or state directly, but you can use the cluster commands to
view the CAA status.
Download and install the latest service packs for AIX and PowerHA from IBM Fix Central at:
http://www.ibm.com/support/fixcentral
The following file sets on the AIX base media are required:
rsct.basic.rte
rsct.compat.basic.hacmp
rsct.compat.clients.hacmp
The appropriate versions of RSCT for the supported AIX releases are also supplied with the
PowerHA installation media.
More information: For a list of the supported FC adapters, see “Setting up cluster storage
communication” in the AIX 7.1 Information Center at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.
clusteraware/claware_comm_setup.htm
See the readme files that are provided with the base PowerHA file sets and the latest service
pack. See also the PowerHA SystemMirror 7.1 for AIX Standard Edition Information Center
at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.doc/doc/base/
powerha.htm
The nodes of your cluster can be any system on which the installation of AIX 6.1 TL6 or
AIX 7.1 is supported, either as a full system partition or as a logical partition (LPAR).
Design methodologies can help eliminate network and disk single points of failure (SPOF) by
using redundant configurations. Use at least two network adapters connected to different
Ethernet switches in the same virtual LAN (VLAN). (PowerHA also supports the use of
EtherChannel.) Similarly, use dual-fabric storage area network (SAN) connections to the
storage subsystems with at least two Fibre Channel (FC) adapters and appropriate multipath
drivers. Also use Redundant Array of Independent Disks (RAID) technology to protect data
from any disk failure.
3.2.2 Requirements for the multicast IP address, SAN, and repository disk
Cluster communication requires the use of a multicast IP address. You can specify this
address when you create the cluster, or you can have one generated automatically. The
ranges 224.0.0.0–224.0.0.255 and 239.0.0.0–239.255.255.255 are reserved for
administrative and maintenance purposes. If multicast traffic is present in the adjacent
network, ask the network administrator to allocate a multicast IP address. Also,
ensure that the multicast traffic that is generated by each of the cluster nodes is properly
forwarded by the network infrastructure to every other cluster node.
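A simple sanity check against the reserved ranges can be scripted. The example addresses below are arbitrary; the function only classifies the address patterns described above.

```shell
# Classify a candidate multicast address against the reserved ranges
# 224.0.0.0-224.0.0.255 and 239.0.0.0-239.255.255.255 (sketch only;
# the example addresses are arbitrary).
check_mcast() {
    case "$1" in
        224.0.0.*|239.*)     echo "$1 reserved: request another allocation" ;;
        22[4-9].*|23[0-9].*) echo "$1 usable multicast address" ;;
        *)                   echo "$1 not a multicast address" ;;
    esac
}
{
    check_mcast 228.10.20.30
    check_mcast 239.1.1.1
    check_mcast 192.168.1.1
} > /tmp/mcast_check.txt
cat /tmp/mcast_check.txt
```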
If you use SAN-based heartbeat, you must have zoning set up to ensure connectivity between
the host FC adapters. You must also activate the target mode enabled (tme) parameter on the
involved FC adapters.
Hardware redundancy at the storage subsystem level is mandatory for the cluster repository
disk. Logical Volume Manager (LVM) mirroring of the repository disk is not supported.
CAA support: Currently, CAA supports only Fibre Channel and SAS disks for the repository
disk, as described in the “Cluster communication” topic in the AIX 7.1 Information Center
at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.
clusteraware/claware_comm_benifits.htm
TL6: AIX must be at a minimum version of AIX 6.1 TL6 (6.1.6.0) on all nodes before
migration. Use of AIX 6.1 TL6 SP2 or later is preferred.
Most migration scenarios require a two-part upgrade. First, you migrate AIX to the minimum
version of AIX 6.1 TL6 on all nodes. You must reboot each node after upgrading AIX. Second,
you migrate to PowerHA 7.1 by using the offline, rolling, or snapshot scenario as explained in
Chapter 7, “Migrating to PowerHA 7.1” on page 151.
Support for vSCSI: CAA repository disk support for virtual SCSI (vSCSI) is officially
introduced in AIX 6.1 TL6 SP2 and AIX 7.1 SP2. You can create a vSCSI disk
repository at AIX 6.1 TL6 base levels, but not at SP1. Alternatively, direct SAN
connection logical unit numbers (LUNs) or N_Port ID Virtualization (NPIV) LUNs are
supported with all versions.
3.5 Storage
This section provides details about storage planning considerations for high availability of
your cluster implementation.
For additional information about the shared disk, see the PowerHA SystemMirror Version 7.1
for AIX Standard Edition Concepts and Facilities Guide, SC23-6751. See also the PowerHA
SystemMirror Version 7.1 announcement information or the PowerHA SystemMirror Version
7.1 for AIX Standard Edition Planning Guide, SC23-6758-01, for a complete list of supported
devices.
The following disks are supported (through Multiple Path I/O (MPIO)) for the repository disk:
All FC disks that configure as MPIO
IBM DS8000, DS3000, DS4000®, DS5000, XIV®, ESS800, SAN Volume Controller (SVC)
EMC: Symmetrix, DMX, CLARiiON
HDS: 99XX, 96XX, OPEN series
IBM System Storage N series/NetApp®: All models of N series and all NetApp models
common to N series
VIOS vSCSI
All IBM serial-attached SCSI (SAS) disks that configure as MPIO
SAS storage
Support for third-party multipathing software: At the time of writing, some third-party
multipathing software was not supported.
The following FC and SAS adapters are supported for connection to the repository disk:
4 GB Single-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 1905; CCIN 1910)
4 GB Single-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 5758; CCIN 280D)
4 GB Single-Port Fibre Channel PCI-X Adapter (FC 5773; CCIN 5773)
4 GB Dual-Port Fibre Channel PCI-X Adapter (FC 5774; CCIN 5774)
4 Gb Dual-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 1910; CCIN 1910)
4 Gb Dual-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 5759; CCIN 5759)
8 Gb PCI Express Dual Port Fibre Channel Adapter (FC 5735; CCIN 577D)
8 Gb PCI Express Dual Port Fibre Channel Adapter 1Xe Blade (FC 2B3A; CCIN 2607)
3 Gb Dual-Port SAS Adapter PCI-X DDR External (FC 5900 and 5912; CCIN 572A)
More information: For the most current list of supported storage adapters for shared disks
other than the repository disk, contact your IBM representative. Also see the “IBM
PowerHA SystemMirror for AIX” web page at:
http://www.ibm.com/systems/power/software/availability/aix/index.html
The PowerHA software supports the following disk technologies as shared external disks in a
highly available cluster:
SCSI drives, including RAID subsystems
FC adapters and disk subsystems
Data path devices (VPATH): SDD 1.6.2.0, or later
Virtual SCSI (vSCSI) disks
Support for vSCSI: CAA repository disk support for vSCSI is officially introduced in
AIX 6.1 TL6 SP2 and AIX 7.1 SP2. You can create a vSCSI disk repository at AIX 6.1
TL6 base levels, but not at SP1. Alternatively, direct SAN connection LUNs or NPIV
LUNs are supported with all versions.
You can combine these technologies within a cluster. Before choosing a disk technology,
review the considerations for configuring each technology as described in the following
section.
AIX MPIO is an architecture that uses PCMs. The following PCMs are all supported:
SDDPCM
HDLM PCM
AIXPCM
SDDPCM only supports DS6000™, DS8000, SVC, and some models of DS4000. HDLM PCM
only supports Hitachi storage devices. AIXPCM supports all storage devices that System p
servers and VIOS support. AIXPCM supports storage devices from over 25 storage vendors.
Support for third-party multipath drivers: At the time of writing, other third-party
multipath drivers (such as EMC PowerPath, and Veritas) are not supported. This limitation
is planned to be resolved in a future release.
See the “Support Matrix for Subsystem Device Driver, Subsystem Device Driver Path Control
Module, and Subsystem Device Driver Device Specific Module” at:
http://www.ibm.com/support/docview.wss?rs=540&uid=ssg1S7001350
Also check whether the coexistence of different multipath drivers using different FC ports on
the same system is supported for mixed cases. For example, the cluster repository disk might
be on a storage subsystem or FC adapter other than the one used for the shared data disks.
3.6 Network
The networking requirements for PowerHA SystemMirror 7.1 differ from all previous versions.
This section focuses specifically on the differences of the following requirements:
Multicast address
Network interfaces
Subnetting requirements for IPAT via aliasing
Host name and node name
Other network considerations
– Single adapter networks
– Virtual Ethernet (VIOS)
For additional information, and details about common features between versions, see the
PowerHA for AIX Cookbook, SG24-7739.
In previous versions, the network Failure Detection Rate (FDR) policy was tunable, which is
no longer true in PowerHA SystemMirror 7.1.
If the networks are a single adapter configuration, both the base and service IP addresses
are allowed on the same subnet.
Virtual Ethernet
In previous versions, when using virtual Ethernet, users configured a special formatted
netmon.cf file to ping additional external interfaces or addresses by using specific outbound
interfaces. The netmon.cf configuration file no longer applies.
For the cluster SAN-based communication channel, two extra zones are created as shown in
Example 4-1. One zone includes the fcs0 ports of each server, and the other zone includes
the fcs1 ports of each server.
Fabric2:
zone: Syndey_fcs1__Perth_fcs1
10:00:00:00:c9:74:c1:6f
10:00:00:00:c9:77:20:d9
This dual zone setup provides redundancy for the SAN communication channel at the Cluster
Aware AIX (CAA) storage framework level. The dotted lines in Figure 4-2 represent the
initiator-to-initiator zones added on top of the conventional ones, connecting host ports to
storage ports.
The multipath driver being used is the AIX native MPIO. In Example 4-3, the mpio_get_config
command shows identical LUNs on both nodes, as expected.
X in fcsX: In the following steps, the X in fcsX represents the number of the FC adapters.
You must complete this procedure for each FC adapter that is involved in cluster
SAN-based communication.
1. Unconfigure fcsX:
rmdev -Rl fcsX
fcsX device busy: If the fcsX device is busy when you use the rmdev command, enter
the following commands:
chdev -P -l fcsX -a tme=yes
chdev -P -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail
Example 4-4 illustrates the procedure for port fcs0 on node sydney.
Depending on the functionality required for your environment, additional file sets might be
selected for installation.
PowerHA SystemMirror 7.1 for AIX Standard Edition includes the Smart Assists images. For
more details about the Smart Assists functionality and new features, see 2.2, “New features”
on page 24.
The PowerHA for IBM Systems Director agent file set comes with the base installation media.
To learn more about PowerHA SystemMirror for IBM Systems Director, see 5.3, “PowerHA
SystemMirror for IBM Systems Director” on page 133.
Installation from a CD is more appropriate for small environments. For remote nodes, use NFS
to export and mount the installation images to avoid repeated CD handling or image copy
operations.
The following section provides an example of how to use a NIM server to install the PowerHA
software.
In Example 4-6, the lslpp command lists the prerequisites that are already installed and the
ones that are missing in a single output.
Figure 4-3 shows selection of the appropriate lpp_source on the NIM server, aix6161, by
following the path smitty nim → Install and Update Software → Install Software. You
select all of the required file sets on the next panel.
Install Software
Update Installed Software to Latest Level (Update All)
Install Software Bundle
Update Software by Fix (APAR)
••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
• Select the LPP_SOURCE containing the install images •
• •
• Move cursor to desired item and press Enter. •
• •
• aix7100g resources lpp_source •
• aix7101 resources lpp_source •
• aix6161 resources lpp_source •
• ha71sp1 resources lpp_source •
• aix6060 resources lpp_source •
• aix6160-SP1-only resources lpp_source •
• •
• F1=Help F2=Refresh F3=Cancel •
• Esc+8=Image Esc+0=Exit Enter=Do •
F1• /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 4-3 Installing the prerequisites: Selecting lpp_source
Ty••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Pr• Software to Install •
• •
[T• Move cursor to desired item and press Esc+7. Use arrow keys to scroll. •
* • ONE OR MORE items can be selected. •
* • Press Enter AFTER making all selections. •
• •
• [MORE...2286] •
• + 6.1.6.1 POWER HA Business Resiliency solidDB •
• + 6.1.6.0 POWER HA Business Resiliency solidDB •
• •
• > bos.clvm ALL •
• + 6.1.6.0 Enhanced Concurrent Logical Volume Manager •
• •
• bos.compat ALL •
• + 6.1.6.0 AIX 3.2 Compatibility Commands •
• [MORE...4498] •
[M• •
• F1=Help F2=Refresh F3=Cancel •
F1• Esc+7=Select Esc+8=Image Esc+0=Exit •
Es• Enter=Do /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 4-4 Installing the prerequisites: Selecting the file sets
After installing from the NIM server, ensure that each node remains at the initial version of AIX
and RSCT, and check the software consistency, as shown in Example 4-7.
sydney:/ # lppchk -v
Example 4-8 lists the contents of the lpp_source. As mentioned previously, both the Smart
Assist file sets and PowerHA for IBM Systems Director agent file set come with the base
media.
nimres1:/ # ls /nimrepo/lpp_source/HA71
.toc
cluster.adt.es
cluster.doc.en_US.assist
cluster.doc.en_US.assist.db2.html.7.1.0.1.bff
cluster.doc.en_US.assist.oracle.html.7.1.0.1.bff
cluster.doc.en_US.assist.websphere.html.7.1.0.1.bff
cluster.doc.en_US.es
cluster.doc.en_US.es.html.7.1.0.1.bff
cluster.doc.en_US.glvm.html.7.1.0.1.bff
cluster.es.assist
cluster.es.assist.common.7.1.0.1.bff
cluster.es.assist.db2.7.1.0.1.bff
cluster.es.assist.domino.7.1.0.1.bff
cluster.es.assist.ihs.7.1.0.1.bff
cluster.es.assist.sap.7.1.0.1.bff
cluster.es.cfs
cluster.es.cfs.rte.7.1.0.1.bff
cluster.es.client
cluster.es.client.clcomd.7.1.0.1.bff
cluster.es.client.lib.7.1.0.1.bff
cluster.es.client.rte.7.1.0.1.bff
cluster.es.cspoc
cluster.es.director.agent
cluster.es.migcheck
cluster.es.nfs
cluster.es.server
cluster.es.server.diag.7.1.0.1.bff
cluster.es.server.events.7.1.0.1.bff
Example 4-9 shows the file sets that were selected for the test environment and installed from
the lpp_source that was prepared previously. Each node requires a PowerHA license.
Therefore, you must install the license file set.
Then verify the installed software as shown in Example 4-10. The lppchk command returning to
the prompt without messages confirms the consistency of the installed file sets.
To work around the problem shown in Figure 4-5, manually import the volume group on the
other node by using the following command:
importvg -L test_vg hdiskx
After the volume group is added to the other node, the synchronization and verification are
then completed.
You can perform most administration tasks with any of these options. The option that you
choose depends on which one you prefer and which one meets the requirements of your
environment.
Locating available options: If you are familiar with the SMIT paths from an earlier
version and need to locate a specific feature, use the “Can’t find what you are looking
for?” feature from the main SMIT menu to list and search the available options.
To enter the top-level menu, use the new fast path, smitty sysmirror. The fast path on earlier
versions, smitty hacmp, still works. From the main menu, the highlighted options shown in
Figure 5-1 are available to help with topology and resources configuration. Most of the tools
necessary to configure cluster components are under “Cluster Nodes and Networks” and
“Cluster Applications and Resources.” Some terminology has changed, and the interface
looks more simplified for easier navigation and management.
PowerHA SystemMirror
Because topology monitoring has been transferred to CAA, its management has been
simplified. Support for non-TCP/IP heartbeat has been transferred to CAA and is no longer a
separate configurable option. Instead of multiple menu options and dialogs for configuring
non-TCP/IP heartbeating devices, a single option is available plus a window (Figure 5-2) to
specify the CAA cluster repository disk and the multicast IP address.
The top resource menus keep only the commonly used options, and the less frequently used
menus are deeper in the hierarchy, under a new Custom Cluster Configuration menu. This
menu includes various customizable and advanced options, similar to the “Extended
Configuration” menu in earlier versions. See 2.3, “Changes to the SMIT panel” on page 25,
for a layout that compares equivalent menu screens in earlier versions with the new screens.
The Verify and Synchronize functions now have a simplified form in most of the typical menus,
while the earlier customizable version is available in more advanced contexts.
Application server versus application controller: Earlier versions used the term
application server to refer to the scripts that are used to start and stop applications under
SystemMirror control. In version 7.1, these scripts are referred to as application
controllers.
A System Events dialog is now available in addition to the user-defined events and pre- and
post-event commands for predefined events from earlier versions. For more information about
this dialog, see 9.4, “Testing the rootvg system event” on page 286.
SSA disks are no longer supported in AIX 6.1, and the RSCT role has been diminished.
Therefore, some related menu options have been removed. See Chapter 2, “Features of
PowerHA SystemMirror 7.1” on page 23, for more details about the new and obsolete
features.
For a topology configuration, SMIT provides two possible approaches that resemble the
previous Standard and Extended configuration paths: typical configuration and custom
configuration.
Typical configuration
The smitty sysmirror → Cluster Nodes and Networks → Initial Cluster Setup (Typical)
configuration path provides the means to configure the basic components of a cluster in a few
steps. Discovery and selection of configuration information is automated, and default values
are provided whenever possible. If you need to use specific values instead of the defaults,
use the custom configuration path instead.
Custom configuration
Custom cluster configuration options are not typically required or used by most customers.
However, they provide extended flexibility in configuration and management options. These
options are under the Custom Cluster Configuration option in the top-level panel. If you want
complete control over which components are added to the cluster, and create them piece by
piece, you can configure the cluster topology with the SMIT menus. Follow the path Custom
Cluster Configuration → Initial Cluster Setup (Custom). With this path, you can also set
your own node and network names, other than the default ones. Alternatively, you can choose
only specific network interfaces to support the clustered applications. (By default, all IP
configured interfaces are used.)
Resources configuration
The Cluster Applications and Resources menu in the top-level panel groups the commonly
used options for configuring resources, resource groups, and application controllers.
Other resource options that are not required in most typical configurations are under the
Custom Cluster Configuration menu. They provide dialogs and options to perform the
following tasks:
Configure a custom disk, volume group, and file system methods for cluster resources
Customize resource recovery and service IP label distribution policy
Customize an event
Most of the resources menus and dialogs are similar to their counterparts in earlier versions.
For more information, see the existing documentation about the previous releases listed in
“Related publications” on page 519.
By using this setup, we can present various aspects of a typical production implementation,
such as topology redundancy or more complex resource configuration. As an example, we
configure SAN-based heartbeating and introduce the new Start After and Stop After resource
group dependencies.
Prerequisite: Before reading this section, you must have configured all your networks and
storage devices as explained in 3.2, “Hardware requirements” on page 44.
The /etc/cluster/rhosts file must be populated with all cluster IP addresses before
using PowerHA SystemMirror. This process was done automatically in earlier versions, but is
now a required, manual process. The addresses that you enter in this file must include the
addresses that resolve to the host names of the cluster nodes. If you update this file, you must
refresh the clcomd subsystem with the refresh -s clcomd command.
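A sketch of populating the file follows. The two addresses are the base addresses used later in this chapter's test environment, and rhosts.example is a hypothetical staging file; copy it into place as root on a real node:

```shell
# Hypothetical staging file; the addresses below are examples from this chapter's
# test environment -- replace them with the addresses of your own cluster nodes.
cat > ./rhosts.example <<'EOF'
192.168.101.135
192.168.101.136
EOF

# On a real node, as root:
#   cp ./rhosts.example /etc/cluster/rhosts
#   refresh -s clcomd
wc -l < ./rhosts.example   # one entry per node
```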
Important: Previous releases used the clcomdES subsystem, which read information from
the /usr/es/sbin/cluster/etc/rhosts file. The clcomdES subsystem is no longer
used. Therefore, you must configure the clcomd subsystem as explained in this section.
Also, ensure that you have one unused shared disk available for the cluster repository.
Example 5-1 shows the lspv command output on the systems sydney and perth. The first
part shows the output from the node sydney, and the second part shows the output from
perth.
---------------------------------------------------------------------------
perth:/ # lspv
hdisk0 00c1f1707c6092fe rootvg active
hdisk1 00c1f170fd6b4d9d dbvg
hdisk2 00c1f170fd6b50a5 appvg
hdisk3 00c1f170fd6b5126 None
Node names: The sydney and perth node names do not imply any extended-distance
capability. They are used only as node names.
Defining a cluster
To define a cluster, follow these steps:
1. Use the smitty sysmirror or smitty hacmp fast path.
2. In the PowerHA SystemMirror menu (Figure 5-4), select the Cluster Nodes and
Networks option.
3. In the Cluster Nodes and Networks menu, select the Initial Cluster Setup (Typical)
option.
4. In the Initial Cluster Setup (Typical) menu (Figure 5-6), select the Setup a Cluster, Nodes
and Networks option.
5. From the Setup a Cluster, Nodes, and Networks panel (Figure 5-7 on page 72), complete
the following steps:
a. Specify the repository disk and the multicast IP address.
The cluster name is based on the host name of the system. You can use this default or
replace it with a name you want to use. In the test environment, the cluster is named
australia.
b. In the New Nodes field, define the IP label that you want to use to communicate to the
other systems. In this example, we plan to build a two-node cluster where the two
systems are named sydney and perth. If you want to create a cluster with more than
two nodes, you can specify more than one system by using the F4 key. The advantage
is that you do not get typographical errors, and you can verify that the /etc/hosts file
contains your network addresses.
The Currently Configured Node(s) field lists all the configured nodes or lists the host
name of the system you are working on if nothing is configured so far.
c. Press Enter.
[Entry Fields]
* Cluster Name [australia]
New Nodes (via selected communication paths) [perth] +
Currently Configured Node(s) sydney
Figure 5-7 Setup a Cluster, Nodes and Networks panel
The COMMAND STATUS panel (Figure 5-8) indicates that the cluster creation completed
successfully.
COMMAND STATUS
[TOP]
Cluster Name: australia_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: None
Cluster IP Address:
There are 2 node(s) and 1 network(s) defined
NODE perth:
Network net_ether_01
perth 192.168.101.136
NODE sydney:
Network net_ether_01
sydney 192.168.101.135
perth:
[MORE...93]
Figure 5-8 Cluster creation completed successfully
Reminder: After you change the /etc/cluster/rhosts file, enter the refresh -s
clcomd command.
When you look at the output in more detail, you might notice that the system adds your entries
to the cluster configuration and runs a discovery on the systems. The discovered shared
disks are also listed in the output.
Multicast address not specified: If you did not specify a multicast address, you
can see the one that AIX chose for you in the output of the cltopinfo command.
c. Press Enter.
[Entry Fields]
* Cluster Name australia
* Repository Disk [None] +
Cluster IP Address []
+--------------------------------------------------------------------------+
| Repository Disk |
| |
| Move cursor to desired item and press Enter. |
| |
| hdisk3 |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 5-10 Define Repository and Cluster IP Address panel
COMMAND STATUS
[TOP]
Cluster Name: australia
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk3
Cluster IP Address:
There are 2 node(s) and 1 network(s) defined
NODE perth:
Network net_ether_01
perth 192.168.101.136
NODE sydney:
Network net_ether_01
sydney 192.168.101.135
[BOTTOM]
Figure 5-11 COMMAND STATUS showing OK for adding a repository disk
This process only updates the information in the cluster configuration. If you use the lspv
command on any node in the cluster, each node still shows the same output as listed in
Example 5-1 on page 70. When the cluster is synchronized the first time, both the CAA
cluster and the repository disk are created.
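After that first synchronization, the repository disk is placed into the private caavg_private volume group. A quick check could grep for it in the lspv output; the sample line below is assumed output for illustration only:

```shell
# Sample post-synchronization lspv line (assumed values, for illustration);
# on a real node, run:  lspv | grep caavg_private
lspv_out='hdisk3          00c1f170fd6b5126        caavg_private   active'
n=$(printf '%s\n' "$lspv_out" | grep -c caavg_private)
echo "$n"   # a count of 1 confirms that the repository volume group exists
```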
Example 5-2 shows a configuration that uses host names in the FQDN format.
-------------------------------
NODE busan.itso.ibm.com
-------------------------------
127.0.0.1 loopback localhost # loopback (lo0) name/address
::1 loopback localhost # IPv6 loopback (lo0) name/address
192.168.101.143 seoul-b1.itso.ibm.com seoul-b1 # Base IP label 1
192.168.101.144 busan-b1.itso.ibm.com busan-b1 # Base IP label 1
192.168.201.143 seoul-b2.itso.ibm.com seoul-b2 # Base IP label 2
192.168.201.144 busan-b2.itso.ibm.com busan-b2 # Base IP label 2
10.168.101.43 seoul.itso.ibm.com seoul # Persistent IP
10.168.101.44 busan.itso.ibm.com busan # Persistent IP
10.168.101.143 poksap-db.itso.ibm.com poksap-db # Service IP label
10.168.101.144 poksap-en.itso.ibm.com poksap-en # Service IP label
10.168.101.145 poksap-er.itso.ibm.com poksap-er # Service IP label
seoul.itso.ibm.com:/ # cllsif
Adapter Type Network Net Type Attribute Node IP Address Hardware Address Interface
Name Global Name Netmask Alias for HB Prefix Length
busan-b1 boot net_ether_01 ether public busan 192.168.101.144 en0 255.255.255.0
24
busan-b2 boot net_ether_01 ether public busan 192.168.201.144 en2 255.255.255.0
24
poksap-er service net_ether_01 ether public busan 10.168.101.145
255.255.255.0 24
poksap-en service net_ether_01 ether public busan 10.168.101.144
255.255.255.0 24
poksap-db service net_ether_01 ether public busan 10.168.101.143
255.255.255.0 24
seoul-b1 boot net_ether_01 ether public seoul 192.168.101.143 en0 255.255.255.0
24
seoul-b2 boot net_ether_01 ether public seoul 192.168.201.143 en2 255.255.255.0
24
poksap-er service net_ether_01 ether public seoul 10.168.101.145
255.255.255.0 24
poksap-en service net_ether_01 ether public seoul 10.168.101.144
255.255.255.0 24
poksap-db service net_ether_01 ether public seoul 10.168.101.143
255.255.255.0 24
seoul.itso.ibm.com:/ # cllsnode
Node busan
Interfaces to network net_ether_01
Communication Interface: Name busan-b1, Attribute public, IP address 192.168.101.144
Communication Interface: Name busan-b2, Attribute public, IP address 192.168.201.144
Communication Interface: Name poksap-er, Attribute public, IP address 10.168.101.145
Communication Interface: Name poksap-en, Attribute public, IP address 10.168.101.144
Communication Interface: Name poksap-db, Attribute public, IP address 10.168.101.143
Node seoul
Interfaces to network net_ether_01
Communication Interface: Name seoul-b1, Attribute public, IP address 192.168.101.143
Communication Interface: Name seoul-b2, Attribute public, IP address 192.168.201.143
Communication Interface: Name poksap-er, Attribute public, IP address 10.168.101.145
Communication Interface: Name poksap-en, Attribute public, IP address 10.168.101.144
Communication Interface: Name poksap-db, Attribute public, IP address 10.168.101.143
# LPAR names
seoul.itso.ibm.com:/ # clcmd uname -n
-------------------------------
NODE seoul.itso.ibm.com
seoul.itso.ibm.com:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
sapdb ONLINE seoul
OFFLINE busan
# The output below shows that CAA always uses the host name for its node names
# The PowerHA node names are: seoul, busan
seoul.itso.ibm.com:/ # lscluster -c
Cluster query for cluster korea returns:
Cluster uuid: 02d20290-d578-11df-871d-a24e50543103
Number of nodes in cluster = 2
Cluster id for node busan.itso.ibm.com is 1
Primary IP address for node busan.itso.ibm.com is 10.168.101.44
Cluster id for node seoul.itso.ibm.com is 2
Primary IP address for node seoul.itso.ibm.com is 10.168.101.43
Number of disks in cluster = 2
for disk cldisk2 UUID = 428e30e8-657d-8053-d70e-c2f4b75999e2 cluster_major = 0 cluster_minor = 2
for disk cldisk1 UUID = fe1e9f03-005b-3191-a3ee-4834944fcdeb cluster_major = 0 cluster_minor = 1
Multicast address for cluster is 228.168.101.43
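From lscluster -c output like the above, the multicast address can be pulled out with a short pipeline. The sample line below is copied from the output format shown; on a live node, pipe lscluster -c itself instead:

```shell
# Sample line matching the lscluster -c output format shown above.
line='Multicast address for cluster is 228.168.101.43'
mcast=$(printf '%s\n' "$line" | awk '{print $NF}')   # last field is the address
echo "$mcast"
```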
As a preliminary step, add the base IP aliases to the /etc/cluster/rhosts file on each node and
refresh the CAA clcomd daemon. Example 5-3 illustrates this step on the node sydney.
Cluster
Nodes
Networks
Network Interfaces
Define Repository Disk and Cluster IP Address
Figure 5-12 Initial Cluster Setup (Custom) panel for a custom configuration
Add/Change/Show a Cluster
[Entry Fields]
* Cluster Name [australia]
Figure 5-13 Adding a cluster
Add a Node
[Entry Fields]
* Node Name [sydney]
Communication Path to Node [sydney] +
Figure 5-14 Add a Node panel
Add a Network
[Entry Fields]
* Network Name [ether01]
* Network Type ether
* Netmask(IPv4)/Prefix Length(IPv6) [255.255.252.0]
Figure 5-15 Add a Network panel
[Entry Fields]
* IP Label/Address [sydneyb2] +
* Network Type ether
* Network Name ether01
* Node Name [sydney] +
Network Interface []
Figure 5-16 Add a Network Interface panel
[Entry Fields]
* Cluster Name australia
* Repository Disk [hdisk1] +
Cluster IP Address []
Figure 5-17 Define Repository Disk and Cluster IP Address panel
Figure 5-18 shows an example where the Automatically correct errors found during
verification? option was changed from the default value of No to Yes.
[Entry Fields]
* Verify, Synchronize or Both [Both] +
* Include custom verification library checks [Yes] +
* Automatically correct errors found during [Yes] +
verification?
Example 5-5 shows a summary configuration of the CAA cluster created during the
synchronization phase.
For more details about the CAA cluster status, see the following section.
------------------------------
Example 5-7 shows detailed interface information provided by the lscluster -i command. It
shows information about the network interfaces and the other two logical interfaces that are
used for cluster communication:
sfwcom The node connection to the SAN-based communication channel.
dpcom The node connection to the repository disk.
Storage
Volume Groups
Logical Volumes
File Systems
Physical Volumes
Figure 5-19 C-SPOC storage panel
The Volume Groups option is the preferred method for creating a volume group, because it
is automatically configured on all of the selected nodes. Since the release of PowerHA 6.1,
most operations on volume groups, logical volumes, and file systems no longer require
these objects to be in a resource group. Smart menus check for configuration and state
problems and prevent invalid operations before they can be initiated.
Volume Groups
PVID: This step automatically creates physical volume IDs (PVIDs) for the unused (no
PVID) shared disks. A shared disk might have different names on selected nodes, but
the PVID is the same.
Logical Volumes
7. In the C-SPOC Storage panel (Figure 5-19 on page 86), define the logical volumes and
file systems by selecting the Logical Volumes and File Systems options. The
intermediate and final panels for these actions are similar to those panels in previous
releases.
You can list the file systems that you created by following the path C-SPOC Storage →
File Systems → List All File Systems by Volume Group. The COMMAND STATUS
panel (Figure 5-24) shows the list of file systems for this example.
COMMAND STATUS
Smart Assists: The “Make Applications Highly Available (Use Smart Assists)” function
leads to a menu of all installed Smart Assists. If you do not see the Smart Assist that you
need, verify that the corresponding Smart Assist file set is installed.
Resources
4. In the Application Controller Scripts panel (Figure 5-28), select the Add Application
Controller Scripts option.
[Entry Fields]
* Application Controller Name [dbac]
* Start Script [/HA71/db_start.sh]
* Stop Script [/HA71/db_stop.sh]
Application Monitor Name(s) +
Figure 5-29 Adding application controller scripts
The configuration of the applications is completed. The next step is to configure the service IP
addresses.
+--------------------------------------------------------------------------+
| Network Name |
| |
| Move cursor to desired item and press Enter. |
| |
| ether01 (192.168.100.0/22 192.168.200.0/22) |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 5-31 Network Name subpanel for the Add a Service IP Label/Address option
5. In the Add a Service IP Label/Address panel, which changes as shown in Figure 5-32, in
the IP Label/Address field, select the service address that you want to add.
You can use the Netmask(IPv4)/Prefix Length(IPv6) field to define the netmask. With IPv4,
you can leave this field empty. The Network Name field is prefilled.
[Entry Fields]
* IP Label/Address sydneys +
Netmask(IPv4)/Prefix Length(IPv6) []
* Network Name ether01
Figure 5-32 Details of the Add a Service IP Label/Address panel
You have now finished configuring the resources. In this example, you defined one service IP
address. If you need to add more service IP addresses, repeat the steps as indicated in this
section.
As explained in the following section, the next step is to configure the resource groups.
Resource Groups
4. In the Add a Resource Group panel (Figure 5-34), as in previous versions of PowerHA,
specify the resource group name, the participating nodes, and the policies.
[Entry Fields]
* Resource Group Name [dbrg]
* Participating Nodes (Default Node Priority) [sydney perth] +
Next, you synchronize the cluster nodes. If the Verify and Synchronize Cluster Configuration
task is successfully completed, you can start your cluster. However, you might first want to
see if the CAA cluster was successfully created by using the lscluster -c command.
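This final check can also be scripted from the command line. A sketch follows, assuming the clmgr sync action described later in this chapter; the DRYRUN variable is an illustrative guard so that the commands only print here:

```shell
DRYRUN=echo   # hypothetical guard; set DRYRUN= on a real node to execute
$DRYRUN clmgr sync cluster   # CLI equivalent of Verify and Synchronize (assumption)
$DRYRUN lscluster -c         # confirm that the CAA cluster was created
```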
5.1.6 Configuring Start After and Stop After resource group dependencies
In this section, you configure a Start After resource group dependency and similarly create a
Stop After resource group dependency. For more information about Start After and Stop After
resource group dependencies, see 2.5.1, “Start After and Stop After resource group
dependencies” on page 32.
To add a new dependency, in the Configure Start After Resource Group Dependency menu,
select the Add Start After Resource Group Dependency option. In this example, we
already configured the dbrg and apprg resource groups. The apprg resource group is defined
as the source (dependent) resource group as shown in Figure 5-37.
••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
• Select the Source Resource Group •
• •
• Move cursor to desired item and press Enter. •
• •
• apprg •
• dbrg •
• •
• F1=Help F2=Refresh F3=Cancel •
• Esc+8=Image Esc+0=Exit Enter=Do •
F1• /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 5-37 Selecting the source resource group of a Start After dependency
••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
• Select the Target Resource Group •
• •
• Move cursor to desired item and press Esc+7. •
• ONE OR MORE items can be selected. •
• Press Enter AFTER making all selections. •
• •
• dbrg •
• •
• F1=Help F2=Refresh F3=Cancel •
• Esc+7=Select Esc+8=Image Esc+0=Exit •
F1• Enter=Do /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 5-38 Selecting the target resource group of a Start After dependency
[Entry Fields]
* Monitor Name [dbam]
* Application Controller(s) to Monitor dbac +
* Monitor Mode [Both] +
* Monitor Method [/HA71/db_mon.sh]
Monitor Interval [30] #
Hung Monitor Signal [] #
* Stabilization Interval [120] #
* Restart Count [3] #
Restart Interval [] #
* Action on Application Failure [fallover] +
Notify Method [/HA71/db_stop.sh]
Cleanup Method [/HA71/db_start.sh]
Restart Method []
Figure 5-39 Adding the dbam custom application monitor
[Entry Fields]
* Monitor Name appam
Application Controller(s) to Monitor appac +
* Monitor Mode [Long-running monitori> +
* Monitor Method [/HA71/app_mon.sh]
Monitor Interval [30] #
Hung Monitor Signal [9] #
* Stabilization Interval [15] #
Restart Count [3] #
Restart Interval [594] #
* Action on Application Failure [fallover] +
Notify Method []
Cleanup Method [/HA71/app_stop.sh]
Restart Method [/HA71/app_start.sh]
Figure 5-40 Configuring the appam application monitor and appac application controller
For a series of tests performed on this configuration, see 9.8, “Testing a Start After resource
group dependency” on page 297.
[Entry Fields]
* Resource Type Name [my_resource_type]
* Processing order [] +
Verification Method []
Verification Type [Script] +
Start Method []
Stop Method []
+--------------------------------------------------------------------------+
¦ Processing order ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ FIRST ¦
¦ WPAR ¦
¦ VOLUME_GROUP ¦
¦ FILE_SYSTEM ¦
¦ SERVICEIP ¦
¦ TAPE ¦
¦ APPLICATION ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
F1¦ Esc+8=Image Esc+0=Exit Enter=Do ¦
Es¦ /=Find n=Find Next ¦
Es+--------------------------------------------------------------------------+
Figure 5-41 Adding a user-defined resource type
3. After you create your own resource, add it to the resource group. The new resource is
shown in the pick list. This information is stored in the HACMPresourcetype,
HACMPudres_def, and HACMPudresource cluster configuration files.
DNP script for the nodes: Ensure that all nodes have the DNP script and that the
script has executable mode. Otherwise, you receive an error message while running the
synchronization or verification process.
For a description of this test scenario, see 9.9, “Testing dynamic node priority” on
page 302.
Removing a cluster consists of deleting the PowerHA definition and deleting the CAA cluster
from AIX. Removing the CAA cluster is the last step of the Remove operation as shown in
Figure 5-43.
COMMAND STATUS
Normally, deleting the cluster with this method removes both the PowerHA SystemMirror and
the CAA cluster definitions from the system. If a problem is encountered while PowerHA is
trying to remove the CAA cluster, you might need to delete the CAA cluster manually. For
more information, see Chapter 10, “Troubleshooting PowerHA 7.1” on page 305.
After you remove the cluster, ensure that the caavg_private volume group is no longer
displayed as shown in Figure 5-44.
To see the possible values for the attributes, use the man clvt command.
For a list of actions, you can use the clmgr command with no arguments. See “The clmgr
command” on page 106 and Example 5-10 on page 106.
Most of the actions in the list provide aliases. Table 5-1 shows the current actions and their
abbreviations and aliases.
move mov, mv
recover rec
sync sy
verify ve
manage mg
For a list, you can use clmgr with no arguments. See “The clmgr query command” on
page 107 and Example 5-11 on page 107.
Most of the object classes in the list provide aliases. Table 5-2 on page 105 lists the current
object classes and their abbreviations and aliases.
cluster cl
site si
node no
interface in, if
network ne, nw
resource_group rg
service_ip se
persistent_ip pe, pi
For a list of the actions that are currently supported, see 5.2.1, “The clmgr action commands”
on page 104.
For a list, you can use the clmgr command with no arguments. See “The clmgr command” on
page 106 and Example 5-10 on page 106.
For a list of object classes that are currently supported, see 5.2.2, “The clmgr object classes”
on page 105.
For a list, use the clmgr command with no arguments. See “The clmgr query command” on
page 107 and Example 5-11 on page 107.
For most of these actions and object classes, abbreviations and aliases are available. These
commands are not case-sensitive. You can find more details about the actions and their
aliases in “The clmgr action commands” on page 104. For more information about object
classes, see “The clmgr object classes” on page 105.
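The alias handling can be pictured with a small helper function. This function is purely illustrative (it is not part of clmgr, which resolves aliases internally) and covers only a few of the aliases from Table 5-2:

```shell
# Illustrative only -- clmgr resolves these aliases internally.
expand_class() {
  case "$1" in
    cl)    echo cluster ;;
    no)    echo node ;;
    ne|nw) echo network ;;
    rg)    echo resource_group ;;
    *)     echo "$1" ;;       # already a full class name
  esac
}
expand_class rg
```

For example, expand_class maps rg to resource_group, which is why `clmgr q rg` and `clmgr query resource_group` are equivalent.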
Error messages: At the time of writing, the clmgr error messages referred to clvt. This
issue will be fixed in a future release so that it references clmgr.
clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>] \
[-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}] <ACTION> <CLASS> [<NAME>] \
[-h | <ATTR#1>=<VALUE#1> <ATTR#2>=<VALUE#2> <ATTR#n>=<VALUE#n> ...]
clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>] \
[-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}] -M "
<ACTION> <CLASS> [<NAME>] [<ATTR#1>=<VALUE#1> <ATTR#n>=<VALUE#n> ...]
.
.
."
ACTION={add|modify|delete|query|online|offline|...}
CLASS={cluster|site|node|network|resource_group|...}
As mentioned previously, most clmgr actions and object classes provide aliases. Another
helpful feature of the clmgr command is the ability to understand abbreviated commands. For
example, the previous command can be shortened as follows:
# clmgr q cl
For more details about the capability of the clmgr command, see 5.2.1, “The clmgr action
commands” on page 104, and 5.2.2, “The clmgr object classes” on page 105. See also the
man pages listed in Appendix D, “The clmgr man page” on page 501.
You can also use more complex search expressions. Example 5-14 shows how you can use a
simple regular expression. In addition, you can search on more than one field; only the
objects that match all provided searches are displayed.
The -a option
Some query commands produce rather long output. You can use the -a (attributes) option
to obtain shorter output that shows the information for a single value, as shown in
Example 5-15. You can also use this option to get information about several values, as shown
in Example 5-16.
Example 5-16 shows how to get information about the state and the location of a resource
group. The full output of the query command for the nfsrg resource group is shown in
Example 5-31 on page 123.
The -v option
The -v (verbose) option is helpful when used with the query action, as shown in
Example 5-18. This option is used almost exclusively by IBM Systems Director to scan the
cluster for information.
STATE="ONLINE"
CURRENT_NODE="berlin"
munich:/ #
If you do not use the -v option with the query action, you see an error message similar to the
one in Example 5-19.
Example 5-19 Error message when not using the -v option for query all resource groups
munich:/ # clmgr -a STATE,current query rg
munich:/ #
Example 5-20 The command to return a single value from the clmgr command
# clmgr -cSa state query rg rg1
ONLINE
#
Example 5-21 Help for adding resource group using the clmgr command
# clmgr add resource_group -h
# Available options for "clvt add resource_group":
<RESOURCE_GROUP_NAME>
NODES
PRIMARYNODES
SECONDARYNODES
FALLOVER
FALLBACK
STARTUP
FALLBACK_AT
SERVICE_LABEL
APPLICATIONS
VOLUME_GROUP
FORCED_VARYON
VG_AUTO_IMPORT
FILESYSTEM
FSCHECK_TOOL
RECOVERY_METHOD
FS_BEFORE_IPADDR
EXPORT_FILESYSTEM
EXPORT_FILESYSTEM_V4
MOUNT_FILESYSTEM
STABLE_STORAGE_PATH
WPAR_NAME
NFS_NETWORK
SHARED_TAPE_RESOURCES
DISK
AIX_FAST_CONNECT_SERVICES
COMMUNICATION_LINKS
WLM_PRIMARY
WLM_SECONDARY
MISC_DATA
CONCURRENT_VOLUME_GROUP
NODE_PRIORITY_POLICY
NODE_PRIORITY_POLICY_SCRIPT
NODE_PRIORITY_POLICY_TIMEOUT
SITE_POLICY
Items between the angle brackets (<>) are required information. However, this does not
mean that all of the other items are optional. Some items might not be marked as required
because of other dependencies. In Example 5-22 on page 112, only CLUSTER_NAME is listed as
required, but because of the new CAA dependency, the REPOSITORY (disk) is also required. For
more details about how to create a cluster by using the clmgr command, see “Configuring a
new cluster using the clmgr command” on page 113.
To configure a PowerHA cluster by using the clmgr command, follow these steps:
1. Configure the cluster:
# clmgr add cluster de_cluster NODES=munich,berlin REPOSITORY=hdisk4
For details, see “Configuring a new cluster using the clmgr command” on page 113.
2. Configure the service IP addresses:
# clmgr add service_ip alleman NETWORK=net_ether_01 NETMASK=255.255.255.0
# clmgr add service_ip german NETWORK=net_ether_01 NETMASK=255.255.255.0
For details, see “Defining the service address using the clmgr command” on page 118.
3. Configure the application server:
# clmgr add application_controller http_app \
> STARTSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k start" \
> STOPSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k stop"
For details, see “Defining the application server using the clmgr command” on page 120.
4. Configure a resource group:
# clmgr add resource_group httprg VOLUME_GROUP=httpvg NODES=munich,berlin \
> SERVICE_LABEL=alleman APPLICATIONS=http_app
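The four steps above can be collected into one script. A sketch follows, with an illustrative DRYRUN guard so that the clmgr commands only print here; set DRYRUN empty on a real node to execute them:

```shell
DRYRUN=echo   # hypothetical guard so that this sketch only prints the commands
provision_cluster() {
  $DRYRUN clmgr add cluster de_cluster NODES=munich,berlin REPOSITORY=hdisk4
  $DRYRUN clmgr add service_ip alleman NETWORK=net_ether_01 NETMASK=255.255.255.0
  $DRYRUN clmgr add service_ip german NETWORK=net_ether_01 NETMASK=255.255.255.0
  $DRYRUN clmgr add application_controller http_app \
      STARTSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k start" \
      STOPSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k stop"
  $DRYRUN clmgr add resource_group httprg VOLUME_GROUP=httpvg NODES=munich,berlin \
      SERVICE_LABEL=alleman APPLICATIONS=http_app
}
provision_cluster
```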
Command and syntax of clmgr: Unlike the SMIT interface, which is designed to be robust
and easy to use, the clmgr command (CLI) does not guide you through the configuration.
When you use it to configure or manage the PowerHA cluster, you must use the correct
command and syntax.
Preliminary setup
Prerequisite: This section assumes that you know how to set up the prerequisites for a
PowerHA cluster.
The IP interfaces are already defined and the shared volume groups and file systems have
been created. The host names of the two systems are munich and berlin. Figure 5-45 shows
the disks and shared volume groups that are defined so far. hdisk4 is used as the CAA
repository disk.
munich:/ # lspv
hdisk1 00c0f6a012446137 httpvg
hdisk2 00c0f6a01245190c httpvg
hdisk3 00c0f6a012673312 nfsvg
hdisk4 00c0f6a01c784107 None
hdisk0 00c0f6a07c5df729 rootvg active
munich:/ #
Figure 5-45 List of available disks
munich:/ # netstat -i
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 a2.4e.58.a0.41.3 23992 0 24516 0 0
en0 1500 192.168.100 munich 23992 0 24516 0 0
en1 1500 link#3 a2.4e.58.a0.41.4 2 0 7 0 0
en1 1500 100.168.200 munichb1 2 0 7 0 0
en2 1500 link#4 a2.4e.58.a0.41.5 4324 0 7 0 0
en2 1500 100.168.220 munichb2 4324 0 7 0 0
lo0 16896 link#1 16039 0 16039 0 0
lo0 16896 127 localhost.locald 16039 0 16039 0 0
lo0 16896 localhost6.localdomain6 16039 0 16039 0 0
munich:/ #
Figure 5-46 Defined network interfaces
Before you use the clmgr add cluster command, you must know which disk will be used for
the CAA repository disk. Example 5-23 shows the command and its output.
Table 5-3 provides more details about the command and arguments that are used.
berlin:
Hdisk: hdisk1
PVID: 00c0f6a012446137
VGname: httpvg
VGmajor: 100
Conc-capable: Yes
VGactive: No
Quorum-required:Yes
Hdisk: hdisk2
PVID: 00c0f6a01245190c
VGname: httpvg
VGmajor: 100
Conc-capable: Yes
VGactive: No
Quorum-required:Yes
munich:
Hdisk: hdisk1
PVID: 00c0f6a012446137
VGname: httpvg
VGmajor: 100
Conc-capable: Yes
VGactive: No
Quorum-required:Yes
berlin:
Hdisk: hdisk3
PVID: 00c0f6a012673312
VGname: nfsvg
VGmajor: 200
Conc-capable: Yes
VGactive: No
Quorum-required:Yes
munich:
Hdisk: hdisk2
berlin:
Hdisk: hdisk4
PVID: 00c0f6a01c784107
VGname: None
VGmajor: 0
Conc-capable: No
VGactive: No
Quorum-required:No
munich:
Hdisk: hdisk3
PVID: 00c0f6a012673312
VGname: nfsvg
VGmajor: 200
Conc-capable: Yes
VGactive: No
Quorum-required:Yes
berlin:
Hdisk: hdisk0
PVID: 00c0f6a048cf8bfd
VGname: rootvg
VGmajor: 10
Conc-capable: No
VGactive: Yes
Quorum-required:Yes
FREEMAJORS: 35..99,101..199,201...
munich:
Hdisk: hdisk4
PVID: 00c0f6a01c784107
VGname: None
VGmajor: 0
Conc-capable: No
VGactive: No
Quorum-required:No
Hdisk: hdisk0
PVID: 00c0f6a07c5df729
VGname: rootvg
VGmajor: 10
Conc-capable: No
VGactive: Yes
Quorum-required:Yes
FREEMAJORS: 35..99,101..199,201...
Communication path berlin discovered a new node. Hostname is berlin. Adding it to
the configuration with
Nodename berlin.
Communication path munich discovered a new node. Hostname is munich. Adding it to
the configuration with
Nodename munich.
Discovering IP Network Connectivity
Discovered [10] interfaces
IP Network Discovery completed normally
munich:/ #
Example 5-24 Output of the cltopinfo command after creating cluster definitions
munich:/ # cltopinfo
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk4
Cluster IP Address:
There are 2 node(s) and 2 network(s) defined
NODE berlin:
Network net_ether_01
berlinb2 100.168.220.141
berlinb1 100.168.200.141
Network net_ether_010
berlin 192.168.101.141
NODE munich:
Network net_ether_01
munichb1 100.168.200.142
munichb2 100.168.220.142
Network net_ether_010
munich 192.168.101.142
The clmgr add cluster command: The clmgr add cluster command automatically runs
discovery, harvesting IP addresses and volume groups. As a result, the IP network
interfaces are added to the cluster configuration automatically.
Table 5-4 provides more details about the command and arguments that are used.
Table 5-4 Defining the service address using the clmgr command
Action, object class, or argument   Value used      Comment
NETWORK                             net_ether_01    The network name from the cltopinfo
                                                    command used previously.
NETMASK                             255.255.255.0   Optional; when you specify a value, use
                                                    the same one that you used in setting up
                                                    the interface.
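Applied to Table 5-4, the service-address definition might look like the following sketch; the service label alleman comes from the resource-group example at the start of this section, and the exact clmgr object class and argument names should be treated as assumptions:

```
# clmgr add service_ip alleman NETWORK=net_ether_01 NETMASK=255.255.255.0
```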
To check the configuration up to this point, use the cltopinfo command again. Example 5-26
shows the current configuration.
Table 5-5 provides more details about the command and arguments that are used.
Table 5-5 Defining the application server using the clmgr command
Action, object class, or argument   Value used      Comment
Compared to the smit functions, by using the clmgr command, you create a resource group
and its resources in one step. Therefore, you must ensure that you have defined all the
service IP addresses and your application servers.
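As an illustration only, defining the http_app application server used earlier might be sketched as follows; the object class name and the script paths are assumptions for this sketch, not values taken from the book:

```
# clmgr add application_controller http_app \
>     STARTSCRIPT=/usr/local/ha/start_http.sh \
>     STOPSCRIPT=/usr/local/ha/stop_http.sh
```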
Two resource groups have been created. The first one uses only the items needed for this
resource group (httprg), so that the system used the default values for the remaining
arguments. Table 5-6 provides more details about the command and arguments that are
used.
Table 5-6 Defining the resource groups using the clmgr (httprg) command
Action, object class, or argument   Value used      Comment
VOLUME_GROUP                        httpvg          The volume group used for this resource group.
For the second resource group in the test environment, we specified more details because we
did not want to use the default values (nfsrg). Table 5-7 provides more details about the
command and arguments that we used.
Table 5-7 Defining the resource groups using the clmgr (nfsrg) command
Action, object class, or argument   Value used      Comment
VOLUME_GROUP                        nfsvg           The volume group used for this resource group.
Example 5-28 shows the commands that are used to define the resource groups listed in
Table 5-6 on page 120 and Table 5-7.
To see the configuration up to this point, use the clmgr query command. Example 5-29
shows how to check which resource groups you defined.
Example 5-29 Listing the defined resource groups using the clmgr command
munich:/ # clmgr query rg
httprg
nfsrg
munich:/ #
Next, you can see the content that you created for the resource groups. Example 5-30 shows
the content of the httprg. As discussed previously, the default values for this resource group
were used as much as possible.
Now you can see the content that was created for the resource groups. Example 5-31 shows
the content of the nfsrg resource group.
Verifying and propagating the changes: After using the clmgr command to modify the
cluster configuration, enter the clmgr verify cluster and clmgr sync cluster commands
to verify and propagate the changes to all nodes.
Example 5-32 shows usage of the clmgr sync cluster command to synchronize the cluster
and the command output.
Example 5-32 Synchronizing the cluster using the clmgr sync cluster command
munich:/ # clmgr sync cluster
Retrieving data from available cluster nodes. This could take a few minutes.
berlin net_ether_010
munich net_ether_010
http_app httprg
Completed 50 percent of the verification checks
Completed 60 percent of the verification checks
Completed 70 percent of the verification checks
Completed 80 percent of the verification checks
Completed 90 percent of the verification checks
Completed 100 percent of the verification checks
Node: Network:
---------------------------------- ----------------------------------
WARNING: Not all cluster nodes have the same set of HACMP filesets installed.
The following is a list of fileset(s) missing, and the node where the
fileset is missing:
Fileset: Node:
-------------------------------- --------------------------------
WARNING: There are IP labels known to HACMP and not listed in file
/usr/es/sbin/cluster/etc/clhosts.client on
node: berlin. Clverify can automatically populate this file to be used on a client
node, if executed in
auto-corrective mode.
WARNING: There are IP labels known to HACMP and not listed in file
/usr/es/sbin/cluster/etc/clhosts.client on
node: munich. Clverify can automatically populate this file to be used on a client
node, if executed in
auto-corrective mode.
WARNING: Network option "nonlocsrcroute" is set to 0 and will be set to 1 on
during HACMP startup on the
following nodes:
berlin
munich
WARNING: Network option "ipsrcrouterecv" is set to 0 and will be set to 1 on
during HACMP startup on the
following nodes:
berlin
munich
WARNING: Node munich has cluster.es.nfs.rte installed however grace periods are
not fully enabled on this node.
Grace periods must be enabled before NFSv4 stable storage can be used.
HACMP will attempt to fix this opportunistically when acquiring NFS resources on
this node however the change
won't take effect until the next time that nfsd is started.
If this warning persists, the administrator should perform the following steps to
enable grace periods on munich at the next planned downtime:
1. stopsrc -s nfsd
2. smitty nfsgrcperiod
3. startsrc -s nfsd
munich:/ #
When the synchronization finishes successfully, the CAA repository disk is defined.
Figure 5-47 shows the disks before the cluster synchronization, which are the same as those
shown in Figure 5-45 on page 113.
munich:/ # lspv
hdisk1 00c0f6a012446137 httpvg
hdisk2 00c0f6a01245190c httpvg
hdisk3 00c0f6a012673312 nfsvg
hdisk4 00c0f6a01c784107 None
hdisk0 00c0f6a07c5df729 rootvg active
munich:/ #
Figure 5-47 List of available disks before sync
Figure 5-48 shows the output of the lspv command after the synchronization. In our example,
hdisk4 is now converted into a CAA repository disk and is listed as caa_private0.
munich:/ # lspv
hdisk1 00c0f6a012446137 httpvg
hdisk2 00c0f6a01245190c httpvg
hdisk3 00c0f6a012673312 nfsvg
caa_private0 00c0f6a01c784107 caavg_private active
hdisk0 00c0f6a07c5df729 rootvg active
munich:/ #
Figure 5-48 List of available disks after using the cluster sync command
Example 5-33 shows the command that we used and part of its output. To start the clinfo
daemon, we used the CLINFO=true argument. We did not want a broadcast message, so we
also specified the BROADCAST=false argument.
/usr/es/sbin/cluster/diag/cl_ver_alias_topology[335] return 0
Node: Network:
---------------------------------- ----------------------------------
berlin net_ether_010
munich net_ether_010
WARNING: Network option "nonlocsrcroute" is set to 0 and will be set to 1 on during HACMP
startup on the following nodes:
munich
WARNING: Network option "ipsrcrouterecv" is set to 0 and will be set to 1 on during HACMP
startup on the following nodes:
munich
/usr/es/sbin/cluster/diag/clwpardata[325] exit 0
WARNING: Node munich has cluster.es.nfs.rte installed however grace periods are not fully
enabled on this node. Grace periods must be enabled before NFSv4 stable storage can be used.
HACMP will attempt to fix this opportunistically when acquiring NFS resources on this node
however the change won't take effect until the next time that nfsd is started.
If this warning persists, the administrator should perform the following steps to enable grace
periods on munich at the next planned downtime:
1. stopsrc -s nfsd
2. smitty nfsgrcperiod
3. startsrc -s nfsd
munich:/ #
Starting all nodes in a cluster: The clmgr online cluster start_cluster command
starts all nodes in a cluster by default.
Example 5-49 shows that all nodes are now up and running.
Colon-delimited format
When using the colon-delimited output format (-c), you can use the -S option to silence or
eliminate the header line.
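For example, assuming the two options can be combined in the usual single-letter fashion, a header-free colon-delimited query might look like this sketch:

```
# clmgr -c -S query resource_group
```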
The main log file for clmgr debugging is the /var/hacmp/log/clutils.log file. This log file
includes all standard error and output from each command.
The return codes used by the clmgr command are standard for all commands:
RC_UNKNOWN=-1 A result is not known. It is useful as an initializer.
RC_SUCCESS=0 No errors were detected; the operation seems to have
been successful.
RC_ERROR=1 A general error has occurred.
RC_NOT_FOUND=2 A specified resource does not exist or could not be
found.
RC_MISSING_INPUT=3 Some required input was missing.
RC_INCORRECT_INPUT=4 Some detected input was incorrect.
RC_MISSING_DEPENDENCY=5 A required dependency does not exist.
RC_SEARCH_FAILED=6 A specified search failed to match any data.
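The documented return codes lend themselves to scripting; the following portable shell sketch defines a hypothetical helper (not part of clmgr) that maps a return code to its documented symbolic name:

```shell
# rc_name is a hypothetical helper for scripting around clmgr;
# it maps a clmgr return code to its documented symbolic name.
rc_name() {
    case "$1" in
        0) echo "RC_SUCCESS" ;;
        1) echo "RC_ERROR" ;;
        2) echo "RC_NOT_FOUND" ;;
        3) echo "RC_MISSING_INPUT" ;;
        4) echo "RC_INCORRECT_INPUT" ;;
        5) echo "RC_MISSING_DEPENDENCY" ;;
        6) echo "RC_SEARCH_FAILED" ;;
        *) echo "RC_UNKNOWN" ;;
    esac
}

# Example: report the outcome of a clmgr call (the call itself omitted here).
rc_name 2    # prints RC_NOT_FOUND
```

In a real script, you would capture `$?` immediately after the clmgr invocation and pass it to such a helper.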
Example 5-36 lists the format of the trace information in the clutils.log file.
The following line shows an example of how the clutils.log file might be displayed:
CLMGR:0:resource_common:SerializeAsAssociativeArray()[537](0.704):13327:9765002:9044114: unset 'array[AIX_LEVEL0]'
Example 5-37 shows some lines from the clutils.log file (not using trace).
Example 5-38 Using the TAIL argument when viewing the content of the clmgr log file
# clmgr view log clutils.log TAIL=1000 | wc -l
1000
#
Example 5-40 shows how to list the last five clmgr query commands that were run.
The Director client agent of PowerHA SystemMirror is installed on cluster nodes in the same
manner as PowerHA SystemMirror itself, by using the installp command. The Director
server and the PowerHA server plug-in require a separate installation effort. You must
download them from the external website and manually install them on a dedicated system.
This system does not have to be a PowerHA system.
To learn about installing the Systems Director and PowerHA components, and their use for
configuration and management tasks, see Chapter 12, “Creating and managing a cluster
using IBM Systems Director” on page 333.
This chapter explains how to configure a hot standby two-node IBM PowerHA SystemMirror
7.1 cluster using the Smart Assist for DB2. The lab cluster korea is used for the examples with
the participating nodes seoul and busan.
Example 6-1 Additional file sets required for installing Smart Assist
seoul:/ # clcmd lslpp -l cluster.es.assist.common cluster.es.assist.db2
-------------------------------
NODE seoul
-------------------------------
Fileset Level State Description
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
cluster.es.assist.common 7.1.0.1 COMMITTED PowerHA SystemMirror Smart
Assist Common Files
cluster.es.assist.db2 7.1.0.1 COMMITTED PowerHA SystemMirror Smart
Assist for DB2
-------------------------------
NODE busan
-------------------------------
Fileset Level State Description
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
cluster.es.assist.common 7.1.0.1 COMMITTED PowerHA SystemMirror Smart
Assist Common Files
cluster.es.assist.db2 7.1.0.1 COMMITTED PowerHA SystemMirror Smart
Assist for DB2
-------------------------------
NODE busan
-------------------------------
hdisk0 00c0f6a089390270 rootvg active
caa_private0 00c0f6a01077342f caavg_private active
cldisk2 00c0f6a0107734ea pokvg
cldisk1 00c0f6a010773532 pokvg
6.1.4 Creating the DB2 instance and database on the shared volume group
Before launching the PowerHA Smart Assist for DB2, you must have already created the DB2
instance and DB2 database over the volume groups that are shared by both nodes.
In Example 6-4, the home for the POK database was created in the /db2/POK/db2pok shared
file system of the volume group pokvg. The instance was created in the /db2/db2pok shared
file system, which is the home directory for user db2pok. The instance was created on the
primary node only, because its structures reside on a shared volume group.
-------------------------------
NODE busan
-------------------------------
db2pok:!:203:101::/db2/db2pok:/usr/bin/ksh
seoul:/ # su - db2pok
seoul:/db2/db2pok # db2start
Non-DPF database support: Smart Assist for DB2 supports only non-DPF databases.
2. Mount the file systems as shown in Example 6-7 so that Smart Assist for DB2 can
discover the available instances and databases.
3. Ensure that the DB2 instance is active on the node where Smart Assist for DB2 will be
run, as shown in Example 6-8.
seoul:/db2/db2pok # db2ilist
db2pok
seoul:/db2/db2pok # db2start
09/24/2010 11:38:53 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
seoul:/db2/db2pok # db2pd -
Database Partition 0 -- Active -- Up 0 days 00:00:10
Example 6-9 Editing and adding the service IP label to the db2nodes.cfg file
seoul:/ # cat /db2/db2pok/sqllib/db2nodes.cfg
0 poksap-db 0
The .rhosts file (Example 6-10) for the DB2 instance owner has all the base, persistent,
and service addresses. It also has the right permissions.
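A sketch of what the instance owner's .rhosts file might contain for this cluster, using the host names and service label from the text; the exact entries (including any persistent labels) are assumptions, and the file permissions are typically restricted to the owner, for example mode 600:

```
seoul db2pok
busan db2pok
poksap-db db2pok
```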
4. Find the path for the binary files and then export the variable as shown in Example 6-11.
The DSE_INSTALL_DIR environment variable is exported as the root user with the actual path
for the DB2 binary files. If more than one DB2 version is installed, choose the version that
you want to use for your highly available instance.
Example 6-11 Finding the DB2 binary files and exporting them
seoul:/db2/db2pok # db2level
DB21085I Instance "db2pok" uses "64" bits and DB2 code release "SQL09050" with
level identifier "03010107".
Informational tokens are "DB2 v9.5.0.0", "s071001", "AIX6495", and Fix Pack
"0".
Product is installed at "/opt/IBM/db2/V9.5".
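Based on the db2level output above, the export from step 4 might look like the following sketch, using the installation path that db2level reported (the variable name comes from the text):

```
seoul:/ # export DSE_INSTALL_DIR=/opt/IBM/db2/V9.5
```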
4. In the Add an Application to the PowerHA SystemMirror Configuration panel, select Select
Configuration Mode.
5. In the Select Configuration Mode panel (Figure 6-2), select Automatic Discovery and
Configuration.
8. Select the DB2 instance name. In this case, only one instance, db2pok, is available as
shown in Figure 6-4.
db2pok
9. Using the available pick lists (F4), edit the Takeover Node, DB2 Instance Database to
Monitor, and Service IP Label fields as shown in Figure 6-5. Press Enter.
Tip: You can edit the Application Name field and change it to have a more meaningful
name.
Example 6-12 The configured resource group for the DB2 instance
seoul:/ # /usr/es/sbin/cluster/utilities/cllsres
APPLICATIONS="db2pok_ApplicationServer"
FILESYSTEM=""
FORCED_VARYON="false"
FSCHECK_TOOL="logredo"
FS_BEFORE_IPADDR="false"
RECOVERY_METHOD="parallel"
SERVICE_LABEL="poksap-db"
SSA_DISK_FENCING="false"
VG_AUTO_IMPORT="false"
VOLUME_GROUP="pokvg"
USERDEFINED_RESOURCES=""
seoul:/ # /usr/es/sbin/cluster/utilities/cllsgrp
db2pok_ResourceGroup
10.Administrator task: Verify the start and stop scripts that were created for the resource
group.
a. To verify the scripts, use the odmget or cllsserv commands or the SMIT tool as shown
in Example 6-13.
seoul:/ # /usr/es/sbin/cluster/utilities/cllsserv
db2pok_ApplicationServer /usr/es/sbin/cluster/sa/db2/sbin/cl_db2start
db2pok /usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok
db2pok_ApplicationServer
11.Administrator task: Verify which custom and process application monitors were created by
Smart Assist for DB2. In our example, the application monitors are db2pok_SQLMonitor
and db2pok_ProcessMonitor.
a. Run the following path for seoul: smitty sysmirror → Cluster Applications and
Resources → Resources → Configure User Applications (Scripts and
Monitors) → Application Monitors → Configure Custom Application Monitors →
Change/Show Custom Application Monitor.
b. In the Application Monitor to Change panel (Figure 6-8), select db2pok_SQLMonitor
and press Enter.
db2pok_SQLMonitor
d. Run the following path for seoul: smitty sysmirror → Cluster Applications and
Resources → Resources → Configure User Applications (Scripts and
Monitors) → Application Monitors → Configure Process Application Monitors →
Change/Show Process Application Monitor.
e. In the Application Monitor to Change panel (Figure 6-10), select
db2pok_ProcessMonitor and press Enter.
db2pok_ProcessMonitor
seoul:/db2/db2pok # db2stop
09/24/2010 12:02:56 0 0 SQL1064N DB2STOP processing was successful.
SQL1064N DB2STOP processing was successful.
seoul:/ # lsvg -o
caavg_private
rootvg
5. Start the cluster on both nodes, seoul and busan, by running smitty clstart.
6. In the Start Cluster Services panel (Figure 6-13 on page 149), complete these steps:
a. For Start now, on system restart or both, select now.
b. For Start Cluster Services on these nodes, enter [seoul busan].
c. For Manage Resource Groups, select Automatically.
d. For BROADCAST message at startup, select false.
e. For Startup Cluster Information Daemon, select true.
f. For Ignore verification errors, select false.
g. For Automatically correct errors found during cluster start?, select yes.
h. Press Enter.
Tip: The log file for the Smart Assist is in the /var/hacmp/log/sa.log file. You can use the
clmgr utility to easily view the log, as in the following example:
clmgr view log sa.log
When the PowerHA cluster starts, the DB2 instance is automatically started. The application
monitors start after the defined stabilization interval as shown in Example 6-17.
Example 6-17 Checking the status of the highly available cluster and the DB2 instance
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE seoul
OFFLINE busan
seoul:/ # su - db2pok
seoul:/db2/db2pok # db2pd -
Database Partition 0 -- Active -- Up 0 days 00:19:38
Your DB2 instance and database are now configured for high availability in a hot-standby
PowerHA SystemMirror configuration.
TL6: AIX must be at a minimum version of AIX 6.1 TL6 (6.1.6.0) on all nodes before
migration. Use of AIX 6.1 TL6 SP2 or later is preferred.
For more information about migration considerations, see 3.4, “Migration planning” on
page 46.
Important: A nondisruptive upgrade is not available in PowerHA 7.1, because this version
is the first one to use Cluster Aware AIX (CAA).
With the introduction of PowerHA 7.1, you now use the features of CAA introduced in AIX 6.1
TL6 and AIX 7.1. For more information about the new features of this release, see 2.2, “New
features” on page 24.
The migration process now has two main cluster components: CAA and PowerHA. This
process involves updating your existing PowerHA product and configuring the CAA cluster
component.
The clmigcheck command: The clmigcheck command automatically creates the CAA
cluster when it is run on the last node.
For a detailed explanation about the clmigcheck process, see 7.2.2, “Premigration
checking: The clmigcheck program” on page 157.
At this stage, the clmigcheck process has run on the last node of the cluster. The CAA
cluster is now created and CAA has established communication with the other node.
Figure 7-3 Extract from the clstrmgr.debug file showing the migration protocol
CAA communication: The grpsvcs SRC subsystem is active until you restart the
system. This subsystem is now communicating with CAA and not topsvcs as shown in
Figure 7-4.
Figure 7-5 shows the services that are running after migration, including cthags.
clcomd instances: Two flavors of the cluster communication daemon (clcomd and
clcomdES) can exist in the cluster, but never both on a given node. After PowerHA 7.1 is
installed on a node, the clcomd daemon runs, and the clcomdES daemon does not exist.
AIX 6.1.6.0 and later with a back-level PowerHA version (before version 7.1) runs only the
clcomdES daemon even though the clcomd daemon exists.
The clcomd daemon uses port 16191, and the clcomdES daemon uses port 6191. When
migration is complete, the clcomdES daemon is removed.
The clcomdES daemon: The clcomdES daemon is removed when the older PowerHA
software version is removed (snapshot migration) or overwritten by the new PowerHA 7.1
version (rolling or offline migration).
Command profile: The clmigcheck command is not a PowerHA command, but the
command is part of bos.cluster and is in the /usr/sbin directory.
The clmigcheck program uses the mkcluster command and passes the cluster parameters
from the existing PowerHA cluster, along with the repository disk and multicast address (if
applicable). Figure 7-7 shows an example of the mkcluster command being called.
A warning message is displayed for certain unsupported elements, such as disk heartbeat as
shown in Figure 7-9.
The second function of the clmigcheck program is to prepare the CAA cluster environment.
This function is performed when you select option 3 (Enter repository disk and multicast IP
addresses) from the menu.
When you select this option, the clmigcheck program stores the information entered in the
/var/clmigcheck/clmigcheck.txt file. This file is also copied to the /var/clmigcheck
directory on all nodes in the cluster. This file contains the physical volume identifier (PVID) of
the repository disk and the chosen multicast address. If PowerHA is allowed to choose a
multicast address automatically, the NULL setting is specified in the file. Figure 7-10 shows
an example of the clmigcheck.txt file.
CLUSTER_TYPE:STANDARD
CLUSTER_REPOSITORY_DISK:000fe40120e16405
CLUSTER_MULTICAST:NULL
Figure 7-10 Contents of the clmigcheck.txt file
Upon running the clmigcheck command, the command checks to see if the clmigcheck.txt
file exists. If the clmigcheck.txt file exists and the node is not the last node in the cluster to
be migrated, the panel shown in Figure 7-11 is displayed. It contains a message indicating
that you can now upgrade to the later level of PowerHA.
clmigcheck: This is not the first node or last node clmigcheck was run on.
No further checking is required on this node. You can install the new
version of PowerHA SystemMirror.
-----------------------------------------------------------------------
Figure 7-11 The clmigcheck panel after it has been run once and before the PowerHA upgrade
The clmigcheck program checks the installed version of PowerHA to see if it has been
upgraded. This step is important to determine which node is the last node to be upgraded in
the cluster. If it is the last node in the cluster, then additional configuration operations must be
completed along with creating and activating the CAA cluster.
Important: You must run the clmigcheck program before you upgrade PowerHA. Then
upgrade PowerHA one node at a time, and run the clmigcheck program on the next node
only after you complete the migration on the previous node. If you do not run the
clmigcheck program specifically on the last node, the cluster is still in migration mode
without creating the CAA cluster. For information about how to resolve this situation, see
10.4.7, “The ‘Cluster services are not active’ message” on page 323.
ERROR: This program is intended for PowerHA configurations prior to version 7.1
The version currently installed appears to be: 7.1.0
Figure 7-12 clmigcheck panel after PowerHA has been installed on a node.
Figure 7-13 shows an extract from the /tmp/clmigcheck/clmigcheck.log file that was taken
when the clmigcheck command ran on the last node in a three-node cluster migration. This
file shows the output by the clmigcheck program when checking whether this node is the last
node of the cluster.
ck_lastnode: oldnodes = 1
clmigcheck: This is the last node to run clmigcheck, create the CAA cluster
Figure 7-13 Extract from clmigcheck.log file showing the lslpp last node checking
Also the environment has one resource group that includes one service IP, two volume
groups, and application monitoring. This environment also has an IBM HTTP server as the
application. Figure 7-14 shows the relevant resource group settings.
The snapshot migration method requires all cluster nodes to be offline for some time. It
requires removing previous versions of PowerHA and installing AIX 6.1 TL6 or later and the
new version of PowerHA 7.1.
In this scenario, we start with PowerHA 5.5 SP4 on AIX 6.1.3 and migrate to PowerHA 7.1
SP1 on AIX 6.1 TL6. The network topology consists of one IP network using IPAT via
replacement and a disk heartbeat network. Both of these network types are no longer
supported. However, if you have an IPAT via replacement configuration, the clmigcheck script
generates an error message as shown in Figure 7-15. You must remove this configuration to
proceed with the migration.
IPAT via replacement configuration: If your cluster has an IPAT via replacement
configuration, remove or change to the IPAT via alias method before starting the migration.
Creating a snapshot
Create a snapshot by entering the smit cm_add_snap.dialog command while your cluster is
running.
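From the command line, the same snapshot might be created with the clsnapshot utility; the snapshot name, description, and the exact flags in this sketch are assumptions for illustration:

```
# /usr/es/sbin/cluster/utilities/clsnapshot -c -n pre71_migration \
>     -d "Cluster snapshot before migrating to PowerHA 7.1"
```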
The clcomd subsystem is now part of AIX and requires the fully qualified host names of all
nodes in the cluster to be listed in the /etc/cluster/rhosts file. Because AIX was
updated, a restart is required.
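For this two-node scenario, the /etc/cluster/rhosts file might be populated as in the following sketch; the domain suffix is hypothetical, and the same file must exist on every node in the cluster:

```
algeria.itso.ibm.com
brazil.itso.ibm.com
```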
3. Because you updated the AIX image, restart the system before you continue with the next
step.
After restarting the system, you can see that the clcomd subsystem, which belongs to the
caa subsystem group, is up and running. The clcomdES daemon, which is part of PowerHA,
is also running, as shown in Figure 7-18.
The clmigcheck menu options: In the clmigcheck menu, options 1 and 2 review the
cluster configuration. Option 3 gathers the information that is necessary to create the CAA
cluster during its execution on the last node of the cluster. In option 3, you define a cluster
repository disk and multicast IP address. Selecting option 3 means that you are ready to
start the migration.
h = help
Figure 7-20 shows the warning message “This will be removed from the configuration
during the migration”. Because it is only a warning message, you can continue with the
migration. After completing the migration, verify that the disk heartbeat is removed.
When option 2 of clmigcheck is completed without error, proceed with option 3 as shown in
Figure 7-21.
1 = 000fe4114cf8d1ce(hdisk1)
2 = 000fe4114cf8d3a1(hdisk4)
3 = 000fe4114cf8d441(hdisk5)
4 = 000fe4114cf8d4d5(hdisk6)
5 = 000fe4114cf8d579(hdisk7)
You can make a NULL entry for the multicast address; AIX then generates an appropriate
address, as shown in Figure 7-23. Keep the default NULL value so that AIX generates the
multicast address.
If you make a NULL entry, AIX will generate an appropriate address for you.
You should only specify an address if you have an explicit reason to do
so, but are cautioned that this address cannot be changed once the
configuration is activated (i.e. migration is complete).
h = help
prompt_mcast: Called
validate_mcast: Called
write_file: Called
# cat /var/clmigcheck/clmigcheck.txt
CLUSTER_TYPE:STANDARD
CLUSTER_REPOSITORY_DISK:000fe4114cf8d1ce
CLUSTER_MULTICAST:NULL
Figure 7-25 The /var/clmigcheck/clmigcheck.txt file
When PowerHA 7.1 is installed, this information is used to create the HACMPsircol.odm file as
shown in Figure 7-26. This file is created when you finish restoring the snapshot in this
scenario.
HACMPsircol:
name = "canada_cluster_sircol"
id = 0
uuid = "0"
repository = "000fe4114cf8d1ce"
ip_address = ""
nodelist = "brazil,algeria"
backup_repository1 = ""
backup_repository2 = ""
algeria:/ #
Figure 7-26 The HACMPsircol.odm file
After you install the new PowerHA 7.1 file sets, you can see that the clcomdES daemon has
disappeared. You now have the clcomd daemon, which is part of CAA, instead of the clcomdES
daemon.
3. Stop and start the clcomd daemon by using the following command:
refresh -s clcomd
4. To verify that the clcomd subsystem is working, use the clrsh command. If it does not
work, correct any problems before proceeding as explained in Chapter 10,
“Troubleshooting PowerHA 7.1” on page 305.
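A quick way to exercise clcomd with clrsh is to run a harmless command against each node, as in this sketch (node names are from this scenario; the probe simply returns the remote date when communication works):

```
# clrsh algeria date
# clrsh brazil date
```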
Restoring a snapshot
To restore a snapshot, follow the path smitty hacmp → Cluster Nodes and Networks →
Manage the Cluster Snapshot Configuration → Restore the Cluster Configuration
From a Snapshot.
Warning: unable to verify inbound clcomd communication from node "algeria" to the local node,
"".
Warning: unable to verify inbound clcomd communication from node "brazil" to the local node,
"".
/usr/es/sbin/cluster/utilities/clsnapshot[2139]: apply_CS[125]:
communication_check: line 52: local: not found
When you finish restoring the snapshot, the CAA cluster is created based on the repository
disk and multicast address stored in the /var/clmigcheck/clmigcheck.txt file.
Sometimes the synchronization or verification fails because the snapshot cannot create the
CAA cluster. If you see an error message similar to the one shown in Figure 7-30, look in the
/var/adm/ras/syslog.caa file and correct the problem.
ERROR: Problems encountered creating the cluster in AIX. Use the syslog
facility to see output from the mkcluster command.
ERROR: Creating the cluster in AIX failed. Check output for errors in local
cluster configuration, correct them, and try synchronization again.
ERROR: Updating the cluster in AIX failed. Check output for errors in local
cluster configuration, correct them, and try synchronization again.
After completing all the steps, check the CAA cluster configuration and status on both nodes.
First, the caavg_private volume group is created and varied on as shown in Figure 7-31.
algeria:/ # lspv
hdisk2 000fe4114cf8d258 algeria_vg
hdisk3 000fe4114cf8d2ec brazil_vg
hdisk8 000fe4114cf8d608 diskhb
caa_private0 000fe40120e16405 caavg_private active
hdisk0 000fe4113f087018 rootvg active
algeria:/ #
Figure 7-31 The caavg_private volume group varied on
algeria:/ # lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2
------------------------------
After the clmigcheck command is done running, you can remove the older version of
PowerHA and install PowerHA 7.1.
# lspv
caa_private0 000fe40120e16405 caavg_private active
hdisk2 000fe4114cf8d258 algeria_vg
hdisk3 000fe4114cf8d2ec brazil_vg
hdisk0 000fe4113f087018 rootvg active
#
Figure 7-34 The lspv output after restoring the snapshot
where:
<cluster_name> is canada_cluster.
+hdiskX is +hdisk2.
hdiskY is hdisk3.
The two shared disks are now included in the CAA cluster as shared disks, as shown in
Figure 7-35. The hdisk2 and hdisk3 disks are renamed to cldisks: the lspv command now
shows cldiskX instead of hdiskX, as shown in Figure 7-36.
algeria:/ # lspv
caa_private0 000fe40120e16405 caavg_private active
cldisk1 000fe4114cf8d258 algeria_vg
cldisk2 000fe4114cf8d2ec brazil_vg
hdisk8 000fe4114cf8d608 diskhb
hdisk0 000fe4113f087018 rootvg active
algeria:/ #
Figure 7-36 The lspv command showing cldisks for shared disks
When you use the lscluster command to perform the check, you can see that the shared
disks (cldisk1 and cldisk2) are monitored by the CAA service. Keep in mind that two types
of disks are in CAA. One type is the repository disk that is shown as REPDISK, and the other
type is the shared disk that is shown as CLUSDISK. See Figure 7-37 on page 175.
7.3.4 Summary
A snapshot migration to PowerHA 7.1 entails running the clmigcheck program. Before you
begin the migration, you must prepare for it by installing AIX 6.1.6 or later and checking
whether any part of the configuration is unsupported.
Then you run the clmigcheck command to review your PowerHA configuration and verify that
it works with PowerHA 7.1. After verifying the configuration, you specify a repository disk and
multicast address, which are essential components of the CAA service.
After you successfully complete the clmigcheck procedure, you can install PowerHA 7.1. The
CAA cluster is created while you restore your snapshot. PowerHA 7.1 uses the newly
configured CAA service for event monitoring and heartbeating.
The cluster uses virtualized resources provided by the VIOS for network and storage. The
rootvg volume group (hdisk0) is also hosted from the VIOS. The backing devices are
provided from a DS4800 storage system.
The network topology is configured as IPAT via aliasing. Disk heartbeating is also used over
the shared storage between all the nodes.
The cluster contains two resource groups: newyork_rg and test_rg. The newyork_rg resource
group hosts the IBM HTTP Server application, and the test_rg resource group hosts a test
script application. The highest-priority node for newyork_rg is chile, and for test_rg, it is
serbia. The scotland node runs in a standby capacity.
7.4.1 Planning
Before beginning a rolling migration, you must properly plan to ensure that you are ready to
proceed. For more information, see 7.1, “Considerations before migrating” on page 152. The
migration to PowerHA 7.1 is different from previous releases, because of the support for CAA
integration. Therefore, see also 7.2, “Understanding the PowerHA 7.1 migration process” on
page 153.
Ensure that the cluster is stable on all nodes and is synchronized. With a rolling migration,
you must be aware of the following restrictions while performing the migration, because a
mixed-software-version cluster is involved:
Do not perform synchronization or verification while a mixed-software-version cluster
exists. Such actions are not allowed in this case.
Do not make any cluster configuration changes.
Do not perform a Cluster Single Point Of Control (C-SPOC) operation while a
mixed-software-version cluster exists. Such action is not allowed in this case.
Try to perform the migration during one maintenance period, and do not leave your cluster
in a mixed state for any significant length of time.
CAA-specific file sets: You must install the CAA-specific bos.cluster and bos.ahafs
file sets because update_all does not install them.
3. Decide which shared disk you want to use for the CAA private repository (scotland node).
See 7.1, “Considerations before migrating” on page 152, for more information.
Previous volume disk group: The disk must be a clean logical unit number (LUN) that
does not contain a previous volume group. If you have a previous volume group on this
disk, you must remove it. See 10.4.5, “Volume group name already in use” on
page 320.
You do not need to take any action because the disk-based heartbeating is
automatically removed during migration. Because three disk heartbeat networks are in
the configuration, this warning message is displayed three times, once for each
network. If no errors are detected, you see the message shown in Figure 7-44.
Press Enter after this last panel, and you return to the main menu.
1 = 000fe40120e16405(hdisk1)
2 = 000fe4114cf8d258(hdisk2)
3 = 000fe4114cf8d2ec(hdisk3)
4 = 000fe4013560cc77(hdisk5)
5 = 000fe4114cf8d4d5(hdisk6)
6 = 000fe4114cf8d579(hdisk7)
c. Enter the multicast address as shown in Figure 7-46. You can specify a multicast
address, or you can have clmigcheck automatically assign one. For more information
about multicast addresses, see 1.3.1, “Communication interfaces” on page 13. Press
Enter and you return to the main menu.
If you make a NULL entry, AIX will generate an appropriate address for
you.
You should only specify an address if you have an explicit reason to do
so, but are cautioned that this address cannot be changed once the
configuration is activated (i.e. migration is complete).
h = help
clmigcheck: This is not the first node or last node clmigcheck was run on.
No further checking is required on this node. You can install the new
version of PowerHA SystemMirror.
6. Upgrade PowerHA on the scotland node to PowerHA 7.1 SP1. Because the cluster
services are down, you can perform a smitty update_all to upgrade PowerHA.
7. When this process is complete, modify the new rhosts definition for CAA as shown in
Figure 7-48. Although this scenario uses network addresses, you can also add the short
host names to the rhosts file, provided that you configured the /etc/hosts file
correctly. See “Creating a cluster with host names in the FQDN format” on page 75, for
more information.
/etc/cluster
# cat rhosts
192.168.101.111
192.168.101.112
192.168.101.113
Figure 7-48 Extract showing the configured rhosts file
Restarting the cluster: You do not need to restart the cluster after you upgrade
PowerHA.
8. Start PowerHA on the scotland node by issuing the smitty clstart command. The node
should be able to rejoin the cluster. However, you receive warning messages about mixed
versions of PowerHA.
After PowerHA is started on this node, move any resource groups that the next node is
hosting onto this node so that you can migrate the second node in the cluster. In this
scenario, the serbia node is hosting the test_app_rg resource group. Therefore, we
perform a resource group move request to move this resource to the newly migrated
scotland node. The serbia node is then available to migrate.
3. Run the clmigcheck command to ensure that the migration worked and that you can
proceed with the PowerHA upgrade. This step is important even though the cluster
configuration migration check and CAA configuration are already complete on the first
node (scotland).
Figure 7-52 shows the panel that you see now.
clmigcheck: This is not the first node or last node clmigcheck was run on.
No further checking is required on this node. You can install the new
version of PowerHA SystemMirror.
4. Upgrade PowerHA on the serbia node to PowerHA 7.1 SP1. Follow the same migration
procedure as in the first node.
Reminder: Update the /etc/cluster/rhosts file so that it is the same as the first node
that you upgraded. See step 6 on page 183.
At this stage, two of the three nodes in the cluster are migrated to AIX 6.1 TL6 and PowerHA
7.1. The chile node is the last node in the cluster to be upgraded. Figure 7-53 shows how the
cluster looks now.
Figure 7-54 Rolling migration: The chile node before the AIX upgrade
chile:/ # clmigcheck
Verifying clcomd communication, please be patient.
clmigcheck: Running
/usr/sbin/rsct/install/bin/ct_caa_set_disabled_for_migration
on each node in the cluster
If you see a message similar to the one shown in Figure 7-56, the final mkcluster phase
has failed. For more information about this problem, see 10.2, “Troubleshooting the
migration” on page 308.
At this stage, you have upgraded AIX and run the final clmigcheck process. Figure 7-57
shows how the cluster looks now.
Reminder: Update the /etc/cluster/rhosts file so that it is the same as the other
nodes that you upgraded. See step 6 on page 183.
In this scenario, we started PowerHA on the chile node and performed a synchronization and
verification of the cluster, which is the final stage of the migration. The newyork_rg resource
group was moved back to the chile node. The cluster migration is now complete.
Figure 7-58 shows how the cluster looks now.
Check that CAA is working by running the lscluster -m command. This command returns
information about your cluster from all your nodes. If a problem exists, you see a message
similar to the one shown in Figure 7-59.
# lscluster -m
Cluster services are not active.
Figure 7-59 Message indicating that CAA is not running
If you receive this message, see 10.4.7, “The ‘Cluster services are not active’ message” on
page 323, for details about how to fix this problem.
Verify that the CAA private volume group (caavg_private) is defined and active on all nodes.
Check the lspv output to ensure that the CAA repository is defined and varied on for each
node. You see output similar to what is shown in Figure 7-60.
chile:/ # lspv
caa_private0 000fe40120e16405 caavg_private active
hdisk2 000fe4114cf8d258 None
Figure 7-60 Extract from lspv showing the CAA repository disk
Review the /tmp/clconvert.log file to ensure that the conversion of the PowerHA ODM has
been successful. For additional details about the log files and troubleshooting information,
see 10.1, “Locating the log files” on page 306.
Synchronize or verify the cluster.
The cluster layout is a mutual takeover configuration. The munich system is the primary server
for the HTTP application. The berlin system is the primary server for the Network File
System (NFS), which is cross-mounted by the munich system.
Because of resource limitations, the disk heartbeat is using one of the existing shared disks.
Two networks are defined:
The net_ether_01 network is the administrative network and is used only by the system
administration team.
The net_ether_10 network is used by the applications and its users.
PowerHA 6.1 support on AIX 7.1: PowerHA 6.1 SP2 is not supported on AIX 7.1. You
need a minimum of PowerHA 6.1 SP3.
Important: You must restart the systems to ensure that all needed processes for CAA
are running.
With PowerHA 6.1 SP3 or later, you can start the cluster if preferred, but we do
not start it now in this scenario.
7. Run the clmigcheck program on one of the cluster nodes.
Important: You must run the clmigcheck program (in the /usr/sbin/ directory) before
you install PowerHA 7.1. Keep in mind that you must run this program on each node
in the cluster, one node at a time.
While checking the configuration, you might see warning or error messages. You must
correct errors manually, but issues identified by warning messages can be cleaned up during
the migration process. In this case, a warning message (Figure 7-66) is displayed
indicating that the disk heartbeat network will be removed at the end of the migration.
Press Enter, and the main clmigcheck panel (Figure 7-65 on page 196) is displayed
again.
d. Select option 3 (Enter repository disk and multicast IP addresses).
The next panel (Figure 7-68) lists all available shared disks that might be used for the
CAA repository disk. You need one shared disk for the CAA repository.
1 = 00c0f6a01c784107(hdisk4)
e. Configure the multicast address as shown in Figure 7-69 on page 198. The system
automatically creates an appropriate address for you. By default, PowerHA creates a
multicast address by replacing the first octet of the IP communication path of the lowest
node in the cluster with 228. Press Enter.
Important:
You cannot change the selected IP multicast address after the configuration is
activated.
You must set up any routers in the network topology to forward multicast
messages.
If you make a NULL entry, AIX will generate an appropriate address for
you.
You should only specify an address if you have an explicit reason to do
so, but are cautioned that this address cannot be changed once the
configuration is activated (i.e. migration is complete).
h = help
f. From the main clmigcheck panel, type an x to exit the clmigcheck program.
g. In the next panel (Figure 7-70), confirm the exit request by typing y.
Note - If you have not completed the input of repository disks and
multicast IP addresses, you will not be able to install
PowerHA SystemMirror
COMMAND STATUS
[MORE...94]
restricted by GSA ADP Schedule Contract with IBM Corp.
. . . . . << End of copyright notice for cluster.es.migcheck >>. . . .
9. Add the host names of your cluster nodes to the /etc/cluster/rhosts file. The names
must match the PowerHA node names.
10.Refresh the clcomd subsystem.
refresh -s clcomd
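Steps 9 and 10 can be sketched together in shell. The sketch writes to a temporary file instead of /etc/cluster/rhosts so that it stays side-effect free; the node names munich and berlin are taken from this scenario.

```shell
# Sketch of steps 9-10: write the cluster node names (one per line) to
# the rhosts file, then refresh clcomd. A temporary file stands in for
# /etc/cluster/rhosts here so the sketch has no system side effects.
rhosts_file=$(mktemp)
cat > "$rhosts_file" <<'EOF'
munich
berlin
EOF
# Count the non-empty entries to confirm the file was populated.
entries=$(grep -c . "$rhosts_file")
echo "$entries entries written"
# On the real node:
#   cp "$rhosts_file" /etc/cluster/rhosts
#   refresh -s clcomd
```

Remember that the names must match the PowerHA node names exactly.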
11.Review the /tmp/clconvert.log file to ensure that a conversion of the PowerHA ODMs
has occurred.
12.Start cluster services only on the node that you updated by using smitty clstart.
13.Ensure that the cluster services have started successfully on this node by using any of the
following commands:
clstat -a
lssrc -ls clstrmgrES | grep state
clmgr query cluster | grep STATE
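The lssrc check in the list above can be scripted as a simple state test. The `Current state:` line below is an assumption about the lssrc -ls clstrmgrES output format implied by the grep in the list; the output is simulated here so the sketch runs anywhere.

```shell
# Sketch: confirm the cluster manager reached a stable state by parsing
# the `lssrc -ls clstrmgrES` output (simulated below for portability).
lssrc_out='Current state: ST_STABLE'
state=$(printf '%s\n' "$lssrc_out" | sed -n 's/^Current state: //p')
if [ "$state" = "ST_STABLE" ]; then
  echo "cluster services are running"
else
  echo "cluster services not stable yet: $state"
fi
```

On a live node, replace the simulated variable with the real command output.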
14.Continue to the next node.
15.Run the clmigcheck program on this node.
Keep in mind that you must run the clmigcheck program on each node before you can
install PowerHA 7.1. Follow the same steps as for the first system as explained in step 7
on page 195.
# clmigcheck
Saving existing /tmp/clmigcheck/clmigcheck.log to
/tmp/clmigcheck/clmigcheck.log.bak
rshexec: cannot connect to node munich
#
Figure 7-73 The clmigcheck execution error message
Attention: Do not start the clcomd subsystem manually. Starting this subsystem manually
can result in further errors, which might require you to reinstall this node or all the
cluster nodes.
16.Install PowerHA only on this node in the same way as you did on the first node. See step 8
on page 199.
17.As on the first node, add the host names of your cluster nodes to the /etc/cluster/rhosts
file. The names must be the same as the node names.
18.Refresh the clcomd subsystem.
19.Start the cluster services only on the node that you updated.
20.Ensure that the cluster services started successfully on this node.
21.If you have more than two nodes in your cluster, repeat step 15 on page 199 through step
20 until all of your cluster nodes are updated.
You now have a fully running cluster environment. Before going into production mode, test
your cluster as explained in Chapter 9, “Testing the PowerHA 7.1 cluster” on page 259.
When you check the topology information by using the cltopinfo command, all non-IP and disk
heartbeat networks should be removed. If these networks are not removed, see Chapter 10,
“Troubleshooting PowerHA 7.1” on page 305.
When checking the RSCT subsystems, the topology subsystem should now be inactive as
shown in Figure 7-74.
The role of the administrator is to quickly find relevant information and analyze it to make the
best decision in every situation. This chapter provides several examples that show how the
PowerHA 7.1 administrator can gather information about the cluster by using several
methods.
For most of the examples in this chapter, the korea cluster from the test environment is used
with the participating seoul and busan nodes. All the commands in the examples are executed
as root user.
CAA subsystems
Cluster Aware AIX (CAA) introduces a new set of subsystems. When the cluster is not
running, these subsystems are inactive, except for the clcomd subsystem, which is active (Example 8-3).
The clcomdES subsystem has been replaced by the clcomd subsystem and is no longer part of
the cluster subsystems group. It is now part of the AIX Base Operating System (BOS), not
PowerHA.
Disk configuration
With the current code level in AIX 7.1.0.1, the CAA repository cannot be created over virtual
SCSI (VSCSI) disks. For the korea cluster, a DS4800 storage system is used and is accessed
over N_Port ID Virtualization (NPIV). The rootvg volume group is the only one using VSCSI
devices. Example 8-5 shows a list of storage disks.
busan:/ # lspv
hdisk0 00c0f6a089390270 rootvg active
hdisk1 00c0f6a077839da7 None
hdisk2 00c0f6a0107734ea None
hdisk3 00c0f6a010773532 None
busan:/ # ifconfig -a
en0:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.101.144 netmask 0xffffff00 broadcast 192.168.101.255
inet 10.168.101.44 netmask 0xffffff00 broadcast 10.168.101.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
en2:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.201.144 netmask 0xffffff00 broadcast 192.168.201.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
lo0:
flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LAR
GESEND,CHAIN>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
Routing table
The routing table is an important source of information. As shown in 8.3.1, “AIX
commands and log files” on page 216, the multicast address is not displayed in this table,
even when the CAA and IBM PowerHA clusters are running. Example 8-7 shows the routing
table for the seoul node.
Multicast information
You can use the netstat command to display information about an interface for which
multicast is enabled. As shown in Example 8-8 for en0, before the cluster is configured,
no multicast address other than the default 224.0.0.1 address is present.
Cluster status
Before a cluster is configured, the state of every node is NOT_CONFIGURED as shown in
Example 8-10.
As soon as the configuration is synchronized to all nodes and the CAA cluster is created, the
administrator cannot change the cluster name or the cluster multicast address.
Changing the repository disk: The administrator can change the repository disk with the
procedure for replacing a repository disk provided in the PowerHA 7.1 Release Notes.
After the first synchronization, two other disks are added to the cluster storage by using the
following command:
chcluster -n korea -d+hdisk2,hdisk3
where hdisk2 is renamed to cldisk2, and hdisk3 is renamed to cldisk1. Example 8-13
shows the resulting disk listing.
Attention: The cluster repository disk is a special device for the cluster. The use of Logical
Volume Manager (LVM) commands over the repository disk is not supported. AIX LVM
commands are single node commands and are not intended for use in a clustered
configuration.
Multicast information
Compared with the multicast information collected when the cluster was not configured, the
netstat command now shows the 228.168.101.43 address in the table (Example 8-14).
Cluster status
The cluster status changes from NOT_CONFIGURED to ST_INIT as shown in Example 8-15.
Subsystem guide:
cld determines whether the local node must become the primary or secondary solidDB
server in a failover.
The solid subsystem is the database engine.
The solidhac subsystem is used for the high availability of the solidDB server.
The clconfd subsystem runs every 10 minutes to put any missed cluster configuration
changes into effect on the local node.
Tip: The primary IP address shown for each node is the IP address chosen as the
communication path during cluster definition. In this case, the address is the same IP
address that is used as the persistent IP address.
The multicast address, when not specified by the administrator during cluster creation, is
composed of the number 228 followed by the last three octets of the communication path
of the node where the synchronization is executed. In this particular example, the
synchronization was run from the seoul node, which has the communication path
192.168.101.43. Therefore, the multicast address for the cluster becomes 228.168.101.43,
as can be observed in the output of the lscluster -c command.
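The derivation above can be sketched in one shell substitution. The communication-path address is taken from this example, and the rule is simply to replace the first octet with 228.

```shell
# Derive the default CAA multicast address from a node's communication
# path: keep the last three octets and replace the first octet with 228.
comm_path="192.168.101.43"      # seoul's communication path (from this example)
mcast="228.${comm_path#*.}"     # strip up to the first dot, then prefix "228."
echo "$mcast"                   # prints 228.168.101.43
```

The same one-liner is useful for predicting the cluster multicast address before synchronization.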
The nodes can use IPv6, but at least one of the interfaces in each node must be configured
with IPv4 to enable the CAA cluster multicasting.
-------------------------------
NODE busan
-------------------------------
Calling node query for all nodes
Node query number of nodes examined: 2
Zone: Example 8-18 on page 209 mentions zones. A zone is a concept that is planned for
use in future versions of CAA, where the node can be part of different groups of machines.
rtt: The round-trip time (rtt) is calculated by using a mean deviation formula. Some
commands show rrt instead of rtt, which is believed to be a typographic error in the
command.
sfwcom: Storage Framework Communication (sfwcom) is the interface created by CAA for
SAN heartbeating. To enable sfwcom, the following prerequisites must be in place:
Each node must have either a 4 Gb or 8 Gb Fibre Channel adapter. If you are using VSCSI or
NPIV, VIOS 2.2.0.11-FP24 SP01 is the minimum level required.
The adapters used for SAN heartbeating must have the tme (target mode enabled)
parameter set to yes. The Fibre Channel controller must have the parameter dyntrk set
to yes, and the parameter fc_err_recov set to fast_fail.
All the adapters participating in the heartbeating must be in the same fabric zone. In the
previous example, sydney-fcs0 and perth-fcs0 are in the same fabric zone;
sydney-fcs1 and perth-fcs1 are in the same fabric zone.
dpcomm: The dpcomm interface is the actual repository disk. This means that, in addition to
the Ethernet and Fibre Channel adapters, the cluster also uses the repository disk as
physical media to exchange heartbeats among the nodes.
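The sfwcom prerequisites listed above correspond to device attribute changes. The following is a hedged configuration sketch, not a definitive procedure: the device names fcs0 and fscsi0 are assumptions for your environment, and the -P flag defers the adapter change until the next reboot.

```shell
# Enable target mode on the FC adapter (device names are assumptions;
# -P records the change in the ODM, effective after reboot).
chdev -l fcs0 -a tme=yes -P
# Set dynamic tracking and fast failover recovery on the FC controller.
chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P
```

After the reboot, you can verify the settings with lsattr -El against each device.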
-------------------------------
NODE busan
-------------------------------
Disk configuration
All the volume groups controlled by a resource group are shown as concurrent on both sides
as shown in Example 8-22.
-------------------------------
NODE busan
-------------------------------
hdisk0 00c0f6a089390270 rootvg active
caa_private0 00c0f6a077839da7 caavg_private active
cldisk2 00c0f6a0107734ea pokvg concurrent
cldisk1 00c0f6a010773532 pokvg concurrent
Multicast information
When compared with the multicast information collected when the cluster is not configured,
the netstat command shows that the 228.168.101.43 address is present in the table. See
Example 8-23.
-------------------------------
NODE busan
-------------------------------
Routing tables
Destination Gateway Flags Refs Use If Exp Groups
Example 8-27 Multicast packet monitoring for the seoul node using the tcpdump utility
seoul:/ # tcpdump -D
1.en0
2.en2
3.lo0
The same information is captured on the busan node as shown in Example 8-28.
Example 8-28 Multicast packet monitoring for the busan node using the tcpdump utility
busan:/tmp # tcpdump -D
1.en0
2.en2
3.lo0
You can also see the multicast traffic for all the PowerHA 7.1 clusters in your LAN segment.
The following command generates the output:
seoul:/ # tcpdump -n -vvv port drmsfsd
Tip: You can observe the multicast address in the last line of the lscluster -c CAA
command output.
Cluster information
CAA comes with a set of command-line tools, as explained in “Cluster information using
the lscluster command” on page 209. You can use these tools to monitor the status and
statistics of a running cluster. For more information about CAA and its functions, see
Chapter 2, “Features of PowerHA SystemMirror 7.1” on page 23.
UUID
The UUID of the caa_private0 disk is stored as a cluster0 device attribute as shown in
Example 8-31.
The repository disk contains logical volumes for the bootstrap and solidDB file systems as
shown in Example 8-33.
NODES
numcl numz uuid shid name
0 0 4f8858be-c0dd-11df-930a-a24e50543103 2 seoul
0 0 e356646e-c0dd-11df-b51d-a24e57e18a03 1 busan
ZONES
none
Tip: The solidDB database is not necessarily active in the same node where the PowerHA
resource group is active. You can see this difference when comparing Example 8-35 with
the output of the clRGinfo command:
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE seoul
OFFLINE busan
In this case, the solidDB database has the primary database active in the busan node, while
the PowerHA resource group is hosted on the seoul node.
Example 8-36 Using the lssrc command to check where solidDB is active
seoul:/ # lssrc -ls IBM.StorageRM
Subsystem : IBM.StorageRM
PID : 7077950
Cluster Name : korea
Node Number : 2
Daemon start time : 10/05/10 10:06:57
PeerNodes: 2
QuorumNodes: 2
Group IBM.StorageRM.v1:
ConfigVersion: 0x24cab3184
Providers: 2
QuorumMembers: 2
Group Leader: seoul, 0xdc82faf0908920dc, 2
Example 8-37 The solidDB SQL interface (view from left side of code)
seoul:/ # /opt/cluster/solidDB/bin/solsql -x pwdfile:/etc/cluster/dbpass "tcp 2188" caa
2 rows fetched.
select * from SHAREDDISKS;
Example 8-38 Using the solidDB SQL interface (view from right side starting at CLUSTER_ID row)
VERIFIED_STATUS ESTATE VERSION_OPERATING VERSION_CAPABLE MULTICAST
--------------- ------ ----------------- --------------- ---------
NULL 1 1 1 0
NULL 1 1 1 0
The /var/adm/ras/syslog.caa file keeps all the logs about CAA activity, including the error
output from commands. Example 8-39 shows an error caught in this file during the cluster
definition: the chosen repository disk had been part of a repository in the past and had not
been cleaned up.
Tip: To capture debug information, you can replace *.info with *.debug in the
/etc/syslog.conf file, followed by a syslogd daemon refresh. Because the output in
debug mode is voluminous, redirect the syslogd output to a file system other
than /, /var, or /tmp.
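As a hedged sketch of this tip, the /etc/syslog.conf entry might look as follows; the target path /logs/syslog.caa is an assumption standing in for any file system other than /, /var, or /tmp.

```shell
# /etc/syslog.conf entry for CAA debug output (target path is an assumption)
*.debug         /logs/syslog.caa
```

After editing the file, create the target file if it does not exist and refresh the daemon with refresh -s syslogd.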
seoul:/ # /usr/sbin/snmpv3_ssw -1
Stop daemon: snmpmibd
In /etc/rc.tcpip file, comment out the line that contains: snmpmibd
In /etc/rc.tcpip file, remove the comment from the line that contains: dpid2
Make the symbolic link from /usr/sbin/snmpd to /usr/sbin/snmpdv1
Make the symbolic link from /usr/sbin/clsnmp to /usr/sbin/clsnmpne
ID Name State
1108531106 korea UP
Select an option:
# - the Cluster ID q- quit
1108531106
Example 8-43 The clstat utility with the option to run only once
sydney:/ # clstat -o
Tip: The sfwcom and dpcomm interfaces that are shown with the lscluster -i command are
not shown in the output of the clstat utility. The PowerHA 7.1 cluster is unaware of these
CAA interfaces that are present at the AIX level.
_____________________________________________________________________________
Cluster Name: korea
Cluster State: UP
Cluster Substate: STABLE
_____________________________________________________________________________
NODE busan:
This node has 1 service IP label(s):
NODE seoul:
This node has 1 service IP label(s):
Tip: The cltopinfo -m command showed the heartbeat rings in previous versions of
PowerHA. Because this concept no longer applies, the output of the cltopinfo -m
command is empty in PowerHA 7.1.
The PowerHA 7.1 cluster administrator should explore all the utilities in the
/usr/es/sbin/cluster/utilities/ directory on a test system. Most of the utilities are only
informational tools. Never run unknown commands on production systems.
Use the odmget command followed by the name of the file in the /etc/es/objrepos directory.
Example 8-48 shows how to retrieve information about the cluster.
HACMPcluster:
id = 1108531106
name = "korea"
nodename = "seoul"
sec_level = "Standard"
sec_level_msg = ""
sec_encryption = ""
sec_persistent = ""
Tip: In previous versions of PowerHA, the ODM HACMPtopsvcs class kept information about
the current instance number for a node. In PowerHA 7.1, this class always has the instance
number 1 (instanceNum = 1 as shown in the following example) because topology services
are not used anymore. This number never changes.
seoul:/ # odmget HACMPtopsvcs
HACMPtopsvcs:
hbInterval = 1
fibrillateCount = 4
runFixedPri = 1
fixedPriLevel = 38
tsLogLength = 5000
gsLogLength = 5000
instanceNum = 1
You can use the HACMPnode ODM class to discover which version of PowerHA is installed as
shown in Example 8-49.
Example 8-49 Using the odmget command to retrieve the PowerHA version
seoul:/ # odmget HACMPnode | grep version | sort -u
version = 12
The following version numbers and corresponding HACMP/PowerHA release are available:
2: HACMP 4.3.1
3: HACMP 4.4
4: HACMP 4.4.1
5: HACMP 4.5
6: HACMP 5.1
7: HACMP 5.2
8: HACMP 5.3
9: HACMP 5.4
10: PowerHA 5.5
11: PowerHA 6.1
12: PowerHA 7.1
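The version-to-release table above can be turned into a small lookup. In this sketch, the odmget call is replaced by a fixed value so that it runs anywhere; the commented line shows how the value would be obtained on a live node.

```shell
# Map the HACMPnode "version" attribute to its release name (table taken
# from this section). On a live node, you would obtain the value with:
#   version=$(odmget HACMPnode | awk '$1 == "version" {print $3; exit}')
version=12
case "$version" in
  6)  release="HACMP 5.1" ;;
  7)  release="HACMP 5.2" ;;
  8)  release="HACMP 5.3" ;;
  9)  release="HACMP 5.4" ;;
  10) release="PowerHA 5.5" ;;
  11) release="PowerHA 6.1" ;;
  12) release="PowerHA 7.1" ;;
  *)  release="unknown" ;;
esac
echo "$release"   # prints PowerHA 7.1
```

A lookup of this kind is convenient in audit scripts that report the installed release across many clusters.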
Because the HACMPtopsvcs ODM class can no longer be used to discover whether the
configuration must be synchronized across the nodes, you can query the HACMPcluster ODM
class instead. This class keeps a numeric attribute called handle. Each node has a different
value for this attribute, ranging from 1 to 32. You can retrieve the handle values by using the
odmget or clhandle commands as shown in Example 8-50.
HACMPcluster:
id = 1108531106
name = "korea"
nodename = "seoul"
sec_level = "Standard"
sec_level_msg = ""
sec_encryption = ""
sec_persistent = ""
last_node_ids = ""
highest_node_id = 0
last_network_ids = ""
highest_network_id = 0
last_site_ids = ""
highest_site_id = 0
handle = 2
cluster_version = 12
reserved1 = 0
reserved2 = 0
wlm_subdir = ""
settling_time = 0
rg_distribution_policy = "node"
noautoverification = 0
clvernodename = ""
clverhour = 0
clverstartupoptions = 0
-------------------------------
NODE busan
-------------------------------
HACMPcluster:
id = 1108531106
name = "korea"
nodename = "busan"
sec_level = "Standard"
sec_level_msg = ""
sec_encryption = ""
sec_persistent = ""
last_node_ids = ""
highest_node_id = 0
-------------------------------
NODE busan
-------------------------------
1 busan
When you perform a cluster configuration change on any node, the handle attribute on that
node is set to 0.
Suppose that you want to add a new resource group to the korea cluster and that you make
the change from the seoul node. After you do the modification, and before you synchronize
the cluster, the handle attribute in the HACMPcluster ODM class in the seoul node has a value
of 0 as shown in Example 8-51.
If more than one node has a handle value of 0, you or another person might have performed
changes from different nodes. In that case, you must decide on which node to start the
synchronization. As a result, the cluster modifications made on the other nodes are
then lost.
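The handle check described above can be sketched as a small script. The odmget output is simulated here so that the sketch is portable; on a real node, you would pipe the output of `odmget HACMPcluster` instead.

```shell
# Sketch: flag pending (unsynchronized) changes by checking whether the
# local node's handle attribute is 0 (odmget output simulated below).
odm_out='HACMPcluster:
        name = "korea"
        handle = 0'
handle=$(printf '%s\n' "$odm_out" | awk '$1 == "handle" {print $3}')
if [ "$handle" -eq 0 ]; then
  sync_state="unsynchronized changes pending"
else
  sync_state="configuration in sync"
fi
echo "$sync_state"
```

Running the same check on every node shows at a glance where the last configuration change was made.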
The clmgr command supports the actions listed in 5.2.1, “The clmgr action commands” on
page 104.
For monitoring purposes, you can use the query and view actions. For a list of object classes
that are available for each action, see 5.2.2, “The clmgr object classes” on page 105.
Example 8-54 Query action on the PowerHA cluster using the clmgr command
seoul:/ # clmgr query cluster
CLUSTER_NAME="korea"
CLUSTER_ID="1108531106"
STATE="STABLE"
VERSION="7.1.0.1"
VERSION_NUMBER="12"
EDITION="STANDARD"
CLUSTER_IP=""
REPOSITORY="caa_private0"
SHARED_DISKS="cldisk2,cldisk1"
UNSYNCED_CHANGES="false"
SECURITY="Standard"
FC_SYNC_INTERVAL="10"
RG_SETTLING_TIME="0"
RG_DIST_POLICY="node"
MAX_EVENT_TIME="180"
MAX_RG_PROCESSING_TIME="180"
SITE_POLICY_FAILURE_ACTION="fallover"
SITE_POLICY_NOTIFY_METHOD=""
DAILY_VERIFICATION="Enabled"
VERIFICATION_NODE="Default"
VERIFICATION_HOUR="0"
VERIFICATION_DEBUGGING="Enabled"
LEVEL=""
ALGORITHM=""
GRACE_PERIOD=""
REFRESH=""
MECHANISM=""
CERTIFICATE=""
PRIVATE_KEY=""
Tip: Another way to check the PowerHA version is to query the SNMP subsystem as
follows:
seoul:/ # snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs
clstrmgrVersion
clstrmgrVersion.1 = "7.1.0.1"
clstrmgrVersion.2 = "7.1.0.1"
Example 8-55 Using the view action on the PowerHA cluster using clmgr
seoul:/ # clmgr view report cluster
Cluster: korea
Cluster services: active
State of cluster: up
Substate: stable
#############
APPLICATIONS
#############
Cluster korea provides the following applications: db2pok_ApplicationServer
Application: db2pok_ApplicationServer
db2pok_ApplicationServer is started by
/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok
db2pok_ApplicationServer is stopped by
/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok
Application monitors for db2pok_ApplicationServer:
db2pok_SQLMonitor
db2pok_ProcessMonitor
Monitor name: db2pok_SQLMonitor
Type: custom
Monitor method: user
Monitor interval: 120 seconds
Hung monitor signal: 9
Stabilization interval: 240 seconds
Retry count: 3 tries
Restart interval: 1440 seconds
Failure action: fallover
Cleanup method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok
Restart method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok
Monitor name: db2pok_ProcessMonitor
Type: process
Process monitored: db2sysc
Process owner: db2pok
Instance count: 1
Stabilization interval: 240 seconds
Retry count: 3 tries
Restart interval: 1440 seconds
#############
TOPOLOGY
#############
korea consists of the following nodes: busan seoul
busan
Network interfaces:
busan-b2 {up}
with IP address: 192.168.201.144
on interface: en2
on network: net_ether_01 {up}
busan-b1 {up}
with IP address: 192.168.101.144
on interface: en0
on network: net_ether_01 {up}
NODE busan:
Network net_ether_01
poksap-db 10.168.101.143
busan-b2 192.168.201.144
busan-b1 192.168.101.144
NODE seoul:
Network net_ether_01
poksap-db 10.168.101.143
seoul-b1 192.168.101.143
seoul-b2 192.168.201.143
You can also use the clmgr command to see the list of PowerHA SystemMirror log files as
shown in Example 8-56.
Example 8-56 Viewing the PowerHA cluster log files using the clmgr command
seoul:/ # clmgr view log
Available Logs:
autoverify.log
cl2siteconfig_assist.log
cl_testtool.log
clavan.log
clcomd.log
clcomddiag.log
clconfigassist.log
clinfo.log
clstrmgr.debug
clstrmgr.debug.long
cluster.log
cluster.mmddyyyy
clutils.log
clverify.log
cspoc.log
cspoc.log.long
cspoc.log.remote
dhcpsa.log
dnssa.log
domino_server.log
emuhacmp.out
hacmp.out
ihssa.log
migration.log
sa.log
sax.log
Tip: The output verbose level can be set by using the -l option as in the following
example:
clmgr -l {low|med|high|max} action object
Root user: Do not use the root user. The root logon is exclusive: the second person who logs on with the root user ID logs off the first person, and so on. For a production environment, create an AIX user ID for each person who must connect to the IBM Systems Director web interface. Each user ID must belong to the smadmin group so that everyone can connect simultaneously to the IBM Systems Director web interface. For more information, see the
“Users and user groups in IBM Systems Director” topic in the IBM Systems Director V6.1.x
Information Center at:
http://publib.boulder.ibm.com/infocenter/director/v6r1x/index.jsp?topic=/direct
or.security_6.1/fqm0_c_user_accounts.html
smadmin (Administrator group): Members of the smadmin group are authorized for all
operations.
Cluster menu
You can right-click all the objects to access options. Figure 8-8 shows an example of the
options for the korea cluster.
Figure 8-8 Menu options when right-clicking a cluster in the PowerHA SystemMirror plug-in
Figure 8-10 Options available when right-clicking a resource group in PowerHA SystemMirror plug-in
Storage tab
Figure 8-13 shows the Storage tab and the information that is presented.
Whether you are scripting a task for use on many systems or automating a process, the CLI can be useful in a management environment such as IBM Systems Director.
Tip: To run the commands, the smcli interface requires you to be an IBM Systems Director
superuser.
Example 8-57 runs the smcli command on the mexico host in IBM Systems Director to see the available options for PowerHA.
Example 8-57 Available options for PowerHA in IBM Systems Director CLI
mexico:/ # /opt/ibm/director/bin/smcli lsbundle | grep sysmirror
sysmirror/help
sysmirror/lsac
sysmirror/lsam
sysmirror/lsappctl
sysmirror/lsappmon
sysmirror/lscl
sysmirror/lscluster
sysmirror/lsdependency
sysmirror/lsdp
sysmirror/lsfc
sysmirror/lsfilecollection
sysmirror/lsif
sysmirror/lsinterface
sysmirror/lslg
sysmirror/lslog
sysmirror/lsmd
sysmirror/lsmethod
All the configuration commands listed in Example 8-57 can be triggered by using the smcli
command. Example 8-58 shows the commands that you can use.
# Lists networks:
mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lsnw -c korea
net_ether_01
You can obtain the cluster address for the -a option of the socksimple command from the
lscluster -c command output (Example 9-3).
perth:/ # socksimple -r -a 1
socksimple version 1.2
Listening on 1/12:
2. Disconnect the network interfaces by pulling the cables on one node to simulate an
Ethernet network failure. Example 9-5 shows the interface status.
perth:/ # socksimple -r -a 1
socksimple version 1.2
Listening on 1/12:
4. Check the status of the cluster interfaces by using the lscluster -i command.
Example 9-7 shows the status for both disconnected ports on the perth node. In this
example, the status has changed from UP to DOWN SOURCE HARDWARE RECEIVE SOURCE
HARDWARE TRANSMIT.
5. Reconnect the Ethernet cables and check the port status as shown in Example 9-8.
6. Check if the cluster status has recovered. Example 9-9 shows that both Ethernet ports on
the perth node are now in the UP state.
9.2.1 Background
When the entire PowerHA SystemMirror IP network fails, and either the SAN-based heartbeat
network (sfwcom) does not exist, or it exists but has failed, CAA uses the
heartbeat-over-repository-disk (dpcom) feature.
The example in the next section describes dpcom heartbeating in a two-node cluster after all
IP interfaces have failed.
Initially, both nodes are online and running cluster services, all IP interfaces are online, and
the service IP address has an alias on the en3 interface.
This test scenario includes unplugging the cable of one interface at a time, starting with en3,
en4, en5, and finally en0. As each cable is unplugged, the service IP correctly swaps to the
next available interface on the same node. Each failed interface is marked as DOWN SOURCE
HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT as shown in Example 9-10. After the cables for
the en3 through en5 interfaces are unplugged, a local network failure event occurs, leading to
a selective failover of the resource group to the remote node. However, because the en0
interface is still up, CAA continues to heartbeat over the en0 interface.
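The active heartbeat path can be spotted quickly by filtering the lscluster -i output for the dpcom pseudo-interface: while an IP or SAN path is available, dpcom stays in a RESTRICTED state. The following sketch runs against a captured sample of that output (the sample text is illustrative, not from a live cluster); on a real node you would pipe the live lscluster -i output through the same filter.

```shell
# Hypothetical dpcom stanza from `lscluster -i` output (sample only).
sample='Interface number 3 dpcom
Interface state UP RESTRICTED AIX_CONTROLLED'

# Report whether CAA is actively heartbeating over the repository disk:
# RESTRICTED means dpcom is on standby; UP without RESTRICTED means active.
dpcom_state() {
    if echo "$1" | grep -q 'RESTRICTED'; then
        echo "dpcom standby (IP or SAN heartbeating in use)"
    else
        echo "dpcom active (heartbeating over the repository disk)"
    fi
}

dpcom_state "$sample"
```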
------------------------------
------------------------------
[hacmp28:HAES7101/AIX61-06 /]
# lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2
------------------------------
Example 9-13 shows the output of the lscluster -i command with the dpcom status
changing from UP RESTRICTED AIX_CONTROLLED to UP AIX_CONTROLLED.
Example 9-13 Output of the lscluster -i command showing the dpcom status
[hacmp27:HAES7101/AIX61-06 /]
# lscluster -i
Network/Storage Interface Query
After any interface cable is reconnected, such as the en0 interface, CAA stops heartbeating
over the repository disk and resumes heartbeating over the IP interface.
Example 9-14 shows the output of the lscluster -m command after the en0 cable is
reconnected. The dpcom status changes from UP to DOWN RESTRICTED, and the en0 interface
status changes from DOWN to UP.
------------------------------
Example 9-15 shows the output of the lscluster -i command. The en0 interface is now
marked as UP, and the dpcom returns to UP RESTRICTED AIX_CONTROLLED.
9.3.1 Background
In PowerHA 7.1, the heartbeat method has changed. Heartbeating between the nodes is now
done by AIX. The newly introduced CAA takes over the role of heartbeating and event
management.
This simulation tests a network down scenario and looks at the log files of PowerHA and CAA
monitoring. This test scenario has a two-node cluster, and one network interface is brought
down on one of the nodes by using the ifconfig command.
This cluster has one IP heartbeat path and two non-heartbeat paths. One of the
non-heartbeat paths is a SAN-based heartbeat channel (sfwcom). The other non-heartbeat
path is heartbeating over the repository disk (dpcom). Although IP connectivity is lost when
using the ifconfig command, PowerHA SystemMirror uses CAA for heartbeating over the two
other channels. This process is similar to the rs232 or diskhb heartbeat networks in previous
versions of PowerHA.
riyad:/ # netstat -i
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 a2.4e.5f.b4.5.2 74918 0 50121 0 0
en0 1500 192.168.100 riyad 74918 0 50121 0 0
en0 1500 10.168.200 saudisvc 74918 0 50121 0 0
lo0 16896 link#1 3937 0 3937 0 0
lo0 16896 127 loopback 3937 0 3937 0 0
lo0 16896 loopback 3937 0 3937 0 0
riyad:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
myrg ONLINE riyad
OFFLINE jeddah
Figure 9-2 Status of the riyad node
riyad:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
myrg OFFLINE riyad
ONLINE jeddah
Figure 9-5 clRGinfo while network failure
You can also check the network down event in the /var/hacmp/adm/cluster.log file
(Figure 9-6).
Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT START:
network_down riyad net_ether_01
Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT
COMPLETED: network_down riyad net_ether_01 0
Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT START:
network_down_complete riyad net_ether_01
Oct 6 09:57:43 riyad user:notice PowerHA SystemMirror for AIX: EVENT
COMPLETED: network_down_complete riyad net_ether_01 0
Figure 9-6 Network down event from the cluster.log file
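The network_down events in the cluster.log file follow the fixed EVENT START / EVENT COMPLETED wording shown in Figure 9-6, so they can be extracted with a simple filter. The sketch below counts completed network_down events in a sample excerpt; on a live node you would read /var/hacmp/adm/cluster.log instead of the sample variable.

```shell
# Illustrative cluster.log lines in the format shown in Figure 9-6.
log='Oct  6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT START: network_down riyad net_ether_01
Oct  6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: network_down riyad net_ether_01 0'

# Count completed network_down events; the trailing space in the pattern
# excludes the separate network_down_complete event.
count_network_down() {
    echo "$1" | grep -c 'EVENT COMPLETED: network_down '
}

count_network_down "$log"
```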
jeddah:/ # /usr/sbin/rsct/bin/ahafs_mon_multi
=== write String : CHANGED=YES;CLUSTER=YES
=== files being monitored:
fd file
3 /aha/cluster/nodeState.monFactory/nodeStateEvent.mon
4 /aha/cluster/nodeAddress.monFactory/nodeAddressEvent.mon
5 /aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon
6 /aha/cluster/nodeList.monFactory/nodeListEvent.mon
7 /aha/cpu/processMon.monFactory/usr/sbin/rsct/bin/hagsd.mon
==================================
Loop 1:
Event for
/aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon has
occurred.
BEGIN_EVENT_INFO
TIME_tvsec=1286376025
TIME_tvnsec=623294923
SEQUENCE_NUM=0
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=ADAPTER_DOWN
INTERFACE_NAME=en0
NODE_NUMBER=2
NODE_ID=0x2F1590D0CC0211DFBF20A24E5FB40502
CLUSTER_ID=0x93D8689AD0F211DFA49CA24E5F0D9E02
END_EVPROD_INFO
END_EVENT_INFO
==================================
Figure 9-7 Event monitoring from AHAFS
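Each AHAFS event record arrives as key=value pairs between BEGIN_EVENT_INFO and END_EVENT_INFO markers, as Figure 9-7 shows. A small helper can pull individual fields out of such a record, for example to react to ADAPTER_DOWN events in a monitoring script. The record below is a shortened sample, not live output.

```shell
# Shortened AHAFS event record in the format printed in Figure 9-7.
event='BEGIN_EVENT_INFO
TIME_tvsec=1286376025
BEGIN_EVPROD_INFO
EVENT_TYPE=ADAPTER_DOWN
INTERFACE_NAME=en0
NODE_NUMBER=2
END_EVPROD_INFO
END_EVENT_INFO'

# Extract one field from the event record by its key name.
event_field() {
    echo "$2" | awk -F= -v k="$1" '$1 == k { print $2 }'
}

event_field EVENT_TYPE "$event"      # ADAPTER_DOWN
event_field INTERFACE_NAME "$event"  # en0
```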
With help from the caa_event, you can monitor the network failure event. You can see the
CAA event by running the /usr/sbin/rsct/bin/caa_event -a command (Figure 9-8).
# /usr/sbin/rsct/bin/caa_event -a
EVENT: adapter liveness:
event_type(0)
node_number(2)
node_id(0)
sequence_number(0)
reason_number(0)
p_interface_name(en0)
EVENT: adapter liveness:
event_type(1)
node_number(2)
node_id(0)
sequence_number(1)
reason_number(0)
p_interface_name(en0)
Figure 9-8 Network failure in CAA event monitoring
PowerHA 7.1 has a new system event that is enabled by default. This new event allows for the
monitoring of the loss of the rootvg volume group while the cluster node is up and running.
Previous versions of PowerHA/HACMP could not monitor this type of loss, nor could the
cluster perform a failover action when access to rootvg was lost, for example, when a SAN
disk that hosts the rootvg for a cluster node fails.
The new option is available under the SMIT menu path smitty sysmirror → Custom Cluster
Configuration → Events → System Events. Figure 9-9 shows that the rootvg system event
is defined and enabled by default in PowerHA 7.1.
[Entry Fields]
* Event Name ROOTVG +
* Response Log event and reboot +
* Active Yes +
Figure 9-9 The rootvg system event
The default event properties instruct the system to log an event and restart when a loss of
rootvg occurs. This exact scenario is tested in the next section to demonstrate this concept.
This scenario entails a two-node cluster with one resource group. The cluster is running on
two nodes: sydney and perth. The rootvg volume group is hosted by the VIOS on a VSCSI
disk.
lsmap -all
VTD vtscsi13
Status Available
LUN 0x8100000000000000
Backing device lp5_rootvg
Physloc
Figure 9-10 VIOS output of the lsmap command showing the rootvg resource
Check the node to ensure that you have the right disk as shown in Figure 9-11.
sydney:/ # lspv
hdisk0 00c1f170ff638163 rootvg active
caa_private0 00c0f6a0febff5d4 caavg_private active
hdisk2 00c1f170674f3d6b dbvg
hdisk3 00c1f1706751bc0d appvg
Figure 9-11 PowerHA node showing the mapping of hdisk0 to the VIOS
After the mapping is established, review the cluster status to ensure that the resource group
is online as shown in Figure 9-12.
sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg ONLINE sydney
OFFLINE perth
Next, remove the virtual target device (VTD) mapping that maps the rootvg LUN to the client
partition, which in this case is the PowerHA node called sydney. Perform this operation while
the node is up and running and hosting the resource group. This operation demonstrates
what happens to the node when rootvg access is lost.
When checking the node, you find that it halted and failed the resource group over to the
standby node perth (Figure 9-14). This behavior is new and expected in this situation. It is a
result of the system event that monitors access to rootvg from the kernel. Checking perth
shows that the failover happened.
perth:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg OFFLINE sydney
ONLINE perth
Figure 9-14 Node status from the standby node showing that the node failed over
LABEL: KERNEL_PANIC
IDENTIFIER: 225E3B63
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
ASSERT STRING
PANIC STRING
System Halt because of rootvg failure
Figure 9-15 System error report showing a rootvg failure
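The panic string in Figure 9-15 makes this failure easy to recognize in the error report. The following sketch checks a captured errpt detail excerpt for a rootvg-triggered kernel panic; the sample text is taken from the figure, and on a live node you would feed the function the output of the errpt detail view instead.

```shell
# Sample errpt detail excerpt as shown in Figure 9-15.
report='LABEL: KERNEL_PANIC
IDENTIFIER: 225E3B63
PANIC STRING
System Halt because of rootvg failure'

# Return success when the error report shows a rootvg-triggered kernel panic.
is_rootvg_panic() {
    echo "$1" | grep -q 'KERNEL_PANIC' &&
    echo "$1" | grep -q 'rootvg failure'
}

if is_rootvg_panic "$report"; then
    echo "rootvg loss caused the halt"
fi
```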
The result is that the resource group moved to the standby node as expected. Example 9-16
shows the relevant output that is written to the busan:/var/hacmp/adm/cluster.log file.
The cld messages are related to the solidDB. The cld subsystem determines whether the
local node must become the primary or secondary solidDB server in a failover. Before the
crash, solidDB was active on the seoul node as follows:
seoul:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: seoul, 0xdc82faf0908920dc, 2
As expected, after the crash, solidDB is active in the remaining busan node as follows:
busan:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: busan, 0x564bc620973c9bdc, 1
With the absence of the seoul node, its interfaces are in STALE status as shown in
Example 9-17.
Example 9-17 The lscluster -i command to check the status of the cluster
busan:/ # lscluster -i
Network/Storage Interface Query
Results: The results were the same when issuing the halt command instead of the halt
-q command.
Scenario 1
This scenario shows the use of a stress tool on the CPU of one node with more than 50
processes in the run queue and a duration of 60 seconds.
Overview
This scenario consists of a hot-standby cluster configuration with participating nodes seoul
and busan with only one Ethernet network. Each node has two Ethernet interfaces. The
resource group is hosted on seoul, and solidDB is active on the busan node. A tool is run to
stress the seoul CPU with more than 50 processes in the run queue with a duration of 60
seconds as shown in Example 9-18 on page 293.
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE seoul
OFFLINE busan
Beneath the lparstat output header, you can see the CPU and memory configuration for each
node:
Seoul: Power 6, type=Shared, mode=Uncapped, smt=On, lcpu=2, mem=3584MB, ent=0.50
Busan: Power 6, type=Shared, mode=Uncapped, smt=On, lcpu=2, mem=3584MB, ent=0.50
Results
Before the test, the seoul node is running at an average of 3% of its entitled capacity, and the
run queue averages three processes as shown in Example 9-19.
During the test, the entitled capacity rose to 200%, and the run queue rose to an average
of 50 processes as shown in Example 9-20.
Example 9-20 Checking the node status after running the stress test
seoul:/ # vmstat 2
System configuration: lcpu=2 mem=3584MB ent=0.50
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------------------
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
52 0 405058 167390 0 0 0 0 0 0 108 988 397 42 8 50 0 0.25 50.6
41 0 405200 167248 0 0 0 0 0 0 78 140 245 99 0 0 0 0.79 158.1
49 0 405277 167167 0 0 0 0 0 0 71 206 249 99 0 0 0 1.00 199.9
50 0 405584 166860 0 0 0 0 0 0 73 33 241 99 0 0 0 1.00 199.9
48 0 405950 166491 0 0 0 0 0 0 71 297 244 99 0 0 0 1.00 199.8
As expected, the CPU starvation did not trigger a resource group move from the seoul node
to the busan node. The /var/adm/ras/syslog.caa log file reported messages about solidDB
daemons being unable to communicate, but the leader node continued to be the busan node
as shown in Example 9-21 on page 294.
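The run-queue pressure during such a test can be summarized directly from the vmstat data rows: the first column (r in the vmstat header) is the run-queue length. The sketch below averages that column over a few of the sample rows from Example 9-20; on a live node you would pipe the vmstat output (minus its header lines) through the same awk program.

```shell
# Data rows taken from the vmstat sample in Example 9-20 (headers omitted).
rows='52 0 405058 167390 0 0 0 0 0 0 108 988 397 42 8 50 0 0.25 50.6
41 0 405200 167248 0 0 0 0 0 0 78 140 245 99 0 0 0 0.79 158.1
49 0 405277 167167 0 0 0 0 0 0 71 206 249 99 0 0 0 1.00 199.9'

# Integer average of the run-queue column (first field).
avg_runq() {
    echo "$1" | awk '{ sum += $1; n++ } END { printf "%d\n", sum / n }'
}

avg_runq "$rows"   # integer average of 52, 41, and 49
```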
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE seoul
OFFLINE busan
Scenario 2
This scenario shows the use of a stress tool on the CPU of two nodes with more than 50
processes in the run queue and a duration of 60 seconds.
Overview
This scenario consists of a hot-standby cluster configuration with participating nodes seoul
and busan with only one Ethernet network. Each node has two Ethernet interfaces. Both the
resource group and the solidDB are active in busan node. A tool is run to stress the CPU of
both nodes with more than 50 processes in the run queue with a duration of 60 seconds as
shown in Example 9-22.
Example 9-22 Scenario testing the use of a stress tool on both nodes
seoul:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: busan, 0x564bc620973c9bdc, 1
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc OFFLINE seoul
ONLINE busan
Results
Before the test, both nodes have a low run queue and low entitled capacity as shown in
Example 9-23.
busan:/ # vmstat 2
System configuration: lcpu=2 mem=3584MB ent=0.50
kthr memory page faults cpu
During the test, the seoul node kept an average of 50 processes in the run queue and an
entitled capacity of 200% as shown in Example 9-24.
The busan node did not respond to the vmstat command during the test. When the CPU
stress finished, it produced just one line of output, showing a run queue of 119 processes
(Example 9-25).
Both the resource group and solidDB database did not move from the busan node as shown
in Example 9-26.
Conclusion
The conclusion of this test is that occasional peak performance degradation events do not
cause resource group moves or unnecessary outages.
As a result, the seoul node halted as expected, and the resource group was acquired by the
remaining node as shown in Example 9-27.
The seoul:/var/adm/ras/syslog.caa log file recorded the messages before the crash. You
can observe that the seoul node was halted after 1 second as shown in Example 9-28.
Figure 9-16 Start After dependency between the apprg and dbrg resource group
With both resource groups online, the source (dependent) apprg resource group can be
brought offline and then online again. Alternatively, it can be gracefully moved to another
node without any influence on the target dbrg resource group. The target resource group can
also be brought offline. However, to bring the source resource group online, the target
resource group must first be brought online manually (if it is offline).
If you start the cluster only on the home node of the source resource group, the apprg
resource group in this case, the cluster waits until the dbrg resource group is brought online
as shown in Example 9-30. The startup policy is Online On Home Node Only for both resource
groups.
if [ -f /dbmp/db.lck ]; then
logger -t"$file" -p$fp "DB is running!"
exit 0
fi
logger -t"$file" -p$fp "DB is NOT running!"
exit 1
if [ -f /appmp/app.lck ]; then
logger -t"$file" -p$fp "APP is running!"
exit 0
fi
logger -t"$file" -p$fp "APP is NOT running!"
exit 1
Without Startup Monitoring, the APP startup script is launched before the DB startup script
returns as shown in Example 9-32.
Example 9-34 shows the state change of the resource groups during this startup.
sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg ACQUIRING sydney
OFFLINE perth
apprg TEMPORARY ERROR sydney
OFFLINE perth
sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg ONLINE sydney
OFFLINE perth
The default node priority is algeria first, then brazil, and then usa. The usa node gets the
lowest return value from DNP.sh. When a resource group failover is triggered, the algeria_rg
resource group is moved to the usa node because its return value is the lowest, as shown in
Example 9-35.
-------------------------------
NODE brazil
-------------------------------
exit 105
-------------------------------
NODE algeria
-------------------------------
exit 103
When the resource group fails over, algeria_rg moves from the algeria node to the usa
node, which has the lowest return value in DNP.sh as shown in Figure 9-18.
# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
algeria_rg ONLINE algeria
OFFLINE brazil
OFFLINE usa
# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
algeria_rg OFFLINE algeria
OFFLINE brazil
ONLINE usa
Figure 9-18 clRGinfo of before and after takeover
-------------------------------
NODE usa
-------------------------------
exit 100
-------------------------------
NODE brazil
-------------------------------
exit 101
-------------------------------
NODE algeria
-------------------------------
exit 103
Upon resource group failover, the resource group moves to brazil, which has the lowest
return value among the cluster nodes this time, as shown in Figure 9-19.
# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
algeria_rg OFFLINE algeria
OFFLINE brazil
ONLINE usa
# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
algeria_rg OFFLINE algeria
ONLINE brazil
OFFLINE usa
Figure 9-19 Resource group moving
To simplify the test scenario, DNP.sh is defined to simply return a value. In a real situation, you
can replace this DNP.sh sample file with any customized script. Then, node failover is done
based upon the return value of your own script.
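A customized replacement might derive the priority value from current load rather than a constant. The following sketch is one hypothetical way to do that: it builds the return value from a run-queue length so that the least-loaded node wins (a lower return value means a higher priority, matching the behavior shown above). The base value of 100 and the load probe are assumptions for illustration only.

```shell
# Hypothetical DNP.sh logic: priority value = 100 + run-queue length,
# so a less-loaded node returns a lower (better) value. On AIX the
# run-queue length could come from: vmstat 1 1 | tail -1 | awk '{print $1}'
dnp_value() {
    runq=$1
    echo $((100 + runq))
}

# A node with 3 processes in the run queue would exit with value 103.
dnp_value 3
```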
For verbose logging information, you must enable debug mode by editing the
/etc/syslog.conf configuration file and adding the following line as shown in Figure 10-1:
*.debug /tmp/syslog.out rotate size 10m files 10
local0.crit /dev/console
local0.info /var/hacmp/adm/cluster.log
user.notice /var/hacmp/adm/cluster.log
daemon.notice /var/hacmp/adm/cluster.log
*.info /var/adm/ras/syslog.caa rotate size 1m files 10
*.debug /tmp/syslog.out rotate size 10m files 10
Figure 10-1 Extract from the /etc/syslog.conf file
After you make this change, verify that a syslog.out file is in the /tmp directory. If this file is
not in the directory, create one by entering the touch /tmp/syslog.out command. After you
create the file, refresh the syslog daemon by issuing the refresh -s syslogd command.
When debug mode is enabled, you capture detailed debugging information in the
/tmp/syslog.out file. This information can assist you in troubleshooting problems with
commands, such as the mkcluster command during cluster migration.
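The steps above can be scripted. The sketch below works against copies of the real files (the default paths are scratch files so that it is safe to run anywhere); on a live node you would set CONF=/etc/syslog.conf and OUT=/tmp/syslog.out and finish with refresh -s syslogd.

```shell
# Paths default to scratch files for this sketch; override them on a
# real node: CONF=/etc/syslog.conf OUT=/tmp/syslog.out
CONF=${CONF:-/tmp/demo_syslog.conf}
OUT=${OUT:-/tmp/demo_syslog.out}

# Add the *.debug line only if it is not already present.
grep -q '^\*\.debug' "$CONF" 2>/dev/null ||
    echo "*.debug $OUT rotate size 10m files 10" >> "$CONF"

# Create the output file if it does not exist yet.
[ -f "$OUT" ] || touch "$OUT"

# On a live node, refresh the daemon afterward:  refresh -s syslogd
```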
Example 10-1 Generating a list of PowerHA log files with the clmgr utility
seoul:/ # clmgr view log
ERROR: """" does not appear to exist!
Available Logs:
autoverify.log
cl2siteconfig_assist.log
cl_testtool.log
clavan.log
clcomd.log
clcomddiag.log
clconfigassist.log
clinfo.log
clstrmgr.debug
clstrmgr.debug.long
cluster.log
cluster.mmddyyyy
clutils.log
clverify.log
cspoc.log
cspoc.log.long
cspoc.log.remote
dhcpsa.log
dnssa.log
domino_server.log
emuhacmp.out
hacmp.out
ihssa.log
migration.log
sa.log
sax.log
Example 10-2 The cltopinfo command with the disk heartbeat still being displayed
berlin:/ # cltopinfo
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: caa_private0
Cluster IP Address:
There are 2 node(s) and 3 network(s) defined
NODE berlin:
Network net_diskhb_01
berlin_hdisk1_01 /dev/hdisk1
Network net_ether_01
berlin 192.168.101.141
Network net_ether_010
alleman 10.168.101.142
german 10.168.101.141
berlinb1 192.168.200.141
berlinb2 192.168.220.141
NODE munich:
Network net_diskhb_01
munich_hdisk1_01 /dev/hdisk1
Network net_ether_01
munich 192.168.101.142
Network net_ether_010
alleman 10.168.101.142
german 10.168.101.141
munichb1 192.168.200.142
munichb2 192.168.220.142
COMMAND STATUS
Networks
Add a Network
Change/Show a Network
Remove a Network
+--------------------------------------------------------------------------+
| Select a Network to Remove |
| |
| Move cursor to desired item and press Enter. |
| |
| net_diskhb_01 |
| net_ether_01 (192.168.100.0/22) |
| net_ether_010 (10.168.101.0/24 192.168.200.0/24 192.168.220.0/24) |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 10-4 Removing the disk heartbeat network
3. Synchronize your cluster by selecting the path: smitty sysmirror → Custom Cluster
Configuration → Verify and Synchronize Cluster Configuration (Advanced).
4. See if the network is deleted by using the cltopinfo command as shown in Example 10-3.
Example 10-3 Output of the cltopinfo command after removing the disk heartbeat network
berlin:/ # cltopinfo
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: caa_private0
Cluster IP Address:
There are 2 node(s) and 2 network(s) defined
NODE berlin:
Network net_ether_01
berlin 192.168.101.141
Network net_ether_010
german 10.168.101.141
alleman 10.168.101.142
berlinb1 192.168.200.141
5. Start PowerHA on all your cluster nodes by running the smitty cl_start command.
seoul:/ # clstat -a
Failed retrieving cluster information.
There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.
Refer to the HACMP Administration Guide for more information.
seoul:/ # /usr/sbin/snmpv3_ssw -1
Stop daemon: snmpmibd
In /etc/rc.tcpip file, comment out the line that contains: snmpmibd
In /etc/rc.tcpip file, remove the comment from the line that contains: dpid2
Make the symbolic link from /usr/sbin/snmpd to /usr/sbin/snmpdv1
Make the symbolic link from /usr/sbin/clsnmp to /usr/sbin/clsnmpne
Start daemon: dpid2
Example 10-5 The clcomd daemon indicating problems with the security keys
2010-09-23T00:02:07.983104: WARNING: Cannot read the key
/etc/security/cluster/key_md5_des
2010-09-23T00:02:07.985975: WARNING: Cannot read the key
/etc/security/cluster/key_md5_3des
2010-09-23T00:02:07.986082: WARNING: Cannot read the key
/etc/security/cluster/key_md5_aes
This problem means that the /etc/cluster/rhosts file is not completed correctly. On all
cluster nodes, edit this file to contain the IP addresses that were used as the communication
paths during cluster definition, before the first synchronization. If you used the host name as
the persistent address and the communication path, add the persistent addresses to the
/etc/cluster/rhosts file. Finally, issue the startsrc -s clcomd command.
Example 10-6 Error messages when trying to create an ECM volume group using C-SPOC
seoul: 0516-1335 mkvg: This system does not support enhanced concurrent capable
seoul: volume groups.
seoul: 0516-862 mkvg: Unable to create volume group.
seoul: cl_rsh had exit code = 1, see cspoc.log and/or clcomd.log for more
information
cl_mkvg: An error occurred executing mkvg appvg on node seoul
In this case, install the bos.clvm.enh file set and any fixes for this file set for the system to stay
in a consistent version state.
ERROR:
Figure 10-5 clmigcheck error for communication path
HACMPnode:
name = "brazil"
object = "COMMUNICATION_PATH"
value = "brazil"
node_id = 3
node_handle = 3
version = 12
Figure 10-6 Communication path definition at HACMPnode.odm
Because the clmigcheck program is a ksh script, certain profiles can cause a similar problem. If
the problem persists after you correct the /etc/hosts configuration file, try to remove the
contents of the kshrc file because it might be affecting the behavior of the clmigcheck program.
If your /etc/cluster/rhosts file is not configured properly, you see an error message
similar to the one shown in Figure 10-7. The /etc/cluster/rhosts file must contain the fully
qualified domain name of each node in the cluster (that is, the output from the hostname
command). After changing the /etc/cluster/rhosts file, run the stopsrc and startsrc
commands on the clcomd subsystem.
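The rebuild of the rhosts file and the clcomd restart can be combined in a small helper. In this sketch the target path defaults to a scratch file so it is safe to run outside the cluster, and the node names are hypothetical; on the cluster nodes you would set RHOSTS=/etc/cluster/rhosts and then recycle the daemon with stopsrc -s clcomd followed by startsrc -s clcomd.

```shell
# Target defaults to a scratch file for this sketch; on a real node:
# RHOSTS=/etc/cluster/rhosts
RHOSTS=${RHOSTS:-/tmp/demo_rhosts}

# Write one fully qualified node name per line, replacing any old content.
write_rhosts() {
    : > "$RHOSTS"
    for n in "$@"; do
        echo "$n" >> "$RHOSTS"
    done
}

# Hypothetical node names for illustration.
write_rhosts algeria.example.com brazil.example.com
cat "$RHOSTS"

# On a live node, recycle clcomd afterward:
#   stopsrc -s clcomd; startsrc -s clcomd
```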
brazil:/ # clmigcheck
lslpp: Fileset hageo* not installed.
rshexec: cannot connect to node algeria
ERROR: Internode communication failed,
check the clcomd.log file for more information.
You can also check clcomd communication by using the clrsh command as shown in
Figure 10-8.
Example 10-8 shows the exact error message saved in the smit.log file.
The message includes the solution as shown in Example 10-7. Run the rmcluster
command as shown in Example 10-9 to remove all CAA structures from the specified disk.
After you issue the rmcluster command, you can synchronize the cluster again.
Tip: After running the rmcluster command, verify that the caa_private0 disk has been
unconfigured and is not seen on other nodes. Run the lqueryvg -Atp command against
the repository disk to ensure that the volume group definition is removed from the disk. If
you encounter problems with the rmcluster command, see “Removal of the volume group
when the rmcluster command does not” on page 320 for information about how to
manually remove the volume group.
One of the error messages that you see is “ERROR: Problems encountered creating the
cluster in AIX.” This message indicates a problem with creating the CAA cluster. The
clmigcheck program calls the mkcluster command to create the CAA cluster, which is what
you must look for in the logs.
To proceed with the troubleshooting, enable the syslog debugging as discussed in 10.2.1,
“The clmigcheck script” on page 308.
If you encounter a problem when creating your cluster, check these log files to ensure that the
volume group and file systems are created without any errors.
# lsvg -l caavg_private
caavg_private:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
caalv_private1 boot 1 1 1 closed/syncd N/A
caalv_private2 boot 1 1 1 closed/syncd N/A
caalv_private3 boot 4 4 1 open/syncd N/A
fslv00 jfs2 4 4 1 open/syncd /clrepos_private1
fslv01 jfs2 4 4 1 closed/syncd /clrepos_private2
powerha_crlv boot 1 1 1 closed/syncd N/A
Figure 10-9 Contents of the caavg_private volume group
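The output in Figure 10-9 can be checked mechanically. The following sketch scans lsvg -l style output for the state of the repository file system; the sample text reproduces the figure, and on a live node you would pipe lsvg -l caavg_private into the same awk filter:

```shell
# Sketch: confirm the CAA repository file system state from "lsvg -l" output.
# The sample below mirrors Figure 10-9.
lsvg_out=$(cat <<'EOF'
caavg_private:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
caalv_private1 boot 1 1 1 closed/syncd N/A
caalv_private2 boot 1 1 1 closed/syncd N/A
caalv_private3 boot 4 4 1 open/syncd N/A
fslv00 jfs2 4 4 1 open/syncd /clrepos_private1
fslv01 jfs2 4 4 1 closed/syncd /clrepos_private2
powerha_crlv boot 1 1 1 closed/syncd N/A
EOF
)
# Pick the LV STATE field of the line whose mount point is /clrepos_private1
repos_state=$(printf '%s\n' "$lsvg_out" | awk '$NF == "/clrepos_private1" {print $(NF-1)}')
echo "repository file system state: $repos_state"
```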
Figure 10-10 shows a crfs failure while creating the CAA cluster. This problem was corrected
by removing incorrect entries in the /etc/filesystems file. Likewise, problems can happen
when a logical volume name that the CAA cluster must use already exists, for
example.
Tip: When you look at the syslog.caa file, focus on the AIX commands (such as mkvg, mklv,
and crfs) and their returned values. If you find non-zero return values, a problem exists.
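A quick scan for non-zero return codes can be scripted. The log line format below is an assumption for illustration only; adjust the awk pattern to match the actual entries in your syslog.caa file:

```shell
# Sketch: flag AIX commands with non-zero return codes in a syslog.caa
# excerpt. The "rc=N" line format here is assumed for illustration.
cat > /tmp/syslog.caa.sample <<'EOF'
caa: run_cmd: mkvg -y caavg_private hdisk2 rc=0
caa: run_cmd: mklv -y caalv_private1 caavg_private 1 rc=0
caa: run_cmd: crfs -v jfs2 -m /clrepos_private1 rc=1
EOF

# Print the command name and return code for every non-zero result
failures=$(awk '/rc=[^0]/ {print $3, $NF}' /tmp/syslog.caa.sample)
echo "$failures"
```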
You can see that the volume group creation failed because the name is already in use. This
problem can happen for several reasons. For example, it can occur if the disk was previously
used as the CAA repository or if the disk had the volume group descriptor area (VGDA)
information of another volume group on it.
For the full sequence of steps, see 10.4.1, “Previously used repository disk for CAA” on
page 316.
If you find that the rmcluster command has not removed your CAA definition from the disk,
use the steps in the following section, “Removal of the volume group when the rmcluster
command does not.”
Removal of the volume group when the rmcluster command does not
In this situation, you must use the Logical Volume Manager (LVM) commands, which you can
do in one of two ways. The easiest method is to import the volume group, vary on the volume
group, and then reduce it so that the VGDA is removed from the disk. If this method does not
work, use the dd command to overwrite special areas of the disk.
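The first method can be sketched as a dry run that prints the commands instead of executing them (the disk and volume group names are hypothetical); remove the echo prefix only on a node where the data on the disk is confirmed disposable:

```shell
# Sketch (dry run): the LVM-based removal sequence described above, printed
# rather than executed. Disk and volume group names are hypothetical.
remove_caa_vg() {
    disk=$1; vg=$2
    echo importvg -y "$vg" "$disk"       # import the stale volume group
    echo varyonvg "$vg"                  # activate it
    echo reducevg -df "$vg" "$disk"      # force-remove the disk (clears the VGDA)
}
remove_caa_vg hdisk2 old_caavg
```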
Tip: Make sure that the data contained on the disk is not needed because usage of the
following steps destroys the volume group data on the disk.
If the activation fails, run the exportvg command to remove the volume group definition from
the ODM. Then try to import it with a different name as follows:
# exportvg vgname
# importvg -y new-vgname hdiskx
After you complete the forced reduction, check whether the disk no longer contains a volume
group by using the lqueryvg -Atp hdisk command.
Also verify whether any previous volume group definition is still being displayed on the other
nodes of your cluster by using the lspv command. If the lspv output shows the PVID with an
associated volume group, you can fix it by running the exportvg vgname command.
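This check can be scripted. The sketch below uses a sample of lspv output in the style of Figure 10-14; on a live node, pipe lspv directly into the same awk filter:

```shell
# Sketch: look up which volume group (if any) lspv still associates with a
# PVID, and suggest the exportvg cleanup. Sample output is illustrative.
lspv_out=$(cat <<'EOF'
hdisk1 000fe4114cf8d1ce None
hdisk8 000fe4114cf8d608 ny_datavg
hdisk0 000fe40140a5516a rootvg active
EOF
)
pvid=000fe4114cf8d608
vg=$(printf '%s\n' "$lspv_out" | awk -v p="$pvid" '$2 == p {print $3}')
if [ -n "$vg" ] && [ "$vg" != "None" ]; then
    echo "PVID $pvid still mapped to $vg; run: exportvg $vg"
fi
```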
If you experience any problems with this procedure, try a force overwrite of the disk as described
in “Overwriting the disk.”
Attention: Only attempt this method if the rmcluster and reducevg procedures fail and if
AIX still has access to the disk. You can check this access by running the lquerypv -h
/dev/hdisk command.
This command zeros only the part of the disk that contains the repository offset. Therefore,
you do not lose the PVID information.
In some cases, this procedure is not sufficient to resolve the problem. If you need to
completely overwrite the disk, run the following procedure:
Attention: This procedure overwrites the entire disk structure including the PVID. You
must follow the steps as shown to change the PVID if required during migration.
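The effect of zeroing only part of a disk can be demonstrated safely on a scratch file. The offsets below are purely illustrative and do not reflect the real CAA repository layout; on a real system, dd would operate on /dev/hdiskN:

```shell
# Sketch: zero a region of a "disk" with dd while leaving the rest intact,
# using a scratch file. Sector numbers are illustrative assumptions only.
SCRATCH=/tmp/fake_disk.img
dd if=/dev/zero of=$SCRATCH bs=512 count=64 2>/dev/null   # 32 KB scratch "disk"
printf 'CAAREPOSITORY' | dd of=$SCRATCH bs=512 seek=8 conv=notrunc 2>/dev/null

# Zero 4 sectors starting at sector 8 (the assumed "repository area"), leaving
# sector 0 (where the PVID would live on a real disk) untouched.
dd if=/dev/zero of=$SCRATCH bs=512 seek=8 count=4 conv=notrunc 2>/dev/null
echo "zeroed sectors 8-11 of $SCRATCH"
```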
Run the lspv command to check that the PVID is the same on both nodes. To ensure that you
have the real PVID, query the disk as follows:
# lquerypv -h /dev/hdiskn
The PVID should match the lspv output as shown in Figure 10-14.
chile:/ # lspv
hdisk1 000fe4114cf8d1ce None
hdisk2 000fe40163c54011 None
hdisk3 000fe40168921cea None
hdisk4 000fe4114cf8d3a1 None
hdisk5 000fe4114cf8d441 None
hdisk6 000fe4114cf8d4d5 None
hdisk7 000fe4114cf8d579 None
hdisk8 000fe4114cf8d608 ny_datavg
hdisk0 000fe40140a5516a rootvg active
Figure 10-14 The lspv output showing PVID
CLUSTER_TYPE:STANDARD
CLUSTER_REPOSITORY_DISK:000fe40120e16405
CLUSTER_MULTICAST:NULL
Figure 10-15 Changing the PVID in the clmigcheck.txt file
If this is post-migration and PowerHA is installed, you must also modify the HACMPsircol ODM
class (Figure 10-16) on all nodes in the cluster.
HACMPsircol:
name = "newyork_sircol"
id = 0
uuid = "0"
repository = "000fe4114cf8d258"
ip_address = ""
nodelist = "serbia,scotland,chile,"
backup_repository1 = ""
backup_repository2 = ""
Figure 10-16 The HACMPsircol ODM class
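The repository PVID can be recovered from such a stanza (as printed by the odmget HACMPsircol command) with a small filter. The sample below reproduces Figure 10-16:

```shell
# Sketch: extract the repository disk PVID from an HACMPsircol ODM stanza.
stanza=$(cat <<'EOF'
HACMPsircol:
        name = "newyork_sircol"
        id = 0
        uuid = "0"
        repository = "000fe4114cf8d258"
        ip_address = ""
        nodelist = "serbia,scotland,chile,"
        backup_repository1 = ""
        backup_repository2 = ""
EOF
)
# Match only the 'repository = "..."' attribute (not backup_repository1/2)
pvid=$(printf '%s\n' "$stanza" | sed -n 's/.*repository = "\([0-9a-f]*\)".*/\1/p')
echo "repository PVID: $pvid"
```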
You might be able to recover by recreating the CAA cluster from the last CAA configuration
(HACMPsircol class in ODM) as explained in the following steps:
1. Clear the CAA repository disk as explained in “Previously used repository disk for CAA” on
page 316.
2. Perform a synchronization or verification of the cluster. Upon synchronizing the cluster, the
mkcluster command is run to recreate the CAA cluster. If the problem still persists,
contact IBM support.
The following section, “Hardware requirements”, explains the installation requirements of IBM
Systems Director v6.2 on AIX.
Table 11-1 lists the hardware requirements for IBM Systems Director Server running on AIX
for a small configuration that has less than 500 managed systems.
Table 11-1 Hardware requirements for IBM Systems Director Server on AIX
Resource Requirement
Memory 3 GB
Disk storage 4 GB
For more details about hardware requirements, see the “Recommended hardware
requirements for IBM Systems Director Server running on AIX” topic in the IBM Systems
Director Information Center at:
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ib
m.director.plan.helps.doc/fqm0_r_hardware_requirements_servers_running_aix.html
The following steps summarize the process for installing IBM Systems Director on AIX:
1. Increase the file size limit:
ulimit -f 4194302 (or to unlimited)
2. Increase the number of file descriptors:
ulimit -n 4000
3. Verify the file system (/, /tmp and /opt) size as mentioned in Table 11-1 on page 326:
df -g / /tmp /opt
4. Download IBM Systems Director from the IBM Systems Director Downloads page at:
http://www.ibm.com/systems/management/director/downloads/
5. Extract the content:
gzip -cd <package_name> | tar -xvf -
where <package_name> is the file name of the download package.
6. Install the content by using the script in the extracted package:
./dirinstall.server
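Step 5 can be exercised end to end with a sample package; the package and directory names below are hypothetical stand-ins for the real download:

```shell
# Sketch: build a sample package, then extract it with the same
# "gzip -cd <package_name> | tar -xvf -" pipeline used in step 5.
# Names are hypothetical stand-ins for the real download.
mkdir -p /tmp/pkgdemo/SysDir6_2_Server_AIX
echo demo > /tmp/pkgdemo/SysDir6_2_Server_AIX/dirinstall.server
( cd /tmp/pkgdemo && tar -cf - SysDir6_2_Server_AIX | gzip > package.tar.gz )

mkdir -p /tmp/extract && cd /tmp/extract
gzip -cd /tmp/pkgdemo/package.tar.gz | tar -xvf -
ls SysDir6_2_Server_AIX/dirinstall.server
```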
Chapter 11. Installing IBM Systems Director and the PowerHA SystemMirror plug-in 327
11.1.3 Configuring and activating IBM Systems Director
To configure and activate IBM Systems Director, follow these steps:
1. Configure IBM Systems Director by using the following script:
/opt/ibm/director/bin/configAgtMgr.sh
Agent password: The script prompts for an agent password, for which you can
use the host system root password or any other common password of your
choice. This password is used by IBM Systems Director for its internal communication
and does not have any external impact.
/opt/ibm/director/bin/smstatus -r
Inactive
Starting
Active
Figure 11-1 Activation status for IBM Systems Director
After completing the installation of IBM Systems Director, install the SystemMirror plug-in as
explained in the following section.
Figure 11-2 shows the output of the plug-in status.
94:RESOLVED:com.ibm.director.power.ha.systemmirror.branding:7.1.0.1:com.ibm.director.power.ha.systemmirr
or.branding
95:ACTIVE:com.ibm.director.power.ha.systemmirror.common:7.1.0.1:com.ibm.director.power.ha.systemmirror.c
ommon
96:ACTIVE:com.ibm.director.power.ha.systemmirror.console:7.1.0.1:com.ibm.director.power.ha.systemmirror.
console
97:RESOLVED:com.ibm.director.power.ha.systemmirror.helps.doc:7.1.0.1:com.ibm.director.power.ha.systemmir
ror.helps.doc
98:INSTALLED:com.ibm.director.power.ha.systemmirror.server.fragment:7.1.0.0:com.ibm.director.power.ha.sy
stemmirror.server.fragment
99:ACTIVE:com.ibm.director.power.ha.systemmirror.server:7.1.0.1:com.ibm.director.power.ha.systemmirror.s
erver
If the subagent interface plug-in shows the RESOLVED status instead of the ACTIVE status,
attempt to start the subagent. Run the lwiplugin.sh script on AIX and Linux, or the
lwiplugin.bat script on Windows, passing the plug-in number (which is 94):
AIX and Linux
/opt/ibm/director/agent/bin/lwiplugin.sh -start 94
Windows
C:/Program Files/IBM/Director/lwi/bin/lwiplugin.bat -start 94
If Systems Director was active during installation of the plug-in, you must stop it and restart it
as follows:
1. Stop the IBM Systems Director Server:
# /opt/ibm/director/bin/smstop
2. Start the IBM Systems Director Server:
# /opt/ibm/director/bin/smstart
3. Monitor the startup process:
# /opt/ibm/director/bin/smstatus -r
Inactive
Starting
Active *** (the "Active" status can take a long time)
More information: See the SystemMirror agent installation section in Configuring AIX
Clusters for High Availability Using PowerHA SystemMirror for Systems Director paper at:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101774
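When scripting the restart, the smstatus polling in step 3 can be wrapped in a bounded wait loop. The sketch below stubs smstatus so that it is self-contained; on a real server the stub would be replaced by /opt/ibm/director/bin/smstatus:

```shell
# Sketch: poll a status command until it reports Active, with a bounded wait.
# The smstatus function below is a STUB standing in for the real
# /opt/ibm/director/bin/smstatus binary.
STATE_FILE=/tmp/smstatus.count
echo 0 > $STATE_FILE
smstatus() {   # stub: reports Starting twice, then Active
    n=$(cat $STATE_FILE); n=$((n + 1)); echo $n > $STATE_FILE
    if [ "$n" -ge 3 ]; then echo Active; else echo Starting; fi
}

wait_for_active() {
    tries=0
    while [ $tries -lt 30 ]; do
        status=$(smstatus)
        if [ "$status" = "Active" ]; then
            echo "Director is $status"
            return 0
        fi
        tries=$((tries + 1))
        sleep 0    # use a longer interval (for example, sleep 30) on a real server
    done
    echo "timed out waiting for Active" >&2
    return 1
}
wait_for_active
```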
11.3.2 Installing the PowerHA SystemMirror agent
To install the PowerHA SystemMirror agent on the nodes, follow these steps:
1. Install the cluster.es.director.agent.rte file set:
# smitty install_latest
2. Stop the common agent:
# stopsrc -s platform_agent
# stopsrc -s cimsys
3. Start the common agent:
# startsrc -s platform_agent
Tip: The cimsys subsystem starts along with the platform_agent subsystem.
This chapter explains how to create and manage the PowerHA SystemMirror cluster with IBM
Systems Director.
3. In the IBM Systems Director console, in the left navigation pane, expand Availability and
select PowerHA SystemMirror (Figure 12-2).
Figure 12-2 Selecting the PowerHA SystemMirror link in IBM Systems Director
5. Starting with the Create Cluster Wizard, follow the wizard panes to create the cluster.
a. In the Welcome pane (Figure 12-4), click Next.
Chapter 12. Creating and managing a cluster using IBM Systems Director 335
b. In the Name the cluster pane (Figure 12-5), in the Cluster name field, provide a name
for the cluster. Click Next.
c. In the Choose nodes pane (Figure 12-6), select the host names of the nodes.
Figure 12-7 Verifying common storage availability for the repository disk
d. In the Configure nodes pane (Figure 12-8), set the controlling node. The controlling
node in the cluster is considered to be the primary or home node. Click Next.
e. In the Choose repositories pane (Figure 12-9), choose the storage disk that is shared
among all nodes in the cluster to use as the common storage repository. Click Next.
f. In the Configure security pane (Figure 12-10), specify the security details to secure
communication within the cluster.
6. Verify the cluster creation in the AIX cluster nodes by using either of the following
commands:
– The CAA command:
/usr/sbin/lscluster -m
– The PowerHA command:
/usr/es/sbin/cluster/utilities/cltopinfo
Overview of the CLI
The CLI is executed by using a general-purpose smcli command. To list the available CLI
commands for managing the cluster, run the smcli lsbundle command as shown in
Figure 12-12.
You can retrieve help information for the commands (Figure 12-12) as shown in Figure 12-13.
To verify the availability of the mkcluster command, you can use the smcli lsbundle
command in IBM Systems Director as shown in Figure 12-12.
Example 12-1 Creating a cluster with the smcli mkcluster CLI command
smcli mkcluster -i 224.0.0.0 \
-r hdisk3 \
-n nodeA.xy.ibm.com,nodeB.xy.ibm.com \
DB2_Cluster
You can use the -h option to list the commands that are available (Figure 12-14).
# smcli mkcluster -h
smcli sysmirror/mkcluster {-h|-?|--help} [-v|--verbose]
smcli sysmirror/mkcluster [{-i|--cluster_ip} <multicast_address>] \
[{-S|--fc_sync_interval} <##>] \
[{-s|--rg_settling_time} <##>] \
[{-e|--max_event_time} <##>] \
[{-R|--max_rg_processing_time} <##>] \
[{-c|--controlling_node} <node>] \
[{-d|--shared_disks} <DISK>[,<DISK#2>,...] ] \
{-r|--repository} <disk> \
{-n|--nodes} <NODE>[, <NODE#2>,...] \
[<cluster_name>]
Figure 12-14 The mkcluster -h command to list the available commands
To verify that the cluster has been created, you can use the smcli lscluster command.
Command help: For assistance with using the commands, you can use either of the
following help options:
smcli <command name> -help --verbose
smcli <command name> -h -v
Accessing the Cluster Management Wizard
To access the Cluster Management Wizard, follow these steps:
1. In the IBM Systems Director console, expand Availability and select PowerHA
SystemMirror (Figure 12-3 on page 335).
2. In the right pane, under Cluster Management, click the Manage Clusters link
(Figure 12-15).
Edit Advanced Properties button
Under the General tab, you can click the Edit Advanced Properties button to modify the
cluster properties. For example, you can change the controlling node as shown in
Figure 12-17.
Figure 12-17 Editing the advanced properties, such as the controlling node
Capture Snapshot
You can capture and manage snapshots through the Snapshots tab. To capture a new
snapshot, click the Create button on the Snapshots tab as shown in Figure 12-20.
File collection and logs management
You can manage file collection and logs on the Additional Properties tab. From the View
drop-down list, select either File Collections or Log files as shown in Figure 12-21.
Figure 12-21 Additional Properties tab: File Collections and Log files options
The Systems Director plug-in also provides a CLI to manage the cluster. The following section
explains the available CLI commands and how you can find help for each of these commands.
A few of the CLI commands are provided as follows for a quick reference:
Snapshot creation
You can use the smcli mksnapshot command to create a snapshot. Figure 12-24 on
page 348 shows the command for obtaining detailed help about this command.
# smcli mkss -h -v
Verify the snapshot by using the smcli lsss command as shown in Example 12-3.
File collection
You can use the smcli mkfilecollection command to create a file collection as shown in
Example 12-4. A file collection helps to keep the files and directories synchronized on all
nodes in the cluster.
Log files
You can use the smcli lslog command (Example 12-5) to list the available log files in the
cluster. Then you can use the smcli vlog command to view the log files.
Modification functionality: At the time of writing this IBM Redbooks publication, an edit
or modification CLI command, such as one to modify the controlling node, is not available
in the initial release. Therefore, use the GUI wizards for the modification functionality.
4. On the Clusters tab, click the Actions list and select Add Resource Group
(Figure 12-26). Then select the cluster node, and click the Action button.
Alternative: You can select the resource group configuration wizard by selecting the
cluster nodes, as shown in Figure 12-26.
5. In the Choose a cluster pane (Figure 12-27), choose the cluster where the resource group
is to be created. Notice that this step is highlighted under Welcome in the left pane.
Figure 12-27 Choose the cluster for the resource group configuration
You can now choose to create either a custom resource group or a predefined resource group
as explained in 12.3.1, “Creating a custom resource group” on page 351, and 12.3.2,
“Creating a predefined resource group” on page 353.
2. In the Choose nodes pane (Figure 12-29), select the nodes for which you want to
configure the resource group.
3. In the Choose policies and attributes pane (Figure 12-30), select the policies to add to the
resource group.
4. In the Choose resources pane (Figure 12-31), select the shared resources to define for
the resource group.
Application list: Only the applications installed in the cluster nodes are displayed
under the predefined resource group list.
Figure 12-33 Predefined resource group configuration
2. In the Choose components pane, for the predefined resource group, select the
components of the application to create the resource group. In the example shown in
Figure 12-34, the Tivoli Director Server component is selected. Each component already
has the predefined properties such as the primary node and takeover node.
Modify the properties per your configuration and requirements. Then create the resource
group.
3. Enter the following base SystemMirror command to verify that the resource group has
been created:
/usr/es/sbin/cluster/utilities/clshowres
3. In the right pane, under Resource Group Management, select Manage Resource Groups
(Figure 12-36).
The Resource Group Management wizard opens as in Figure 12-37. Alternatively, you can
access the Resource Group Management wizard by selecting Manage Cluster under Cluster
Management (Figure 12-36).
To access the Cluster and Resource Group Management wizard, select the Resource
Groups tab as shown in Figure 12-37.
c. In the Parent-child window (Figure 12-39), select the dependency type to configure the
dependencies.
Resource group removal
Right-click the selected resource group, and click Remove to remove the resource group
as shown in Figure 12-40.
Examples of CLI command usage
This section shows examples using the CLI commands for resource group management.
To list the resource groups, use the following command as shown in Example 12-6:
smcli lsresgrp -c <cluster name>
To remove the resource group, use the following command as shown in Example 12-7:
smcli rmresgrp -c <cluster name> -C <RG_name>
Example 12-7 The smcli rmresgrp command using the -C option to confirm the removal operation
# smcli rmresgrp -c selma04_cluster Test_AltRG
Removing this resource group will cause all user-defined PowerHA information
to be DELETED.
5. In the Verify and Synchronize pane (Figure 12-44), select whether you want to
synchronize the entire configuration, synchronize only the unsynchronized changes, or
only verify the configuration. Then click OK.
6. Optional: Undo the changes to the configuration after synchronization.
a. To access this option, in the Cluster and Resource Group Management wizard, on the
Clusters tab, select the cluster for which you want to perform the synchronize and
verification function (Figure 12-43 on page 361).
b. As shown in Figure 12-45, select Recovery → Undo local changes of
configuration.
c. When you see the Undo Local Changes of the Configuration message (Figure 12-46),
click OK.
Snapshot for the undo changes option: The undo changes option creates a
snapshot before it discards the configuration changes made since the last synchronization.
Example 12-9 shows how to synchronize cluster changes and to log the output in its own
specific log file.
Example 12-9 smcli synccluster changes only with the log file option
# smcli synccluster -C -l /tmp/sync.log selma04_cluster
Undo changes
To restore the cluster configuration to its state as of the last synchronization, use
the smcli undochanges command. This operation restores the cluster configuration
from the active configuration database. Typically, this command has the effect of
discarding any unsynchronized changes.
The help option is available by using the smcli undochanges -h -v command as shown in
Example 12-10.
Example 12-10 The help option for the smcli undochanges command
# smcli undochanges -h -v
Command Alias: undo
-h|-?|--help
Requests help for this command.
-v|--verbose
Requests maximum details in the displayed information.
<CLUSTER> The label of a cluster to perform this operation on.
...
<output truncated >
Test_AhRG
myRG RG_test_NChg
_testinggg
RG_testing11
RG_testing9
RG01_selma03
RG_testing6
selma_04_cluster
RG_testing2
RG05_selma03_04
RG_TEST_4
RG06_selma03_04
Cluster subsystem services status:
You can view the status of PowerHA services, such as the clcomd subsystem, by using the
Status feature. To access this feature, select the cluster for which the service status is to
be viewed. Click the Action button and select Reports → Status.
You now see the cluster service status details, similar to the example in Figure 12-49.
Similarly you can view the configuration report for the resource group as shown in
Figure 12-52. On the Resource Groups tab, select the resource group for which you want
to view the configuration. Then click the Action button and select Reports.
Application monitoring
To locate the details of the application monitors that are configured and assigned to a
resource group, select the cluster. Click the Action button and select Reports →
Applications. Figure 12-53 shows the status of the application monitoring.
Similarly you can view the configuration report for networks and interfaces by selecting the
cluster, clicking the Action button, and selecting Reports → Networks and Interfaces.
Recovering from an event failure
After you issue a cluster recovery from an event failure, you see a message similar to the one
shown in Figure 12-57. Verify that you have addressed all problems that led to the error
before continuing with the operation.
PPRC and SPPRC file sets: The PPRC and SPPRC file sets are not required for
Global Mirror support on PowerHA.
The following additional file sets are included in SP3 (they must be installed separately and
require the acceptance of licenses during the installation):
– cluster.es.genxd
cluster.es.genxd.cmds 6.1.0.0 Generic XD support - Commands
cluster.es.genxd.rte 6.1.0.0 Generic XD support - Runtime
– cluster.msg.en_US.genxd
cluster.msg.en_US.genxd 6.1.0.0 Generic XD support - Messages
AIX supported levels:
– 5.3 TL9, RSCT 2.4.12.0, or later
– 6.1 TL2 SP1, RSCT 2.5.4.0, or later
The IBM DS8700 microcode bundle 75.1.145.0 or later
DS8000 CLI (DSCLI) 6.5.1.203 or later client interface (must be installed on each
PowerHA SystemMirror node):
– Java™ 1.4.1 or later
– APAR IZ74478, which removes the previous Java requirement
The path name for the DSCLI client in the PATH for the root user on each PowerHA
SystemMirror node (must be added)
13.1.3 Considerations
The PowerHA SystemMirror Enterprise Edition using DS8700 Global Mirror has the following
considerations:
The AIX Virtual SCSI is not supported in this initial release.
No auto-recovery is available from a PPRC path or link failure.
If the PPRC path or link between Global Mirror volumes breaks down, the PowerHA
Enterprise Edition is unaware of it. (PowerHA does not process Simple Network
Management Protocol (SNMP) for volumes that use DS8K Global Mirror technology for
mirroring). In this case, the user must identify and correct the PPRC path failure.
Depending on timing conditions, such an event can cause the corresponding Global
Mirror session to go to a “Fatal” state. If this situation occurs, the user must manually stop
and restart the corresponding Global Mirror Session (using the rmgmir and mkgmir DSCLI
commands) or an equivalent DS8700 interface.
Cluster Single Point Of Control (C-SPOC) cannot perform some Logical Volume
Manager (LVM) operations on nodes at the remote site that contain the target volumes.
Operations that require nodes at the target site to read from the target volumes result in an
error message in C-SPOC. Such operations include functions such as changing the file
system size, changing the mount point, and adding LVM mirrors. However, nodes on the
same site as the source volumes can successfully perform these tasks, and the changes
can be propagated later to the other site by using a lazy update.
Attention: For C-SPOC operations to work on all other LVM operations, you must
perform all C-SPOC operations with the DS8700 Global Mirror volume pairs in a
synchronized or consistent state. Alternatively, you must perform them in the active
cluster on all nodes.
The volume group names must be listed in the same order as the DS8700 mirror group
names in the resource group.
You can download the DS8000 DSCLI software from:
ftp://ftp.software.ibm.com/storage/ds8000/updates/DS8K_Customer_Download_Files/CLI
Install the DS8000 DSCLI software on each PowerHA SystemMirror node. By default, the
installation process installs the DSCLI in the /opt/ibm/dscli directory. Add the installation
directory of the DSCLI into the PATH environment variable for the root user.
For more details about the DS8000 DSCLI, see the IBM System Storage DS8000:
Command-Line Interface User’s Guide, SC26-7916.
For this test, the resources are limited. Each system has a single IP, an XD_ip network, and a
single Fibre Channel (FC) host adapter. Ideally, redundancy might exist throughout the
system, including in the local Ethernet networks, cross-site XD_ip networks, and FC
connectivity. This scenario has a single resource group, ds8kgmrg, which consists of a service
IP address (service_1), a volume group (txvg), and a DS8000 Global Mirror replicated
resource (texasmg). To configure the cluster, see 13.6, “Configuring the cluster” on page 385.
For each task, the DS8000 storage units are already added to the storage area network
(SAN) fabric and zoned appropriately. Also, the volumes are already provisioned to the nodes.
3. Check the code bundle level that corresponds to your LMC version on the “DS8700 Code
Bundle Information” web page at:
http://www.ibm.com/support/docview.wss?uid=ssg1S1003593
The code bundle level must be at version 75.1.145.0 or later. Also on the same page,
verify that your displayed DSCLI version corresponds to the installed code bundle level or
a later level.
Example 13-2 shows the extra parameters inserted into the DSCLI configuration file for the
storage unit in the primary site, /opt/ibm/dscli/profile/dscli.profile.hmc1. Adding these
parameters prevents you from having to type them each time they are required.
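As an illustration only (the values are hypothetical; the storage image ID matches the unit used in this chapter), such profile additions typically look like the following. Check the default dscli.profile shipped with your DSCLI version for the exact keywords it supports:

```text
# /opt/ibm/dscli/profile/dscli.profile.hmc1 -- illustrative additions only
hmc1:   9.3.207.123                   # HMC address of the primary-site unit (hypothetical)
devid:  IBM.2107-75DC890              # default storage image ID
username: powerha_admin               # DSCLI user (hypothetical)
pwfile: /opt/ibm/dscli/security.dat   # encrypted password file
```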
Table 13-1 shows the association between the source and target volumes of the replication
relationship and between their logical subsystems (LSS, the two most significant digits of a
volume identifier highlighted in bold in the table). Table 13-1 also indicates the mapping
between the volumes in the DS8000 units and their disk names on the attached AIX hosts.
You can easily obtain this mapping by using the lscfg -vl hdiskX | grep Serial command
as shown in Example 13-3. The hdisk serial number is a concatenation of the storage image
serial number and the ID of the volume at the storage level.
Example 13-3 The hdisk serial number in the lscfg command output
# lscfg -vl hdisk10 | grep Serial
Serial Number...............75DC8902E00
# lscfg -vl hdisk6 | grep Serial
Serial Number...............75DC8902600
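The split described above can be sketched as follows, using the serial numbers from Example 13-3:

```shell
# Sketch: split an hdisk serial number (from lscfg) into the storage image
# serial and the four-character volume ID, per the concatenation described
# above. Serial numbers are the ones from Example 13-3.
split_serial() {
    serial=$1
    vol=$(printf '%s' "$serial" | tail -c 4)    # last 4 characters: volume ID
    image=${serial%$vol}                        # remainder: storage image serial
    echo "$image $vol"
}
split_serial 75DC8902E00   # expected: 75DC890 2E00
split_serial 75DC8902600   # expected: 75DC890 2600
```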
4. In a similar manner, configure one PPRC path for each other involved LSS pair.
5. Because the PPRC paths are unidirectional, create a second path, in the opposite
direction, for each LSS pair. You use the same procedure, but work on the other storage
unit (see Example 13-6). We select different FC links for this direction.
Example 13-10 Defining the GM session for the source and target volumes
dscli> mksession -lss 2e 03
Date/Time: October 5, 2010 6:11:07 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00145I mksession: Session 03 opened successfully.
dscli> mksession -lss 26 03
Date/Time: October 5, 2010 6:11:25 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00145I mksession: Session 03 opened successfully.
Including all the source and target volumes in the Global Mirror session
Add the volumes in the Global Mirror sessions and verify their status by using the commands
shown in Example 13-11.
Example 13-11 Adding source and target volumes to the Global Mirror sessions
dscli> chsession -lss 26 -action add -volume 2600 03
Date/Time: October 5, 2010 6:15:17 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00147I chsession: Session 03 successfully modified.
dscli> chsession -lss 2e -action add -volume 2e00 03
Date/Time: October 5, 2010 6:15:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00147I chsession: Session 03 successfully modified.
dscli> lssession 26 2e
Date/Time: October 5, 2010 6:16:21 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
26 03 Normal 2600 Join Pending Primary Copy Pending Secondary Simplex True Disable
2E 03 Normal 2E00 Join Pending Primary Copy Pending Secondary Simplex True Disable
dscli>
dscli> chsession -lss 2c -action add -volume 2c00 03
Date/Time: October 6, 2010 5:41:12 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00147I chsession: Session 03 successfully modified.
dscli> chsession -lss 28 -action add -volume 2800 03
Date/Time: October 6, 2010 5:41:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
You must configure the volume groups and file systems on the cluster nodes. The application
might need the same major number for the volume group on all nodes. Perform this
configuration task anyway, because it might be useful later for additional configuration of the
Network File System (NFS).
For the nodes on the primary site, you can use the standard procedure. You define the
volume groups and file systems on one node and then import them to the other nodes. For
the nodes on the secondary site, you must first suspend the replication on the involved target
volumes.
root@robert: lvlstmajor
44..54,56...
root@jordan: # lvlstmajor
50...
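A candidate major number can be checked against such lvlstmajor output with a small helper. The parsing of the range syntax (44..54 meaning 44 through 54 free, 56... meaning 56 and up free) is an assumption based on the outputs above:

```shell
# Sketch: test whether a candidate major number is free according to an
# lvlstmajor output string. Range syntax interpretation is an assumption.
is_free() {
    major=$1; spec=$2
    oldIFS=$IFS; IFS=,
    for tok in $spec; do
        case $tok in
            *...) low=${tok%...}                   # open-ended range, e.g. 56...
                  [ "$major" -ge "$low" ] && { IFS=$oldIFS; return 0; } ;;
            *..*) low=${tok%%..*}; high=${tok##*..} # bounded range, e.g. 44..54
                  [ "$major" -ge "$low" ] && [ "$major" -le "$high" ] && \
                      { IFS=$oldIFS; return 0; } ;;
            *)    [ "$major" -eq "$tok" ] && { IFS=$oldIFS; return 0; } ;;
        esac
    done
    IFS=$oldIFS
    return 1
}

# Major 51 is free on both nodes in the outputs above
is_free 51 "44..54,56..." && is_free 51 "50..." && echo "51 is free on both"
```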
3. Import the volume group on the second node on the primary site, leeann, as shown in
Example 13-14:
a. Verify that the shared disks have the same PVID on both nodes.
b. Run the rmdev -dl command for each hdisk.
c. Run the cfgmgr program.
d. Run the importvg command.
Example 13-14 Importing the txvg volume group on the leeann node
root@leean: rmdev -dl hdisk6
hdisk6 deleted
root@leean: rmdev -dl hdisk10
hdisk10 deleted
root@leean: cfgmgr
root@leean:lspv | grep -e hdisk6 -e hdisk10
hdisk6 000a625afe2a4958 txvg
hdisk10 000a624a833e440f txvg
root@leean: importvg -V 51 -y txvg hdisk6
txvg
root@leean: lsvg -l txvg
txvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
txlv jfs2 250 250 2 open/syncd /txro
txloglv jfs2log 1 1 1 open/syncd N/A
Example 13-15 Pausing the Global Copy relationship on the primary site
dscli> lspprc -l 2600 2e00
Date/Time: October 6, 2010 3:40:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date
Suspended SourceLSS Timeout (secs) Critical Mode First Pass Status Incremental Resync Tgt Write GMIR CG
PPRC CG isTgtSE DisableAutoResync
===========================================================================================================
===========================================================================================================
2600:2C00 Copy Pending - Global Copy 0 Disabled Disabled Invalid -
26 60 Disabled True Disabled Disabled N/A Disabled
Unknown False
2E00:2800 Copy Pending - Global Copy 0 Disabled Disabled Invalid -
2E 60 Disabled True Disabled Disabled N/A Disabled
Unknown False
dscli> pausepprc 2600:2C00 2E00:2800
Date/Time: October 6, 2010 3:49:29 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00157I pausepprc: Remote Mirror and Copy volume pair 2600:2C00 relationship successfully paused.
CMUC00157I pausepprc: Remote Mirror and Copy volume pair 2E00:2800 relationship successfully paused.
dscli> lspprc -l 2600 2e00
Date/Time: October 6, 2010 3:49:41 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date
Suspended SourceLSS Timeout (secs) Critical Mode First Pass Status Incremental Resync Tgt Write GMIR CG
PPRC CG isTgtSE DisableAutoResync
===========================================================================================================
===========================================================================================================
2600:2C00 Suspended Host Source Global Copy 0 Disabled Disabled Invalid -
26 60 Disabled True Disabled Disabled N/A Disabled
Unknown False
2E00:2800 Suspended Host Source Global Copy 0 Disabled Disabled Invalid -
2E 60 Disabled True Disabled Disabled N/A Disabled
Unknown False
dscli>
4. To make the target volumes available to the attached hosts, use the failoverpprc
command on the secondary site as shown in Example 13-16.
Example 13-16 The failoverpprc command on the secondary site storage unit
dscli> failoverpprc -type gcp 2C00:2600 2800:2E00
Date/Time: October 6, 2010 3:55:19 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2C00:2600 successfully reversed.
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2800:2E00 successfully reversed.
dscli> lspprc 2C00:2600 2800:2E00
Date/Time: October 6, 2010 3:55:35 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
5. Refresh and check the PVIDs. Then import and vary off the volume group as shown in
Example 13-17.
Example 13-17 Importing the volume group txvg on the secondary site node, robert
root@robert: rmdev -dl hdisk2
hdisk2 deleted
root@robert: rmdev -dl hdisk6
hdisk6 deleted
root@robert: cfgmgr
root@robert: lspv |grep -e hdisk2 -e hdisk6
hdisk2 000a624a833e440f txvg
hdisk6 000a625afe2a4958 txvg
root@robert: importvg -V 50 -y txvg hdisk2
txvg
root@robert: lsvg -l txvg
txvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
txlv jfs2 250 250 2 closed/syncd /txro
txloglv jfs2log 1 1 1 closed/syncd N/A
root@robert: varyoffvg txvg
Adding a cluster
To add a cluster, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select Extended Configuration → Extended Topology Configuration →
Configure an HACMP Cluster → Add/Change/Show an HACMP Cluster.
3. Enter the cluster name, which is Txrmnia in this scenario, as shown in Figure 13-3. Press
Enter.
[Entry Fields]
* Cluster Name [Txrmnia]
Figure 13-3 Adding a cluster in the SMIT menu
[Entry Fields]
* Node Name [jordan]
Communication Path to Node [] +
Figure 13-4 Add a Node SMIT menu
4. In this scenario, repeat these steps two more times to add the additional nodes of leeann
and robert.
Adding sites
To add the sites, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Topology
Configuration → Configure HACMP Sites → Add a Site.
3. Enter the desired site name, which in this scenario is the Texas site with the nodes jordan
and leeann, as shown in Figure 13-5. Press Enter.
The output is displayed in the SMIT Command Status window.
Add a Site
[Entry Fields]
* Site Name [Texas] +
* Site Nodes jordan leeann +
Figure 13-5 Add a Site SMIT menu
4. In this scenario, repeat these steps to add the Romania site with the robert node.
Adding networks
To add the networks, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Topology
Configuration → Configure HACMP Networks → Add a Network to the HACMP
Cluster.
3. Choose the desired network type, which in this scenario is XD_ip.
4. Keep the default network name and press Enter (Figure 13-6).
[Entry Fields]
* Network Name [net_XD_ip_01]
* Network Type XD_ip
* Netmask(IPv4)/Prefix Length(IPv6) [255.255.255.0]
* Enable IP Address Takeover via IP Aliases [Yes] +
IP Address Offset for Heartbeating over IP Aliases []
Figure 13-6 Add an IP-Based Network SMIT menu
5. Repeat these steps but select a network type of diskhb for the disk heartbeat network and
keep the default network name of net_diskhb_01.
[Entry Fields]
* IP Label/Address [jordan_base] +
* Network Type XD_ip
* Network Name net_XD_ip_01
* Node Name [jordan] +
Figure 13-7 Add communication interface SMIT menu
5. Repeat these steps and select Communication Devices to complete the disk heartbeat
network.
The topology is now configured. You can also see all the interfaces and devices in the
cllsif command output shown in Figure 13-8.
[Entry Fields]
* IP Label/Address serviceip_2 +
Netmask(IPv4)/Prefix Length(IPv6) []
* Network Name net_XD_ip_01
Alternate HW Address to accompany IP Label/Address []
Associated Site ignore
Figure 13-9 Add a Service IP Label SMIT menu
In most real multi-site scenarios, where each site is on a different network segment, it is
common to create at least two service IP labels, one for each site, by using the Associated
Site option, which indicates that site-specific service IP labels are desired. With this option,
you can have a unique service IP label at each site. However, we do not use site-specific
labels in this test because both sites are on the same network segment.
Because these options are all new, we define each one before configuring them:
Storage agent A generic name given by PowerHA SystemMirror to an entity such as
the IBM DS8000 HMC. Storage agents typically provide a single point
of coordination and often use TCP/IP as their transport for
communication. You must provide the IP address and authentication
information that are used to communicate with the HMC.
Storage system A generic name given by PowerHA SystemMirror for an entity such as
a DS8700 Storage Unit. When using Global Mirror, you must associate
one storage agent with each storage system. You must provide the
IBM DS8700 system identifier for the storage system. For example,
IBM.2107-75ABTV1 is a storage identifier for a DS8000 Storage
System.
Mirror group A generic name given by PowerHA SystemMirror for a logical
collection of volumes that must be mirrored to another storage system
that resides on a remote site. A Global Mirror session represents a
mirror group.
[Entry Fields]
* Storage Agent Name [ds8khmc]
* IP Addresses [9.3.207.122]
* User ID [redbook]
* Password [r3dbook]
Figure 13-10 Add a Storage Agent SMIT menu
It is possible to have multiple storage agents. However, this test scenario has only one
storage agent that manages both storage units.
Important: The user ID and password are stored as plain text in the
HACMPxd_storage_agent.odm file.
[Entry Fields]
* Storage System Name [texasds8k]
* Storage Agent Name(s) ds8kmainhmc +
* Site Association Texas +
* Vendor Specific Identification [IBM.2107-75DC890] +
* WWNN [5005076308FFC004] +
Figure 13-11 Add a Storage System SMIT menu
[Entry Fields]
* Mirror Group Name [texasmg]
* Storage System Name texasds8k romaniads8k +
* Vendor Specific Identifier [03] +
* Recovery Action automatic +
Maximum Coordination Time [50]
Maximum Drain Time [30]
Consistency Group Interval Time [0]
Figure 13-12 Add a Mirror Group SMIT menu
Vendor Specific Identifier field: For the Vendor Specific Identifier field, provide only the
Global Mirror session number.
[Entry Fields]
* Resource Group Name [ds8kgmrg]
In this scenario, we only added a service IP label, the volume group, and the DS8000 Global
Mirror Replicated Resources as shown in the streamlined clshowres command output in
Example 13-21.
Volume group: The volume group names must be listed in the same order as the DS8700
mirror group names in the resource group.
DS8000 Global Mirror Replicated Resources field: In the SMIT menu for adding
resources to the resource group, notice that the appropriate field is named DS8000 Global
Mirror Replicated Resources. However, when viewing the menu by using the clshowres
command (Example 13-21 on page 392), the field is called GENXD Replicated Resources.
You can now synchronize the cluster, start the cluster, and begin testing it.
In these scenarios, redundancy tests cannot be performed on the IP networks because only
a single network is configured. Instead, configure redundant IP or non-IP communication
paths to avoid isolation of the sites. The loss of all communication paths between sites
leads to a partitioned state of the cluster, and to data divergence between sites if the
replication links are also unavailable.
Another specific failure scenario is the loss of replication paths between the storage
subsystems while the cluster is running on both sites. To avoid this type of loss, configure a
redundant PPRC path or links for the replication. You must manually recover the status of the
pairs after the storage links are operational again.
Important: If the PPRC path or link between Global Mirror volumes breaks down, the
PowerHA Enterprise Edition is unaware. The reason is that PowerHA does not process
SNMP for volumes that use DS8700 Global Mirror technology for mirroring. In such a case,
you must identify and correct the PPRC path failure. Depending upon some timing
conditions, such an event can result in the corresponding Global Mirror session going into
a fatal state. In this situation, you must manually stop and restart the corresponding Global
Mirror session (by using the rmgmir and mkgmir DSCLI commands) or an equivalent
DS8700 interface.
Each test, other than the re-integration test, begins in the same initial state of the primary site
hosting the ds8kgmrg resource group on the primary node as shown in Example 13-22 on
page 394. Before each test, we start copying data from another file system to the replicated
file systems. After each test, we verify that the service IP address is online and that the new
data is present, and we show the Global Mirror states. Example 13-23 shows the normal
running production status of the Global Mirror pairs from each site.
dscli> lssession 26 2E
Date/Time: October 10, 2010 4:00:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
==============
26 03 CG In Progress 2600 Active Primary Copy Pending Secondary Simplex True
Disable
2E 03 CG In Progress 2E00 Active Primary Copy Pending Secondary Simplex True
Disable
dscli> lssession 28 2c
Date/Time: October 10, 2010 3:54:58 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
======
28 03 Normal 2800 Join Pending Primary Simplex Secondary Copy Pending True Disable
2C 03 Normal 2C00 Join Pending Primary Simplex Secondary Copy Pending True Disable
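Rather than reading the lssession output by eye, you can script the check that every session line at the active site shows CG In Progress. This parser is only a sketch based on the column layout above; cg_active is a hypothetical helper, not a DSCLI or PowerHA tool, and the layout can vary between DSCLI versions:

```shell
#!/bin/sh
# cg_active: read "lssession" output on stdin; succeed only when every
# session data line (a line that starts with the LSS id and session
# number) reports the "CG In Progress" status shown above.
cg_active() {
  awk '
    /^[0-9A-Fa-f]+ +[0-9]+ / {          # session data lines only
      lines++
      if ($0 !~ /CG In Progress/) bad++
    }
    END { exit ((lines == 0 || bad > 0) ? 1 : 0) }
  '
}

# usage sketch, run against saved output from the active site, e.g.:
#   cg_active < lssession_26_2E.out && echo "session consistent"
```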
In a true maintenance scenario, you would most likely perform a graceful site failover by
stopping the cluster on the local standby node first. Then you stop the cluster on the
production node by using the Move Resource Group option.
Moving the resource group to another site: In this scenario, because we only have one
node at the Romania site, we use the option to move the resource group to another site. If
multiple remote nodes are members of the resource group, use the option to move the
resource group to another node instead.
To perform the resource group move by using SMIT, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path System Management (C-SPOC) → Resource Groups and
Applications → Move a Resource Group to Another Node / Site → Move Resource
Groups to Another Site.
4. Select the Romania site from the next menu as shown in Figure 13-15.
+--------------------------------------------------------------------------+
| Select a Destination Site |
| |
| Move cursor to desired item and press Enter. |
| |
| # *Denotes Originally Configured Primary Site |
| Romania |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
| /=Find n=Find Next |
+--------------------------------------------------------------------------+
Figure 13-15 Selecting a site for a resource group move
Attention: During our testing, we encountered a problem. After performing the first
resource group move between sites, we were unable to move it back because the pick
list for the destination site was empty. We could move it back by node. Later in our
testing, the by-site option started working. However, it moved the resource group to the
standby node at the primary site instead of the original primary node. If you encounter
similar problems, contact IBM support.
Example 13-24 Resource group status after the site move to Romania
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
ds8kgmrg ONLINE SECONDARY jordan@Texas
OFFLINE leeann@Texas
ONLINE robert@Romania
6. Repeat the resource group move to move it back to its original primary site, Texas, and
node, jordan, to return to the original starting state. However, instead of using the option
to move it to another site, use the option to move it to another node.
Example 13-25 shows that the Global Mirror statuses are now swapped, and the local site is
showing the LUNs now as the target volumes.
Example 13-25 Global Mirror status after the resource group move
*******************From node jordan at site Texas***************************
dscli> lssession 26 2E
Date/Time: October 10, 2010 4:04:44 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
======
26 03 Normal 2600 Active Primary Simplex Secondary Copy Pending True Disable
2E 03 Normal 2E00 Active Primary Simplex Secondary Copy Pending True Disable
dscli> lssession 28 2C
Date/Time: October 10, 2010 3:59:25 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
==============
28 03 CG In Progress 2800 Active Primary Copy Pending Secondary Simplex True
Disable
2C 03 CG In Progress 2C00 Active Primary Copy Pending Secondary Simplex True
Disable
Begin with all three nodes active in the cluster and the resource group online on the primary
node as shown in Example 13-22 on page 394.
On the node jordan, we run the reboot -q command. The node leeann acquires the
ds8kgmrg resource group as shown in Example 13-26.
Example 13-27 shows that the statuses are the same as when we started.
dscli> lssession 26 2E
Date/Time: October 10, 2010 4:10:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
==============
26 03 CG In Progress 2600 Active Primary Copy Pending Secondary Simplex True
Disable
2E 03 CG In Progress 2E00 Active Primary Copy Pending Secondary Simplex True
Disable
dscli> lssession 28 2c
Date/Time: October 10, 2010 4:04:58 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
28 03 Normal 2800 Join Pending Primary Simplex Secondary Copy Pending True Disable
2C 03 Normal 2C00 Join Pending Primary Simplex Secondary Copy Pending True Disable
After the cluster stabilizes, we run the reboot -q command on the leeann node, invoking a
site_down event. The robert node at the Romania site acquires the ds8kgmrg resource group
as shown in Example 13-28.
You can also see that the replicated pairs are now in the suspended state at the remote site as
shown in Example 13-29.
Tip: Follow these steps as one approach; you can accomplish the same results by using
various methods:
1. Verify that the Global Mirror statuses at the primary site are suspended.
2. Fail back PPRC from the secondary site.
3. Verify that the Global Mirror status at the primary site shows the target status.
4. Verify that out-of-sync tracks are 0.
5. Stop the cluster to ensure that the volume group I/O is stopped.
6. Fail over the PPRC on the primary site.
7. Fail back the PPRC on the primary site.
8. Start the cluster.
Example 13-30 Suspended pair status in Global Mirror on the primary site after node restart
*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:27:48 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
====================================================================================================
2600:2C00 Suspended Host Source Global Copy 26 60 Disabled True
2E00:2800 Suspended Host Source Global Copy 2E 60 Disabled True
2. On the remote node robert, fail back the PPRC pairs as shown in Example 13-31.
Example 13-32 Verifying that the primary site LUNs are now target LUNs
*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:44:21 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First
Pass Status
================================================================================================
=========
2800:2E00 Target Copy Pending - Global Copy 28 unknown Disabled Invalid
2C00:2600 Target Copy Pending - Global Copy 2C unknown Disabled Invalid
4. Monitor the status of replication at the remote site by watching the Out of Sync Tracks
field in the lspprc -l command output. After the count reaches 0, as shown in
Example 13-33, the pairs are in sync. Then you can stop the remote site in preparation
for moving production back to the primary site.
Example 13-33 Verifying that the Global Mirror pairs are back in sync
dscli> lspprc -l 2800 2c00
Date/Time: October 10, 2010 4:22:46 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date
Suspended SourceLSS
===========================================================================================================
============
2800:2E00 Copy Pending - Global Copy 0 Disabled Disabled Invalid -
28
2C00:2600 Copy Pending - Global Copy 0 Disabled Disabled Invalid -
2C 6
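The wait for zero out-of-sync tracks can likewise be scripted. This sketch parses the lspprc -l layout shown in Example 13-33; in_sync is a hypothetical helper, and the field position of the Out Of Sync Tracks column can differ between DSCLI versions, so verify it before relying on this:

```shell
#!/bin/sh
# in_sync: read "lspprc -l" output on stdin; succeed only when every
# Global Copy pair in the Copy Pending state reports 0 out-of-sync
# tracks. Field 7 matches the "lspprc -l" layout in Example 13-33.
in_sync() {
  awk '
    /Copy Pending/ {
      pairs++
      if ($7 + 0 != 0) bad++          # $7 = Out Of Sync Tracks
    }
    END { exit ((pairs == 0 || bad > 0) ? 1 : 0) }
  '
}

# usage sketch, run against saved output from the remote site, e.g.:
#   in_sync < lspprc_2800_2c00.out && echo "safe to stop the remote site"
```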
Example 13-36 Failing back the PPRC pairs on the primary site
*******************From node jordan at site Texas***************************
dscli> failbackpprc -type gcp 2600:2c00 2E00:2800
Date/Time: October 10, 2010 4:46:49 PM CDT IBM DSCLI Version: 6.5.15.19 DS:
IBM.2107-75DC890
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2600:2C00 successfully failed back.
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2E00:2800 successfully failed back.
Verify the status of the pairs at each site as shown in Example 13-37.
Example 13-37 Global Mirror pairs failed back to the primary site
*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:47:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
==================================================================================================
2600:2C00 Copy Pending - Global Copy 26 60 Disabled True
2E00:2800 Copy Pending - Global Copy 2E 60 Disabled True
[Entry Fields]
* Start now, on system restart or both now +
Start Cluster Services on these nodes [jordan,leeann,robert] +
* Manage Resource Groups Automatically +
BROADCAST message at startup? true +
Startup Cluster Information Daemon? true +
Ignore verification errors? false +
Automatically correct errors found during Interactively +
cluster start?
Figure 13-16 Restarting a cluster after a site failure
Upon startup of the primary node jordan, the resource group is automatically started on
jordan, returning to the original starting point as shown in Example 13-38.
2. Verify the pair and session status on each site as shown in Example 13-39.
Dynamically expanding a volume: This topic does not provide information about
dynamically expanding a volume because this option is not supported.
Important: C-SPOC cannot perform certain LVM operations on nodes at the remote site
(the nodes that contain the target volumes). Such operations include those that require
nodes at the target site to read from the target volumes, and they cause an error
message in C-SPOC. They include functions such as changing the file system size,
changing the mount point, and adding LVM mirrors. However, nodes on the same site
as the source volumes can successfully perform these tasks. The changes can be
propagated later to the other site by using a lazy update.
For C-SPOC to work on all other LVM operations, perform all C-SPOC operations with
the Global Mirror volume pairs in a synchronized or consistent state and with the
cluster active on all nodes.
+--------------------------------------------------------------------------+
| Physical Volume Names |
| |
| Move cursor to desired item and press Enter. |
| |
| 000a624a987825c8 ( hdisk10 on node robert ) |
| 000a624a987825c8 ( hdisk11 on nodes jordan,leeann ) |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 13-17 Disk selection to add to the volume group
e. Verify the menu information, as shown in Figure 13-18, and press Enter.
[Entry Fields]
VOLUME GROUP name txvg
Resource Group Name ds8kgmrg
Node List jordan,leeann,robert
Reference node robert
VOLUME names hdisk10
Figure 13-18 Add a Volume C-SPOC SMIT menu
Upon completion of the C-SPOC operation, the local nodes have been updated, but the
remote node has not, as shown in Example 13-40. The remote node was not updated
because the target volumes are not readable until the relationship is swapped. You receive an
error message from C-SPOC, as shown in the note after Example 13-40. However, the lazy
update procedure at the time of failover pulls in the remaining volume group information.
root@robert: lspv
hdisk2 000a624a833e440f txvg
hdisk6 000a625afe2a4958 txvg
hdisk10 000a624a987825c8 none
Attention: When using C-SPOC to modify a volume group containing a Global Mirror
replicated resource, you can expect to see the following error message:
cl_extendvg: Error executing clupdatevg txvg 000a624a833e440f on node robert
You do not need to synchronize the cluster because all of these changes are made to an
existing volume group. However, consider running a verification.
Logical Volumes
6. Upon completion of the C-SPOC operation, verify that the new logical volume is created
locally on node jordan as shown in Example 13-41.
Similar to when you create the volume group, you see an error message (Figure 13-21) about
being unable to update the remote node.
COMMAND STATUS
jordan: pattilv
cl_mklv: Error executing clupdatevg txvg 000a625afe2a4958 on node robert
Figure 13-21 C-SPOC normal error upon logical volume creation
5. Upon completion of the C-SPOC operation, verify that the new file system size locally on
node jordan has increased from 250 LPs, as shown in Example 13-41 on page 409, to
313 LPs, as shown in Example 13-42.
A cluster synchronization is not required, because technically the resources have not
changed. All of the changes were made to an existing volume group that is already a resource
in the resource group.
In this scenario, we re-use the LUNs from the previous section. We removed them from the
volume group and removed the disks for all nodes except the main primary node jordan. In
our process, we cleared the PVID and then assigned a new PVID for a clean start.
Table 13-3 provides a summary of the LUNs that we implemented in each site.
Now continue with the following steps, which are the same as those steps for defining new
LUNs:
1. Run the cfgmgr command on the primary node jordan.
2. Assign the PVID on the node jordan:
chdev -l hdisk11 -a pv=yes
3. Configure the disk and PVID on the local node leeann by using the cfgmgr command.
4. Verify that PVID shows up by using the lspv command.
5. Pause the PPRC on the primary site.
6. Fail over the PPRC to the secondary site.
7. Fail back the PPRC to the secondary site.
8. Configure the disk and PVID on the remote node robert by using the cfgmgr command.
9. Verify that PVID shows up by using the lspv command.
10.Pause the PPRC on the secondary site.
11.Fail over the PPRC to the primary site.
12.Fail back the PPRC to the primary site.
The main difference between adding a new volume group and extending an existing one is
that, when adding a new volume group, you must swap the pairs twice. When extending an
existing volume group, you can get away with only swapping once.
The procedure is similar to the original setup: we created all the LVM components on the
primary site, swapped the PPRC pairs to the remote site to import the volume group, and
then swapped them back.
You can avoid performing two swaps, as we showed, by not choosing to include the third node
when creating the volume group. Then you can swap the pairs, run cfgmgr on the new disk
with the PVID, import the volume group, and swap the pairs back.
Volume Groups
+--------------------------------------------------------------------------+
| Node Names |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| > jordan |
| > leeann |
| > robert |
| |
| |
| F1=Help F2=Refresh F3=Cancel |
| F7=Select F8=Image F10=Exit |
F1| Enter=Do /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 13-23 Adding a volume group node pick list
Volume Groups
6. Select the proper resource group. We select ds8kgmrg as shown in Figure 13-26.
You can also use the C-SPOC CLI commands (Example 13-43). These commands are in the
/usr/es/sbin/cluster/cspoc directory, and all begin with the cli_ prefix. Similar to the SMIT
menus, their operation output is also saved in the cspoc.log file.
Upon completion of the C-SPOC operation, the local nodes are updated, but the remote node
is not, as shown in Example 13-44. The remote node is not updated because the target
volumes are not readable until the relationship is swapped. You see an error message from
C-SPOC as shown in the note following Example 13-44. After you create all LVM structures,
you swap the pairs back to the remote node and import the new volume group and logical
volume.
Attention: When using C-SPOC to add a new volume group that contains a Global Mirror
replicated resource, you might see the following error message:
cl_importvg: Error executing climportvg -V 51 -c -y princessvg -Q
000a624a9bb74ac3 on node robert
Although this message is normal, you can avoid it by omitting the remote nodes from the
selection. Omitting them is acceptable because you manually import the volume group on
the remote nodes anyway.
When you create the volume group, it is usually added to the resource group automatically as
shown in Example 13-45 on page 416. However, with the error message indicated in the
previous attention box, it might not be added automatically. Therefore, double-check that the
volume group is in the resource group before continuing. No further changes to the resource
group are needed. The new LUN pairs are added to the same storage subsystems and the
same session (3) that is already defined in the mirror group texasmg.
Example 13-46 New logical volume on the newly added volume group
root@jordan: lsvg -l princessvg
princessvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
princesslv raw 38 38 1 closed/syncd N/A
[Entry Fields]
* Verify changes only? [No] +
* Logging [Standard] +
Upon completion, the cluster configuration is synchronized and can now be tested.
Ideally the connectivity is through redundant links, switches, and fabrics to the hosts and
between the storage units themselves.
14.1.3 Considerations
Keep in mind the following considerations for mirroring PowerHA SystemMirror Enterprise
Edition with TrueCopy/HUR:
AIX Virtual SCSI is not supported in this initial release.
Logical Unit Size Expansion (LUSE) for Hitachi is not supported.
Only fence-level NEVER is supported for synchronous mirroring.
Only HUR is supported for asynchronous mirroring.
The dev_name must map to a logical device, and the dev_group must be defined in the
HORCM_LDEV section of the horcm.conf file.
The PowerHA SystemMirror Enterprise Edition TrueCopy/HUR solution uses dev_group
for any basic operation, such as the pairresync, pairevtwait, or horctakeover operation.
If several dev_names are in a dev_group, the dev_group must be enabled for consistency.
PowerHA SystemMirror Enterprise Edition does not trap Simple Network Management
Protocol (SNMP) notification events for TrueCopy/HUR storage. If a TrueCopy link goes
down when the cluster is up and later the link is repaired, you must manually
resynchronize the pairs.
The creation of pairs is done outside the cluster control. You must create the pairs before
you start the cluster services.
Resource groups that are managed by PowerHA SystemMirror Enterprise Edition cannot
contain volume groups with both TrueCopy/HUR-protected and
non-TrueCopy/HUR-protected disks.
All nodes in the PowerHA SystemMirror Enterprise Edition cluster must use the same horcm
instance.
Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 421
You cannot use Cluster Single Point Of Control (C-SPOC) for the following Logical Volume
Manager (LVM) operations to configure nodes at the remote site that contain the target
volume:
– Creating a volume group
– Operations that require nodes at the target site to write to the target volumes
For example, changing the file system size, changing the mount point, or adding LVM
mirrors cause an error message in C-SPOC. However, nodes on the same site as the
source volumes can successfully perform these tasks. The changes are then
propagated to the other site by using a lazy update.
C-SPOC on other LVM operations: For C-SPOC operations to work on all other LVM
operations, perform all C-SPOC operations when the cluster is active on all PowerHA
SystemMirror Enterprise Edition nodes and the underlying TrueCopy/HUR PAIRs are in
a PAIR state.
Important: You must install the Hitachi CCI software into the /HORCM/usr/bin directory.
Otherwise, you must create a symbolic link to this directory.
6. Verify installation of the proper version by using the raidqry command:
# raidqry -h
Model: RAID-Manager/AIX
Ver&Rev: 01-23-03/06
Usage: raidqry [options] for HORC
Important: Do not edit the configuration definition file while HORCM is running. Shut down
HORCM, edit the configuration file as needed, and then restart HORCM.
You might have multiple CCI instances, each of which uses its own specific horcm#.conf file.
For example, instance 0 might be horcm0.conf, instance 1 (Example 14-1) might be
horcm1.conf, and so on. The test scenario presented later in this chapter uses instance 2 and
provides examples of the horcm2.conf file on each cluster node.
HORCM_CMD
#dev_name => hdisk of Command Device
#UnitID 0 (Serial# eg. 45306)
/dev/hdisk19
HORCM_DEV
#Map dev_grp to LDEV#
#dev_group dev_name port# TargetID LU# MU#
VG01 test01 CL1-B 1 5 0
VG01 work01 CL1-B 1 24 0
VG01 work02 CL1-B 1 25 0
HORCM_INST
#dev_group ip_address service
VG01 10.15.11.195 horcm1
HORCM_CMD
#dev_name => hdisk of Command Device
#UnitID 0 (Serial# eg. 45306)
/dev/hdisk19
HORCM_DEV
#Map dev_grp to LDEV#
#dev_group dev_name port# TargetID LU# MU#
VG01 test01 CL1-B 1 5 0
VG01 work01 CL1-B 1 21 0
VG01 work02 CL1-B 1 22 0
HORCM_INST
#dev_group ip_address service
VG01 10.15.11.194 horcm1
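With the horcm#.conf files in place on each node, starting the matching CCI instance can be sketched as follows. Instance 2 and the /HORCM install path follow the conventions described above; the start script runs only where the CCI is installed, so the sketch falls back to printing the command.

```shell
#!/bin/ksh
# Sketch: start HORCM instance 2, which reads /etc/horcm2.conf, and point
# subsequent CCI commands at it through the HORCMINST environment variable.
INST=2
export HORCMINST=$INST   # alternative to passing -IH$INST on every CCI command

if [ -x /HORCM/usr/bin/horcmstart.sh ]; then
    /HORCM/usr/bin/horcmstart.sh $INST
else
    echo "would run: /HORCM/usr/bin/horcmstart.sh $INST"
fi
```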
NOTE 1: So that the horcm instance can use any available command device if one of
them fails, we recommend that, in your horcm file, the command device in the
HORCM_CMD section be presented in the following format, where 10133 is the
serial number of the array:
\\.\CMD-10133:/dev/hdisk/
NOTE 2: If the ShadowImage license has not been activated on the storage system
and the MU# column is not empty, the Device_File column shows "-----" in the
"pairdisplay -fd" output, which also causes verification to fail.
Therefore, leave the MU# column blank if the ShadowImage license is NOT
activated on the storage system.
Each site consists of two Ethernet networks. In this case, both networks are used for a
public Ethernet and for cross-site networks. Usually the cross-site network is on separate
segments and is an XD_ip network. It is also common to use site-specific service IP labels.
Example 14-2 shows the interface list from the cluster topology.
The krod and bina nodes at the Miami site have two disks, hdisk38 and hdisk39. These disks
are the secondary target volumes for the TrueCopy synchronous replication of the truesyncvg
volume group from the Austin site. The other two disks, hdisk40 and hdisk41, are to be used
as the primary source volumes for the ursasyncvg volume group that uses HUR for
asynchronous replication.
For each of these tasks, the Hitachi storage units have been added to the SAN fabric and
zoned appropriately. Also, the host groups have been created for the appropriate node
adapters, and the LUNs have been created within the storage unit.
To begin, the Hitachi USP-V storage unit is at the Austin site. The host group, JessBina, is
assigned to port CL1-E on the Hitachi storage unit with the serial number 45306. Usually the
host group is assigned to multiple ports for full multipath redundancy.
Figure 14-2 Assigning LUNs to the Austin site nodes
2. In the path verification window (Figure 14-3), check the information and record the LUN
number and LDEV numbers. You use this information later. However, you can also retrieve
this information from the AIX system after the devices are configured by the host. Click
OK.
You have completed assigning four more LUNs for the nodes at the Austin site. The lab
environment already had several LUNs, including both command and journaling LUNs, on
the cluster nodes; the four new LUNs were added solely for this test scenario.
Important: If these LUNs are the first ones to be allocated to the hosts, you must also
assign the command LUNs. See the appropriate Hitachi documentation as needed.
For the storage unit at the Miami site, repeat the steps that you performed for the Austin site.
The host group, KrodMaddi, is assigned to port CL1-B on the Hitachi USP-VM storage unit
with the serial number 35764. Usually the host group is assigned to multiple ports for full
multipath redundancy. Figure 14-5 on page 432 shows the result of these steps.
Again record both the LUN numbers and LDEV numbers so that you can easily refer to them
as needed when creating the replicated pairs. The numbers are also required when you add
the LUNs into device groups in the appropriate horcm.conf file.
Figure 14-5 Miami site LUNs assigned
You must know exactly which LUNs from each storage unit will be paired together. They must
be the same size. In this case, all of the LUNs that are used are 2 GB in size. The pairing of
LUNs also uses the LDEV numbers. The LDEV numbers are hexadecimal values that also
show up as decimal values on the AIX host.
Although the pairing can be done by using the CCI, the example in this section shows how to
create the replicated pairs through the Hitachi Storage Navigator. The appropriate commands
are in the /HORCM/usr/bin directory. In this scenario, none of the devices have been
configured to the AIX cluster nodes.
2. In the TrueCopy Pair Operation window (Figure 14-7), select the appropriate port, CL1-E,
and find the specific LUNs to use (00-00A and 00-00B).
In this scenario, we have predetermined that we want to pair these LUNs with 00-01C and
00-01D from the Miami Hitachi storage unit on port CL1-B. Notice the occurrence of
SMPL in the Status column next to the LUNs. SMPL indicates simplex, meaning that no
mirroring is being used with that LUN.
3. Right-click the first Austin LUN (00-00A), and select Paircreate → Synchronize
(Figure 14-7).
Courtesy of Hitachi Data Systems
6. After you complete the pairing selections, on the Pair Operation tab, verify that the
information is correct and click Apply to apply them all at one time.
Figure 14-9 shows both of the source LUNs in the middle of the pane. It also shows an
overview of which remote LUNs they are to be paired with.
After the copy has completed, the status is displayed as PAIR as shown in Figure 14-11. You
can also view this status from the management interface of either one of the storage units.
2. In the Universal Replicator Pair Operation window (Figure 14-13), select the appropriate
port, CL1-B, and find the specific LUNs that you want to use (00-01E and 00-01F in this
example). We have already predetermined that we want to pair these LUNs with
00-00C and 00-00D from the Austin Hitachi storage unit on port CL1-E.
Right-click one of the desired LUNs and select Paircreate.
Important: If these are the first Universal Replicator LUNs to be allocated, you must
also assign journaling groups and LUNs for both storage units. Refer to the
appropriate Hitachi Universal Replicator documentation as needed.
We chose ones that were already created in the environment.
d. Click Set.
e. Repeat these steps for the second LUN pairing.
Figure 14-14 shows details of the two pairings.
4. After you complete the pairing selections, on the Pair Operation tab, verify that the
information is correct and click Apply to apply them all at one time.
When the pairing is established, the copy automatically begins to synchronize with the
remote LUNs at the Austin site. The status changes to COPY, as shown in Figure 14-15,
until the pairs are in sync. After the pairs are synchronized, their status changes to PAIR.
In the test environment, we already have hdisk0-37 on each of the four cluster nodes. After
running the cfgmgr command on each node, one at a time, we now have four additional
disks, hdisk38-hdisk41, as shown in Example 14-3.
Although the LUN and LDEV numbers were written down during the initial LUN assignments,
you must identify the correct LDEV numbers of the Hitachi disks and the corresponding AIX
hdisks by performing the following steps:
1. On the PowerHA SystemMirror Enterprise Edition nodes, identify the Hitachi disks that
will be used in the TrueCopy/HUR relationships by running the inqraid
command. Example 14-4 shows hdisk38-hdisk41, which are the Hitachi disks that we just
added.
2. Edit the HORCM LDEV section in the horcm#.conf file to identify the dev_group that will be
managed by PowerHA SystemMirror Enterprise Edition. In this example, we use the
horcm2.conf file.
Hdisk38 (ldev 272) and hdisk39 (ldev 273) are the pair for the synchronous replicated
resource group, which is primary at the Austin site. Hdisk40 (ldev 275) and hdisk41
(ldev276) are the pair for an asynchronous replicated resource, which is primary at the
Miami site.
Specify the device groups (dev_group) in the horcm#.conf file. We are using dev_group
htcdg01 with dev_names htcd01 and htcd02 for the synchronous replicated pairs. For the
asynchronous pairs, we are using dev_group hurdg01 and dev_names hurd01 and hurd02.
The device group names are needed later when checking the status of the replicated
pairs and when defining the replicated pairs as a resource for PowerHA Enterprise Edition
to control.
Important: Do not edit the configuration definition file while HORCM is running. Shut
down HORCM, edit the configuration file as needed, and then restart HORCM.
Example 14-5 shows the horcm2.conf file from the jessica node, at the Austin site.
Because two nodes are at the Austin site, the same updates were performed to the
/etc/horcm2.conf file on the bina node. Notice that you can use either the decimal value
of the LDEV or the hexadecimal value.
We specifically defined one pair each way to demonstrate that both formats work.
Although several groups were already defined, only those that are relevant to this scenario
are shown.
Example 14-5 Horcm2.conf file used for the Austin site nodes
root@jessica:
/etc/horcm2.conf
HORCM_MON
#Address of local node...
#ip_address service poll(10ms) timeout(10ms)
HORCM_CMD
#hdisk of Command Device...
#dev_name dev_name dev_name
#UnitID 0 (Serial# 45306)
#/dev/rhdisk10
\\.\CMD-45306:/dev/rhdisk10 /dev/rhdisk14
HORCM_LDEV
#Map dev_grp to LDEV#...
#dev_group dev_name Serial# CU:LDEV MU# siteA siteB
# (LDEV#) hdisk -> hdisk
#--------- --------- ------- -------- --- --------------------
htcdg01 htcd01 45306 272
htcdg01 htcd02 45306 273
hurdg01 hurd01 45306 01:12
hurdg01 hurd02 45306 01:13
For the krod and maddi nodes at the Miami site, the dev_groups, dev_names, and the
LDEV numbers are the same. The differences are the specific serial number of the storage
unit at that site and the remote system name or IP address, which points to the
appropriate system at the Austin site.
Example 14-6 shows the horcm2.conf file that we used for both nodes in the Miami site.
Notice that, for the ip_address fields, fully qualified host names are used instead of the IP
addresses. As long as these names are resolvable, this format is valid. The format that
uses the actual IP addresses is shown in Example 14-1 on page 425.
Example 14-6 The horcm2.conf file used for the nodes in the Miami site
root@krod:
horcm2.conf
HORCM_MON
#Address of local node...
#ip_address service poll(10ms) timeout(10ms)
r9r3m13.austin.ibm.com 52323 1000 3000
HORCM_CMD
#hdisk of Command Device...
#dev_name dev_name dev_name
#UnitID 0 (Serial# 35764)
#/dev/rhdisk10
# /dev/hdisk19
\\.\CMD-45306:/dev/rhdisk11 /dev/rhdisk19
hurdg01 hurd02 35764 01:0F
# Address of remote node for each dev_grp...
HORCM_INST
#dev_group ip_address service
htcdg01 bina.austin.ibm.com 52323
hurdg01 bina.austin.ibm.com 52323
3. Map the TrueCopy-protected hdisks to the TrueCopy device groups by using the raidscan
command. In the following example, 2 is the HORCM instance number:
lsdev -Cc disk|grep hdisk | /HORCM/usr/bin/raidscan -IH2 -find inst
The -find inst option of the raidscan command registers the device file names (hdisks) to
all mirror descriptors of the LDEV map table for HORCM and permits the matching
volumes in the horcm.conf file to be used in protection mode. Because /etc/horcmgr
performs this registration automatically, you do not normally need to use this option
yourself. The option stops scanning as soon as the registration has been completed by
HORCM; if HORCM no longer needs the registration, no further action is taken and the
command simply exits. You can combine the -find inst option with the -fx option to view
LDEV numbers in the hexadecimal format.
4. Verify that the PAIRs are established by running either the pairdisplay command or the
pairvolchk command against the device groups htcdg01 and hurdg01.
Example 14-7 shows how we use the pairdisplay command. For device group htcdg01,
the status of PAIR and fence of NEVER indicates a synchronous pair. For
device group hurdg01, the ASYNC fence option clearly indicates an
asynchronous pair. Also notice that the CTG field shows the consistency group number for
the asynchronous pair managed by HUR.
Example 14-7 The pairdisplay command to verify that the pair status is synchronized
# pairdisplay -g htcdg01 -IH2 -fe
Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M CTG JID AP
htcdg01 htcd01(L) (CL1-E-0, 0, 10)45306 272.P-VOL PAIR NEVER ,35764 268 - - - 1
htcdg01 htcd01(R) (CL1-B-0, 0, 28)35764 268.S-VOL PAIR NEVER ,----- 272 - - - -
htcdg01 htcd02(L) (CL1-E-0, 0, 11)45306 273.P-VOL PAIR NEVER ,35764 269 - - - 1
htcdg01 htcd02(R) (CL1-B-0, 0, 29)35764 269.S-VOL PAIR NEVER ,----- 273 - - - -
To fit the output in Example 14-7 on the page, we removed the last three columns
because they are not relevant to what we are checking.
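A small script can automate that verification. The awk parsing below is our own sketch, and it assumes the -CLI output format (one Status column per line, as shown later in Example 14-27) rather than the -fe format of Example 14-7; the sample lines are illustrative.

```shell
#!/bin/ksh
# Sketch: exit non-zero if any data line of "pairdisplay ... -CLI" output
# reports a status other than PAIR. Column 8 is Status in the -CLI format.
check_pair_state() {
    awk 'NR > 1 && $8 != "PAIR" { bad = 1 } END { exit bad }'
}

# Sample -CLI style output for a quick self-check (values are illustrative):
sample="Group PairVol L/R Device_File Seq# LDEV# P/S Status Fence Seq# P-LDEV# M
htcdg01 htcd01 L hdisk38 45306 272 P-VOL PAIR NEVER 35764 268 -
htcdg01 htcd01 R hdisk38 35764 268 S-VOL PAIR NEVER - 272 -"

if echo "$sample" | check_pair_state; then
    result="all volumes in PAIR state"
else
    result="resync required"
fi
echo "$result"
```

On a live system you would pipe the real command into the function, for example pairdisplay -fd -g htcdg01 -IH2 -CLI.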
Otherwise, if you are using Storage Navigator, see 14.4.2, “Creating replicated pairs” on
page 432.
root@jessica: lsvg -l truesyncvg
truesyncvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
oreolv jfs2 125 125 1 closed/syncd /oreofs
majorlv jfs2 125 125 1 closed/syncd /majorfs
truefsloglv jfs2log 1 1 1 closed/syncd N/A
We create the ursasyncvg big volume group on the krod node where the primary LUNs
are located. We also create the logical volumes, jfslog, and file systems as shown in
Example 14-9.
root@krod:lsvg ursasyncvg
VOLUME GROUP: ursasyncvg VG IDENTIFIER:
00cb14ce00004c000000012b5676b11e
VG STATE: active PP SIZE: 4 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 1018 (4072 megabytes)
MAX LVs: 512 FREE PPs: 596 (2384 megabytes)
LVs: 3 USED PPs: 422 (1688 megabytes)
OPEN LVs: 3 QUORUM: 2 (Enabled)
TOTAL PVs: 2 VG DESCRIPTORS: 3
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 2 AUTO ON: no
MAX PPs per VG: 130048
MAX PPs per PV: 1016 MAX PVs: 128
LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
root@krod:lsvg -l ursasyncvg
ursasyncvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
ursfsloglv jfs2log 2 2 1 closed/syncd N/A
hannahlv jfs2 200 200 1 closed/syncd /hannahfs
julielv jfs2 220 220 1 closed/syncd /juliefs
2. Vary off the newly created volume groups by running the varyoffvg command. To import
the volume groups onto the other three systems, the pairs must be in sync.
We execute the pairresync command as shown in Example 14-10 on the local disks and
make sure that they are in the PAIR state. This process verifies that the local disk
information has been copied to the remote storage. Notice that the command is being run
on the respective node that contains the primary source LUNs and where the volume
groups are created.
Verify that the pairs are in sync with the pairdisplay command as shown in Example 14-7
on page 446.
To verify that the pairs are split, check the status by using the pairdisplay command.
Example 14-12 shows that the pairs are in a suspended state.
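The split itself relies on the pairsplit command. A sketch against the two device groups from this scenario follows; the commands run only where the CCI is installed, so the sketch falls back to printing what it would do.

```shell
#!/bin/ksh
# Sketch: split both device groups so that the secondary nodes can read the PVIDs.
INST=2
DGS="htcdg01 hurdg01"    # device groups from this scenario

for dg in $DGS; do
    if [ "$(uname -s)" = "AIX" ]; then
        pairsplit -g "$dg" -IH$INST
    else
        echo "would run: pairsplit -g $dg -IH$INST"
    fi
done
```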
4. To import the volume groups on the remaining nodes, ensure that the PVID is present on
the disks by using one of the following options:
– Run the rmdev -dl command for each hdisk and then run the cfgmgr command.
– Run the appropriate chdev command against each disk to pull in the PVID.
As shown in Example 14-13, we use the chdev command on each of the three additional
nodes.
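The chdev invocations can be looped over the disks in one short script. The hdisk names are from this lab environment, and the commands execute only on AIX; elsewhere the sketch prints what it would run.

```shell
#!/bin/ksh
# Sketch: pull the PVIDs in on a secondary node for the newly mapped disks.
DISKS="hdisk38 hdisk39 hdisk40 hdisk41"   # disk names from this scenario
count=0
for d in $DISKS; do
    if [ "$(uname -s)" = "AIX" ]; then
        chdev -l "$d" -a pv=yes
    else
        echo "would run: chdev -l $d -a pv=yes"
    fi
    count=$((count + 1))
done
echo "processed $count disks"
```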
5. Verify that the PVIDs are correctly showing on each system by running the lspv command
as shown in Example 14-14. Because all four of the nodes have the exact hdisk
numbering, we show the output only from one node, the bina node.
6. Import the volume groups on each node as needed by using the importvg command.
Specify the major number that you used earlier.
7. Disable both the auto varyon and quorum settings of the volume groups by using the chvg
command.
8. Vary off the volume group as shown in Example 14-15.
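Steps 6 through 8 might be scripted as follows on each remaining node. The major number 100 and the hdisk38 reference disk are assumptions for illustration; the commands execute only on AIX.

```shell
#!/bin/ksh
# Sketch of steps 6-8 on a remaining node: import the volume group with the
# same major number, disable auto varyon and quorum, then vary it off.
VG=truesyncvg
MAJOR=100        # assumption: must match the major number used at creation
HD=hdisk38       # any one disk of the volume group is enough for importvg

if [ "$(uname -s)" = "AIX" ]; then
    importvg -V $MAJOR -y $VG $HD
    chvg -a n -Q n $VG        # -a n: no auto varyon; -Q n: disable quorum
    varyoffvg $VG
else
    echo "would run: importvg -V $MAJOR -y $VG $HD; chvg -a n -Q n $VG; varyoffvg $VG"
fi
```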
9. Re-establish the pairs that you split in step 3 on page 449 by running the pairresync
command again as shown in Example 14-10 on page 448.
10.Verify again if they are in sync by using the pairdisplay command as shown in
Example 14-7 on page 446.
In these steps, the cluster topology has been configured, including all four nodes, both sites,
and networks.
In this configuration, we created two replicated resources. One resource, named truelee, is
for the synchronous device group, htcdg01. The second resource, named ursasyncRR, is
for the asynchronous device group, hurdg01. Example 14-16 shows both of the
replicated resources.
[Entry Fields]
* TRUECOPY(R)/HUR Resource Name [truelee]
* TRUECOPY(R)/HUR Mode SYNC +
* Device Groups [htcdg01] +
* Recovery Action AUTO +
* Horcm Instance [horcm2]
* Horctakeover Timeout Value [300] #
* Pairevtwait Timeout Value [3600] #
[Entry Fields]
* TRUECOPY(R)/HUR Resource Name [ursasyncRR]
* TRUECOPY(R)/HUR Mode ASYNC +
* Device Groups [hurdg01] +
* Recovery Action AUTO +
* Horcm Instance [horcm2]
* Horctakeover Timeout Value [300] #
* Pairevtwait Timeout Value [3600] #
For a complete list of all of the defined TrueCopy/HUR replicated resources, run the cllstc
command, which is in the /usr/es/sbin/cluster/tc/cmds directory. Example 14-17 shows
the output of the cllstc command.
Example 14-17 The cllstc command to list the TrueCopy/HUR replicated resources
root@jessica: cllstc -a
Name CopyMode DeviceGrps RecoveryAction HorcmInstance HorcTimeOut PairevtTimeout
truelee SYNC htcdg01 AUTO horcm2 300 3600
ursasyncRR ASYNC hurdg01 AUTO horcm2 300 3600
Important: You cannot mix regular (non-replicated) volume groups and TrueCopy/HUR
replicated volume groups in the same resource group.
Press Enter.
In this scenario, we changed an existing resource group, emlecRG, for the Austin site and
specifically chose a site relationship, also known as an Inter-site Management Policy of Prefer
Primary Site. We added a new resource group, valhallarg, for the Miami site and chose to
use the same site relationship. We also added the additional nodes from each site. We
configured both to failover locally within a site and failover between sites. If a site failure
occurs, the node falls over to the remote site standby node, but never to the remote
production node.
Figure 14-17 Error messages found during TrueCopy/HUR replicated resource verification
Synchronizing the cluster configuration
You must verify the PowerHA SystemMirror Enterprise Edition cluster and the TrueCopy/HUR
configuration before you can synchronize the cluster. To propagate the new TrueCopy/HUR
configuration information and the additional resource group that were created across the
cluster, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select Extended Configuration → Extended Verification and
Synchronization.
3. In the Verify, Synchronize or Both field, select Synchronize. In the Automatically correct
errors found during verification field, select No. Press Enter.
The output is displayed in the SMIT Command Status window.
These scenarios do not entail performing a redundancy test with the IP networks. Instead you
configure redundant IP or non-IP communication paths to avoid isolation of the sites. The loss
of all the communication paths between sites leads to a partitioned state of the cluster and to
data divergence between sites if the replication links are also unavailable.
Another specific failure scenario is the loss of the replication paths between the storage
subsystems while the cluster is running on both sites. To avoid this situation, configure
redundant communication links for TrueCopy/HUR replication. You must manually recover the
status of the pairs after the storage links are operational again.
Important: PowerHA SystemMirror Enterprise Edition does not trap SNMP notification
events for TrueCopy/HUR storage. If a TrueCopy link goes down when the cluster is up and
the link is repaired later, you must manually resynchronize the pairs.
This topic explains how to perform the following tests for each site and resource group:
Graceful site failover for the Austin site
Rolling site failure of the Austin site
Site re-integration for the Austin site
Graceful site failover for the Miami site
Rolling site failure of the Miami site
Site re-integration for the Miami site
Before each test, we start copying data from another file system to the replicated file systems.
After each test, we verify that the site service IP address is online and that new data is in the
file systems. We also had a script that inserts the current time and date into a file on each file
system. Because of the small amount of I/O in our environment, we could not detect any
data loss, even with the asynchronous replication.
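The timestamp script mentioned above was a simple loop along the following lines. The file-system names are from this scenario, and the marker file name and iteration count are our own illustrative choices; in the real test the loop ran for the duration of each test.

```shell
#!/bin/ksh
# Sketch: append the current date to a marker file on each replicated file
# system so that, after a failover, the last entry shows how recent the
# surviving data is. Directories that do not exist are silently skipped.
FILESYSTEMS="/oreofs /majorfs /hannahfs /juliefs"
ITERATIONS=5
i=0
while [ $i -lt $ITERATIONS ]; do
    for fs in $FILESYSTEMS; do
        if [ -d "$fs" ]; then
            date >> "$fs/last_write.log"
        fi
    done
    sleep 1
    i=$((i + 1))
done
echo "wrote $i timestamp rounds"
```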
In a true maintenance scenario, you most likely perform this task by stopping the cluster on
the local standby node first. Then you stop the cluster on the production node by using the
Move Resource Group option. The following operations occur during this move:
Releasing the primary online instance of emlecRG at the Austin site
– Executes the application server stop script
– Unmounts the file systems
– Varies off the volume group
– Removes the service IP address
Releasing the secondary online instance of emlecRG at the Miami site
Acquiring the emlecRG resource group in the online secondary state at the Austin site
Acquiring the emlecRG resource group in the online primary state at the Miami site
3. In the Move a Resource Group to Another Node / Site panel (Figure 14-18), select the
ONLINE instance of the emlecRG resource group to be moved.
4. In the Select a Destination Site panel, select the Miami site as shown in Figure 14-19.
+--------------------------------------------------------------------------+
| Select a Destination Site |
| |
| Move cursor to desired item and press Enter. |
| |
| # *Denotes Originally Configured Primary Site |
| Miami |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 14-19 Selecting the site for resource group move
Example 14-20 Resource group status after a move to the Miami site
root@maddi# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
emlecRG ONLINE SECONDARY jessica@Austin
OFFLINE bina@Austin
ONLINE maddi@Miami
6. Repeat the resource group move to move it back to its original primary site and node to
return to the original starting state.
Attention: In our environment, after the first resource group move between sites, we were
unable to move the resource group back without leaving the pick list for the destination site
empty. However, we were able to move it back by node, instead of by site. Later in our
testing, the by-site option started working, but it moved it to the standby node at the
primary site instead of the original primary node. If you encounter similar problems, contact
IBM support.
To begin, all four nodes are active in the cluster and the resource groups are online on the
primary node as shown in Example 14-19 on page 455.
1. On the jessica node, run the reboot -q command. The bina node acquires the emlecRG
resource group as shown in Example 14-21.
valhallarg ONLINE krod@Miami
OFFLINE maddi@Miami
ONLINE SECONDARY bina@Austin
2. Run the pairdisplay command (as shown in Example 14-22) to verify that the pairs are
still established because the volume group is still active on the primary site.
3. Upon cluster stabilization, run the reboot -q command on the bina node. The maddi node
at the Miami site acquires the emlecRG resource group as shown in Example 14-23.
4. Verify that the replicated pairs are now in the suspended state from the command line as
shown in Example 14-24.
Important: Although our testing resulted in a site_down event, we never lost access to
the primary storage subsystem. In a true site failure, including loss of storage,
re-establish the replicated pairs, and synchronize them before moving back to the
primary site. If you must change the storage LUNs, modify the horcm.conf file, and use
the same device group and device names. You do not have to change the cluster
resource configuration.
Important: The resource group settings of the Inter-site Management Policy, also known
as the site relationship, dictate what occurs upon re-integration of the primary node.
Because we chose Prefer Primary Site, the automatic fallback occurred.
Initially we are unable to restart the cluster on the jessica node because of verification errors
at startup, which are similar to the errors shown in Figure 14-17 on page 453. Of the two
possible reasons for these errors, the first is that we failed to include starting the horcm
instance on bootup. The second is that we also had to re-map the copy-protected
device groups by running the raidscan command again.
Important: Always ensure that the horcm instance is running before rejoining a node into
the cluster. In some cases, if all instances, cluster nodes, or both have been down, you
might need to run the raidscan command again.
Example 14-25 Resource group status after moving to the Austin site
root@bina: clRGinfo
Group Name Group State Node
-----------------------------------------------------------------------------
emlecRG ONLINE jessica@Austin
OFFLINE bina@Austin
ONLINE SECONDARY maddi@Miami
6. Repeat these steps to move a resource group back to the original primary krod node at
the Miami site.
Attention: In our environment, after the first resource group move between sites, we were
unable to move the resource group back without leaving the pick list for the destination site
empty. However, we were able to move it back by node, instead of by site. Later in our
testing, the by-site option started working, but it moved it to the standby node at the
primary site instead of the original primary node. If you encounter similar problems, contact
IBM support.
To begin, all four nodes are active in the cluster, and the resource groups are online on the
primary node as shown in Example 14-19 on page 455. Follow these steps:
1. On the krod node, run the reboot -q command. The maddi node brings the valhallaRG
resource group online, and the remote bina node maintains the online secondary status as
shown in Example 14-26. This time the failover was noticeably longer, specifically in
the fsck portion. The longer time is most likely a symptom of the asynchronous
replication.
2. Run the pairdisplay command as shown in Example 14-27 to verify that the pairs are still
established because the volume group is still active on the primary site.
Example 14-27 Status using the pairdisplay command after the local Miami site fallover
root@maddi: pairdisplay -fd -g hurdg01 -IH2 -CLI
Group PairVol L/R Device_File Seq# LDEV# P/S Status Fence Seq# P-LDEV# M
hurdg01 hurd01 L hdisk40 35764 270 P-VOL PAIR ASYNC 45306 274 -
hurdg01 hurd01 R hdisk40 45306 274 S-VOL PAIR ASYNC - 270 -
hurdg01 hurd02 L hdisk41 35764 271 P-VOL PAIR ASYNC 45306 275 -
hurdg01 hurd02 R hdisk41 45306 275 S-VOL PAIR ASYNC - 271 -
3. Upon cluster stabilization, run the reboot -q command on the maddi node. The bina node
at the Austin site acquires the valhallaRG resource group as shown in Example 14-28.
Important: Although our testing resulted in a site_down event, we never lost access to
the primary storage subsystem. In a true site failure, including loss of storage,
re-establish the replicated pairs, and synchronize them before moving back to the
primary site. If you must change the storage LUNs, modify the horcm.conf file, and use
the same device group and device names. You do not have to change the cluster
resource configuration.
Important: The resource group settings of the Inter-site Management Policy, also known
as the site relationship, dictate what occurs upon re-integration of the
primary node. Because we chose the Prefer Primary Site policy, the automatic fallback
occurred.
Initially we are unable to restart the cluster on the jessica node because of verification errors
at startup, which are similar to the errors shown in Figure 14-17 on page 453. Of the two
possible reasons for these errors, the first is that we failed to include starting the horcm
instance on bootup. The second is that we also had to re-map the copy-protected
device groups by running the raidscan command again.
Important: This topic does not explain how to dynamically expand a volume through
Hitachi Logical Unit Size Expansion (LUSE) because this option is not supported.
Then follow the same steps for defining new LUNs:
1. Run the cfgmgr command on the primary node jessica.
2. Assign the PVID on the jessica node.
chdev -l hdisk42 -a pv=yes
3. Run the pairsplit command on the replicated LUNs.
4. Run the cfgmgr command on each of the remaining three nodes.
5. Verify that the PVID shows up on each node by using the lspv command.
6. Run the pairresync command on the replicated LUNs.
7. Shut down the horcm2 instance on each node:
/HORCM/usr/bin/horcmshutdown.sh 2
8. Edit the /etc/horcm2.conf file on each node as appropriate for each site:
– The krod and maddi nodes on the Miami site added the following new line:
htcdg01 htcd03 35764 01:1F
– The jessica and bina nodes on the Austin site added the following new line:
htcdg01 htcd03 45306 01:14
9. Restart horcm2 instance on each node:
/HORCM/usr/bin/horcmstart.sh 2
10.Map the devices and device group on any node:
lsdev -Cc disk|grep hdisk|/HORCM/usr/bin/raidscan -IH2 -find inst
We ran this command on the jessica node.
11.Verify that the htcdg01 device group pairs now show the new pair, which consists of
hdisk42 on each system, as shown in Example 14-29.
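Steps 7 through 10 above, run on each node, can be collected into one small script. Because the new LDEV line differs per site, it is passed in as an argument; appending to the end of the file is a simplification (the line belongs in the HORCM_LDEV section), and the commands execute only on AIX.

```shell
#!/bin/ksh
# Sketch of steps 7-10 for one node: stop the instance, add the new LDEV line,
# restart the instance, and re-register the devices with raidscan.
INST=2
NEWLINE="${1:-htcdg01 htcd03 45306 01:14}"   # Austin-site line; Miami nodes differ

if [ "$(uname -s)" = "AIX" ]; then
    /HORCM/usr/bin/horcmshutdown.sh $INST
    echo "$NEWLINE" >> /etc/horcm$INST.conf   # simplistic: belongs in HORCM_LDEV
    /HORCM/usr/bin/horcmstart.sh $INST
    lsdev -Cc disk | grep hdisk | /HORCM/usr/bin/raidscan -IH$INST -find inst
else
    echo "would update /etc/horcm$INST.conf with: $NEWLINE"
fi
```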
You are now ready to use C-SPOC to add the new disk into the volume group:
Important: You cannot use C-SPOC for the following LVM operations to configure nodes at
the remote site that contain the target volume:
– Creating a volume group
– Operations that require nodes at the target site to write to the target volumes
For example, changing the file system size, changing the mount point, or adding LVM
mirrors causes an error message in C-SPOC. However, nodes on the same site as the
source volumes can successfully perform these tasks. The changes are then
propagated to the other site by using a lazy update.
For all other LVM operations to work through C-SPOC, perform them with the
TrueCopy/HUR volume pairs in the Synchronized or Consistent states and the cluster
ACTIVE on all nodes.
5. Verify the menu information, as shown in Figure 14-22, and press Enter.
[Entry Fields]
VOLUME GROUP name truesyncvg
Resource Group Name emlecRG
Node List bina,jessica,krod,mad>
Reference node bina
VOLUME names hdisk42
Figure 14-22 Adding a volume to a volume group
The krod node does not need the volume group because it is not a member of the resource
group. However, we started with all four nodes seeing all volume groups and decided to leave
the configuration that way. This way we have additional flexibility later if we need to change
the cluster configuration to allow the krod node to take over as a last resort.
Upon completion of the C-SPOC operation, all four nodes now have the new disk as a
member of the volume group as shown in Example 14-30.
Example 14-30 New disk added to the volume group on all nodes
root@jessica: lspv |grep truesyncvg
hdisk38 00cb14ce564c3f44 truesyncvg active
hdisk39 00cb14ce564c40fb truesyncvg active
hdisk42 00cb14ce74090ef3 truesyncvg active
We do not need to synchronize the cluster because all of these changes are made to an
existing volume group. However, you might want to run the cl_verify_tc_config command to
verify that the replicated resources are configured correctly.
6. Upon completion of the C-SPOC operation, verify that the new logical volume was created
locally on the jessica node, as shown in Example 14-31.
You do not need to synchronize the cluster because all of these changes are made to an
existing volume group. However, you might want to make sure that the replicated resources
verify correctly. Use the cl_verify_tc_config command first to isolate the replicated
resources specifically.
10.Map the devices and device group on any node. We ran the raidscan command on the
jessica node. See Table 14-3 for additional configuration details.
lsdev -Cc disk|grep hdisk|/HORCM/usr/bin/raidscan -IH2 -find inst
Table 14-3 Details on the Austin and Miami LUNs
Austin - Hitachi USPV - 45306 Miami - Hitachi USPVM - 35764
CU 00 CU 00
11.Verify that the htcdg01 device group pairs now show the new pairs that consist of
hdisk42 on each system, as shown in Example 14-33.
Volume Groups
+--------------------------------------------------------------------------+
| Node Names |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| > bina |
| > jessica |
| > krod |
| > maddi |
| |
| F1=Help F2=Refresh F3=Cancel |
| F7=Select F8=Image F10=Exit |
F1| Enter=Do /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 14-26 Selecting a volume group node
4. In the Physical Volume Names panel (Figure 14-27), select hdisk43.
Volume Groups
6. In the Create a Scalable Volume Group panel, select the proper resource group. We chose
emlecRG as shown in Figure 14-29.
8. Verify that the volume group is successfully created, which we do on all four nodes as
shown in Example 14-34.
When the volume group is created, it is automatically added to the resource group, as
shown in Example 14-35. However, we do not have to change the resource group any
further, because the new disk and device are added to the same device group and
TrueCopy/HUR replicated resource.
Example 14-35 Newly added volume group also added to the resource group
Resource Group Name emlecRG
Participating Node Name(s) jessica bina maddi
Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node
Fallback Policy Never Fallback
Site Relationship Prefer Primary Site
Node Priority
Service IP Label service_1
Volume Groups truesyncvg truetarahvg
Hitachi TrueCopy Replicated Resources truelee
9. Repeat the steps in 14.6.2, “Adding a new logical volume” on page 466, to create a new
logical volume, named tarahlv on the newly created volume group truetarahvg.
Example 14-36 shows the new logical volume.
10.Manually run the cl_verify_tc_config command to verify that the new addition of the
replicated resources is complete.
Important: These results incorrectly imply a one-to-one relationship between the device
group/replicated resource and the volume group, which is not intended. To work around
this problem, ensure that the cluster is down, perform a forced synchronization, and then
start the cluster while ignoring the verification errors. Performing a forced synchronization
and then starting the cluster while ignoring errors is normally not recommended. Contact
IBM support to see if a fix is available.
Synchronize the resource group change to include the new volume that you just added.
Usually you can perform this task within a running cluster. However, because of the defect
mentioned in the previous Important box, we had to have the cluster down to synchronize it.
To perform this task, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select Extended Configuration → Extended Verification and
Synchronization.
3. In the HACMP Verification and Synchronization display (Figure 14-30), for Force
synchronization if verification fails, select Yes.
[Entry Fields]
* Verify, Synchronize or Both [Both] +
* Automatically correct errors found during [No] +
verification?
4. Verify the information is correct, and press Enter. Upon completion, the cluster
configuration is in sync and can now be tested.
5. Repeat the steps for a rolling site failure as explained in 14.5.2, “Rolling site failure of
the Austin site” on page 457. In this scenario, the tests are successful.
Testing failover after adding a new volume group
To confirm that the cluster still behaves as expected after the change, repeat the steps of a
rolling site failure as explained in 14.5.2, “Rolling site failure of the Austin site” on page 457.
The new volume group truetarahvg and new logical volume tarahlv are displayed on each
node. However, the site failover takes noticeably longer because a lazy update is performed
to pick up the volume group changes.
Syntax
lscluster -i [ -n clustername ] | -s | -m | -d | -c
Description
The lscluster command shows the attributes that are associated with the cluster and the
cluster configuration.
Flags
-i Lists the cluster configuration interfaces on the local node.
-n Allows the cluster name to be queried for all interfaces (applicable only with the -i
flag).
-s Lists the cluster network statistics on the local node.
-m Lists the cluster node configuration information.
-d Lists the cluster storage interfaces.
-c Lists the cluster configuration.
Examples
To list the cluster configuration for all nodes, enter the following command:
lscluster -m
To list the cluster statistics for the local node, enter the following command:
lscluster -s
To list the interface information for the local node, enter the following command:
lscluster -i
To list the interface information for the cluster, enter the following command:
lscluster -i -n mycluster
To list the storage interface information for the cluster, enter the following command:
lscluster -d
To list the cluster configuration, enter the following command:
lscluster -c
Syntax
mkcluster [ -n clustername ] [ -m node[,...] ] -r reposdev [-d shareddisk [,...]]
[-s multaddr_local ] [-v ]
A multicast address is used for cluster communications between the nodes in the cluster.
Therefore, review any multicast-related network considerations with your network systems
administrator before creating a cluster.
Flags
-n clustername Sets the name of the local cluster being created. If no name is
specified when you first run the mkcluster command, a default of
SIRCOL_hostname is used, where hostname is the name
(gethostname()) of the local host.
-m node[,...] Lists the comma-separated resolvable host names or IP addresses for
nodes that are members of the cluster. The local host must be
included in the list. If the -m option is not used, the local host is implied,
causing a one-node local cluster to be created.
-r reposdev Specifies the name, such as hdisk10, of the SAN-shared storage
device that is used as the central repository for the cluster
configuration data. This device must be accessible from all nodes.
This device is required to be a minimum of 1 GB in size and backed by
a redundant and highly available SAN configuration. This flag is
required when you first run the mkcluster command within a Storage
Interconnected Resource Collection (SIRCOL), and cannot be used
thereafter.
-d shareddisk[,...] Specifies a comma-separated list of SAN-shared storage devices,
such as hdisk12,hdisk34, to be incorporated into the cluster
configuration.
These devices are renamed with a cldisk prefix. The same name is
assigned to this device on all cluster nodes from which the device is
accessible. Specified devices must not be open when the mkcluster
command is executed. This flag is used only when you first run the
mkcluster command.
-s multaddr_local Sets the multicast address of the local cluster that is being created.
This address is used for internal communication within the local
cluster. If the -s option is not specified when you first run the
mkcluster command within a SIRCOL, a multicast address is
automatically generated. This flag is used only when you first run the
mkcluster command within a SIRCOL.
-v Specifies the verbose mode.
Examples
To create a cluster of one node and use the default values, enter the following command:
mkcluster -r hdisk1
The output is a cluster named SIRCOL_myhostname with a single node in the cluster. The
multicast address is automatically generated, and no shared disks are created for this
cluster. The repository device is set up on hdisk1, and this disk cannot be used by the node
for any other purpose.
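Combining the flags described above, a fuller invocation might look like the following sketch. The cluster name, node names, and hdisk numbers are hypothetical; the script only composes and prints the command string, because mkcluster itself runs only on a CAA-capable AIX node.

```shell
# Hypothetical values; adjust to your environment.
cluster="mycluster"
nodes="nodeA,nodeB,nodeC"   # -m: the local host must be in this list
repos="hdisk10"             # -r: >= 1 GB, SAN-backed, accessible from all nodes
shared="hdisk12,hdisk34"    # -d: these devices are renamed with a cldisk prefix

cmd="mkcluster -n $cluster -m $nodes -r $repos -d $shared"
echo "$cmd"   # on an AIX node, run the printed command directly
```

Note that -r (and -d and -s, if used) are accepted only the first time mkcluster is run within a SIRCOL; later membership changes are made with chcluster.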
Syntax
rmcluster -n name [-f] [-v]
Description
The rmcluster command removes the cluster configuration. The repository disk and all SAN
shared disks are released, and the shared disks are re-assigned generic hdisk names. The
generic hdisk name might not be the same name that was initially used to add the disk to the
cluster.
Flags
-n name Specifies the name of the cluster to be removed.
-f Forces certain errors to be ignored.
-v Specifies verbose mode.
Example
To remove the cluster configuration, enter the following command:
rmcluster -n mycluster
Syntax
chcluster [ -n name ] [ { -d | -m } [+|-]name[,...] ] ... [ -q ] [ -f ] [ -v ]
Description
The chcluster command changes the cluster configuration. With this command, SAN shared
disks and nodes can be added and removed from the cluster configuration.
Examples
To add shared disks to the cluster configuration, enter the following command:
chcluster -n mycluster -d +hdisk20,+hdisk21
To remove shared disks from the cluster configuration, enter the following command:
chcluster -n mycluster -d -hdisk20,-hdisk21
To add nodes to the cluster configuration, enter the following command:
chcluster -n mycluster -m +nodeD,+nodeE
To remove nodes from the cluster configuration, enter the following command:
chcluster -n mycluster -m -nodeD,-nodeE
Syntax
clusterconf [ -u [-f ] | -s | -r hdiskN ] [-v ]
Description
The clusterconf command allows administration of the cluster configuration. A node in a
cluster configuration might show a status of DOWN (viewable by issuing the lscluster -m
command). Alternatively, a node might not be displayed in the cluster configuration at all,
even though you know it is part of the cluster (as viewed from another node in the cluster by
using the lscluster -m command). In these cases, the following flags allow the node to
search and read the repository disk and take self-correcting actions.
Do not use the clusterconf command to remove a cluster configuration. Instead, use the
rmcluster command for normal removal of the cluster configuration.
The clusterconf command runs as a normal cluster service and is handled automatically
during normal operation. The following flags are possible for this command:
-r hdiskN Has the cluster subsystem read the repository device if you know where the
repository disk is (lspv and look for cvg). It causes the node to join the cluster if
the node is configured in the repository disk.
-s Performs an exhaustive search for a cluster repository disk on all configured
hdisk devices. It stops when a cluster repository disk is found. This option
searches all disks that are looking for the signature of a repository device. If a
disk is found with the signature identifying it as the cluster repository, the search
is stopped. If the node finds itself in the cluster configuration on the disk, the
node joins the cluster. If the storage network is dirty and multiple repositories
are in the storage network (not supported), it stops at the first repository disk. If
the node is not in that repository configuration, it does not join the cluster.
Use the -v flag to see which disk was found. Then use the other options on the
clusterconf command to clean up the storage network until the desired results
are achieved.
-u Performs the unconfigure operation for the local node. If the node is still in the
cluster repository configuration on the shared disk to which the other nodes
have access, the other nodes in the cluster request this node to rejoin the
cluster. Use the -u option when cleanup must be performed on the local node,
for example, when the node was removed from the cluster configuration while it
was down or unreachable from the network and therefore missed the normal
removal processing (such as when the chcluster -m -nodeA command was
run). The unconfigure operation performs the updates that clean up the
environment on the local node.
-f The force option, which performs the unconfigure operation and ignores errors.
-v Verbose mode.
Examples
To clean up the local node's environment, enter the following command:
clusterconf -fu
To recover the cluster configuration and start cluster services, enter the following
command:
clusterconf -r hdisk1
To search for the cluster repository device and join the cluster, enter the following
command:
clusterconf -s
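As the description notes, a node reported as DOWN by lscluster -m is a candidate for the recovery flags above. The sketch below only assumes that the output contains a per-node "State of node" field; the sample text is invented for illustration, and the real output format may differ.

```shell
# Invented sample of lscluster -m output; only the "State of node"
# field is assumed here, and the real format may differ.
lscluster_m='Node name: nodeA  State of node: UP
Node name: nodeB  State of node: DOWN'

# If any node is DOWN, point the operator at the recovery flags.
if printf '%s\n' "$lscluster_m" | grep -q 'State of node: DOWN'; then
    echo "a node is DOWN: on that node, try clusterconf -r <reposdisk>, then clusterconf -s"
fi
```

The suggested order mirrors the flag descriptions: use -r when you already know the repository disk, and fall back to the exhaustive -s search when you do not.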
Note the following explanation to help you understand how to read the tree:
The number of right-pointing double angle quotation marks (») indicates the number of
screens that you must page down in the PowerHA SMIT tree. For example, » » » means
that you must page down three screens.
The double hyphens (--) are used as a separator between the SMIT text and the SMIT
fast path.
The parentheses (()) indicate the fast path.
Because PowerHA 7.1 is not supported on any AIX level before 6.1.6, hardware that is not
supported on AIX 6.1.6 is, by definition, not supported by PowerHA 7.1 either. Also, if the
hardware manufacturer has not made a statement of support for AIX 7.1, that combination is
not supported until such support is stated. This is true even if the tables in this appendix
suggest that PowerHA supports it.
This appendix contains information about IBM Power Systems, IBM storage, adapters, and
AIX levels supported by current versions of High-Availability Cluster Multi-Processing
(HACMP) 5.4.1 through PowerHA 7.1. It focuses on hardware support from around the last
five years and consists mainly of IBM POWER5 systems and later. At the time of writing, the
information was current and complete.
All POWER5 and later systems are supported on AIX 7.1 and HACMP 5.4.1 and later. AIX 7.1
support has the following specific requirements for HACMP and PowerHA:
HACMP 5.4.1, SP10
PowerHA 5.5, SP7
PowerHA 6.1, SP3
PowerHA 7.1
Full software support details are in the official support flash. The information in this appendix
is available and maintained in the “PowerHA hardware support matrix” at:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105638
Most of the devices in the online documentation are linked to their corresponding support flash.
Table C-1 POWER5 System p model support for HACMP and PowerHA
System p HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
models
7037-A50 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9110-510 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9110-51A AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9111-285 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9111-520 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9113-550 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9115-505 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9116-561+ AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9117-570 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9118-575 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9119-590 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9119-595 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9131-52A AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9133-55A AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
Table C-2 POWER5 System i model support for HACMP and PowerHA
System i models HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
9406-520 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9406-550 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9406-570 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9406-590 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9406-595 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
Table C-3 POWER6 model support for HACMP and PowerHA
Models HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
8203-E4A AIX 5.3 TL7 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
8203-E8A AIX 5.3 TL7 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
8234-EMA AIX 5.3 TL8 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP5 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9117-MMA AIX 5.3 TL6 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9119-FHA AIX 5.3 TL8 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 SP1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
9125-F2A AIX 5.3 TL8 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 SP1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
Built-in serial ports: Built-in serial ports in POWER6 servers are not available for
PowerHA use. Instead, use disk heartbeating. However, note that the built-in Ethernet
(IVE) adapters are supported for PowerHA use.
Table C-4 POWER7 model support for HACMP and PowerHA
Models HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
8202-E4B/720 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1
8205-E6B/740 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1
8231-E2B/710 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1
8231-E2B/730 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1
8233-E8B/750 AIX 5.3 TL11 SP1 AIX 5.3 TL11 SP1 AIX 5.3 TL11 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL4 SP3 AIX 6.1 TL4 SP3 AIX 7.1
9117-MMB/770 AIX 5.3 TL11 SP1 AIX 5.3 TL11 AIX 5.3 TL11 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL4 SP3 AIX 6.1 TL4 SP3 AIX 7.1
9119-FHB/795 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1
9179-FHB/780 AIX 5.3 TL11 SP1 AIX 5.3 TL11 AIX 5.3 TL11 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL4 SP3 AIX 6.1 TL4 SP3 AIX 7.1
Built-in serial ports: Built-in serial ports in POWER7 Servers are not available for
PowerHA use. Instead, use disk heartbeating. However, note that the built-in Ethernet
(IVE) adapters are supported for PowerHA use.
Table C-5 IBM POWER Blade support for HACMP and PowerHA
System p HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
models
7778-23X/JS23 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2
7778-43X/JS43 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2
7998-60X/JS12 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP AIX 6.1 TL2 SP1 AIX 7.1
7998-61X/JS22 HACMP SP2 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL6 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
8406-70Y/PS700 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1
8406-71Y/PS701 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
PS702 AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1
8844-31U/JS21 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
8844-51U/JS21 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
Blade support includes support for IVM and IVE on both POWER6 and POWER7 blades. The
following adapter cards are supported in the POWER6 and POWER7 blades:
8240 Emulex 8Gb FC Expansion Card (CIOv)
8241 QLogic 4Gb FC Expansion Card (CIOv)
8242 QLogic 8Gb Fibre Channel Expansion Card (CIOv)
8246 SAS Connectivity Card (CIOv)
8251 Emulex 4Gb FC Expansion Card (CFFv)
8252 QLogic combo Ethernet and 4 Gb Fibre Channel Expansion Card (CFFh)
8271 QLogic Ethernet/8Gb FC Expansion Card (CFFh)
IBM storage
It is common to use multipathing drivers with storage. If you use MPIO, SDD, or SDDPCM (or
a combination of them) on any PowerHA-controlled storage, you are required to use
enhanced concurrent volume groups (ECVGs). This requirement also applies to vSCSI and
NPIV devices.
DS storage units
Table C-6 lists the DS storage unit support for HACMP and PowerHA with AIX.
DS3400 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL8 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2
DS3500 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL8 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2
DS4100 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS4200 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS4300 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS4400 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS4500 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS4700 AIX 5.3 TL5 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS4800 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS5020 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2
DS6000 AIX 5.3 TL5 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
DS6800 AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS5100 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2
DS5300 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2
DS8000 AIX 5.3 TL5 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
931,932,9B2 AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
DS8700 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL8 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2
IBM XIV
Table C-7 lists the software versions for HACMP and PowerHA with AIX supported on XIV
storage. PowerHA requires XIV microcode level 10.0.1 or later.
Table C-7 IBM XIV support for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
XIV HACMP SP4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
2810-A14 AIX 5.3 TL7 SP6 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL0 SP2
Table C-8 SVC supported models for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
2145-4F2 HACMP SP8 PowerHA SP6 PowerHA SP1 AIX 6.1 TL6
AIX 5.3 TL9 AIX 5.3 TL9 AIX 5.3 TL9 AIX 7.1
AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3
2145-8F2 HACMP SP8 PowerHA SP8 PowerHA SP1 AIX 6.1 TL6
AIX 5.3 TL9 AIX 5.3 TL9 AIX 5.3 TL9 AIX 7.1
AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3
Network-attached storage
Table C-9 shows the software versions for PowerHA and AIX supported on network-attached
storage (NAS).
Table C-9 NAS supported models for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
N3700 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N5200 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N5200 (G20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N5300 HACMP SP3 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL0 SP2
N5500 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N5500 (G20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N5600 HACMP SP3 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL0 SP2
N6040 AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N6060 AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N6070 AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7600 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7600 (G20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7700 (A21) AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7700 (G21) AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7800 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7800 (G20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7900 (A21) AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
N7900 (G21) AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
Serial-attached SCSI
Table C-10 lists the software versions for PowerHA and AIX supported on the serial-attached
SCSI (SAS) model.
Table C-10 SAS supported model for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
5886 EXP12S HACMP SP5 HACMP SP2 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3
SCSI
Table C-11 shows the software versions for PowerHA and AIX supported on the SCSI model.
Table C-11 SCSI supported model for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
7031-D24 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
Adapters
The following sections contain information about the supported adapters for PowerHA.
#5273/#5735 PCI-Express Dual Port Fibre Channel Adapter: The 5273/5735 minimum
requirements are PowerHA 5.4.1 SP2 or 5.5 SP1.
SAS
The following SAS adapters are supported:
#5278 LP 2x4port PCI-Express SAS Adapter 3 Gb
#5901 PCI-Express SAS Adapter
#5902 PCI-X DDR Dual-x4 Port SAS RAID Adapter
#5903 PCI-Express SAS Adapter
#5912 PCI-X DDR External Dual-x4 Port SAS Adapter
Table C-12 SAS software support for HACMP and PowerHA with AIX
HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
HACMP SP5 HACMP SP2 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3
Ethernet
The following Ethernet adapters are supported with PowerHA:
#1954 4-Port 10/100/1000 Base-TX PCI-X Adapter
#1959 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
#1978 IBM Gigabit Ethernet-SX PCI-X Adapter
#1979 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
#1981 IBM 10 Gigabit Ethernet-SR PCI-X Adapter
#1982 IBM 10 Gigabit Ethernet-LR PCI-X Adapter
#1983 IBM 2-port 10/100/1000 Base-TX Ethernet PCI-X
#1984 IBM Dual Port Gigabit Ethernet-SX PCI-X Adapter
#1990 IBM 2-port 10/100/1000 Base-TX Ethernet PCI-X
#4961 IBM Universal 4-Port 10/100 Ethernet Adapter
#4962 IBM 10/100 Mbps Ethernet PCI Adapter II
#5271 LP 4-Port Ethernet 10/100/1000 Base-TX PCI-X Adapter
#5274 LP 2-Port Gigabit Ethernet-SX PCI Express
#5700 IBM Gigabit Ethernet-SX PCI-X Adapter
#5701 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
#5706 IBM 2-Port 10/100/1000 Base-TX Ethernet PCI-X Adapter
#5707 IBM 2-Port Gigabit Ethernet-SX PCI-X Adapter
#5717 IBM 4-Port Ethernet 10/100/1000 Base-TX PCI-X Adapter
InfiniBand
The following InfiniBand adapters are supported with PowerHA:
#1809 IBM GX Dual-port 4x IB HCA
#1810 IBM GX Dual-port 4x IB HCA
#1811 IBM GX Dual-port 4x IB HCA
#1812 IBM GX Dual-port 4x IB HCA
#1820 IBM GX Dual-port 12x IB HCA
Purpose
=======
clmgr: Provides a consistent, reliable interface for performing IBM PowerHA
SystemMirror cluster operations via a terminal or script. All clmgr
operations are logged in the "clutils.log" file, including the
command that was executed, its start/stop time, and what user
initiated the command.
This consistency helps make clmgr easier to learn and use. Further
help is also available at each part of clmgr's command line. For
example, just executing "clmgr" by itself will result in a list of
the available ACTIONs supported by clmgr. Executing "clmgr ACTION"
with no CLASS provided will result in a list of all the available
CLASSes for the specified ACTION. Executing "clmgr ACTION CLASS"
with no NAME or ATTRIBUTES provided is slightly different, though,
since for some ACTION+CLASS combinations, that may be a valid
command format. So to get help in this scenario, it is necessary
to explicitly request it by appending the "-h" flag. So executing
"clmgr ACTION CLASS -h" will result in a listing of all known
attributes for that ACTION+CLASS combination being displayed.
That is where clmgr's ability to help ends, however; it cannot
help with each individual attribute. If there is a question about
what a particular attribute is for, or when to use it, consult the
product documentation.
Synopsis
========
clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>]
[-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}] <ACTION> <CLASS> [<NAME>]
[-h | <ATTR#1>=<VALUE#1> <ATTR#2>=<VALUE#2> <ATTR#n>=<VALUE#n>]
ACTION={add|modify|delete|query|online|offline|...}
CLASS={cluster|site|node|network|resource_group|...}
add (Aliases: a)
query (Aliases: q, ls, get)
modify (Aliases: mod, ch, set)
delete (Aliases: de, rm, er)
Cluster, Method:
verify (Aliases: ve)
CLASS the type of object upon which the ACTION will be performed.
The complete list of supported CLASSes is:
NAME the specific object, of type "CLASS", upon which the ACTION
is to be performed.
NOTE: an ATTR may not always need to be fully typed. Only enough
leading characters to uniquely identify the attribute among the
set of attributes available for the specified operation need to
be provided. So instead of "FC_SYNC_INTERVAL", for the
"add/modify cluster" operation, "FC" could be used, and would
have the same result.
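The prefix-matching rule described in that note can be modeled with a small shell function. This is a behavioral sketch, not the product code: an abbreviation resolves only when it matches exactly one attribute for the operation.

```shell
# resolve_attr PREFIX ATTR... : print the unique attribute starting with
# PREFIX, or fail if the prefix is ambiguous or matches nothing.
resolve_attr() {
    prefix=$1; shift
    match=""; count=0
    for attr in "$@"; do
        case $attr in
            "$prefix"*) match=$attr; count=$((count + 1)) ;;
        esac
    done
    if [ "$count" -eq 1 ]; then
        echo "$match"
        return 0
    fi
    return 1    # ambiguous or unknown prefix
}
```

For example, given the attributes FC_SYNC_INTERVAL, CLUSTER_IP, and NODES, the prefix "FC" resolves uniquely, while "N" would fail against NODES and NETWORK together.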
Operations
==========
CLUSTER:
clmgr add cluster \
[ <cluster_label> ] \
REPOSITORY=<hdisk#> \
SITE:
clmgr add site <sitename> \
[ NODES=<node>[,<node#2>,<node#n>,...] ]
clmgr modify site <sitename> \
[ NEWNAME=<new_site_label> ] \
[ {ADD|REPLACE}={ALL|<node>[,<node#2>,<node#n>,...]} ]
At least one modification option must be specified.
ADD attempts to append the specified nodes to the site.
REPLACE attempts to replace the site's current nodes with
the specified nodes.
clmgr query site [ <sitename>[,<sitename#2>,<sitename#n>,...] ]
clmgr delete site {<sitename>[,<sitename#2>,<sitename#n>,...] | ALL}
clmgr recover site <sitename>
clmgr offline site <sitename> \
[ WHEN={now|restart|both} ] \
NODE:
clmgr add node <node> \
[ COMMPATH=<ip_address_or_network-resolvable_name> ] \
[ RUN_DISCOVERY={true|false} ] \
[ PERSISTENT_IP=<IP> NETWORK=<network>
{NETMASK=<###.###.###.###> | PREFIX=1..128} ] \
[ START_ON_BOOT={false|true} ] \
[ BROADCAST_ON_START={true|false} ] \
[ CLINFO_ON_START={false|true|consistent} ] \
[ VERIFY_ON_START={true|false} ]
clmgr modify node <node> \
[ NEWNAME=<new_node_label> ] \
[ COMMPATH=<new_commpath> ] \
[ PERSISTENT_IP=<IP> NETWORK=<network>
{NETMASK=<###.###.###.###> | PREFIX=1..128} ] \
[ START_ON_BOOT={false|true} ] \
[ BROADCAST_ON_START={true|false} ] \
[ CLINFO_ON_START={false|true|consistent} ] \
[ VERIFY_ON_START={true|false} ]
clmgr query node [ {<node>|LOCAL}[,<node#2>,<node#n>,...] ]
clmgr delete node {<node>[,<node#2>,<node#n>,...] | ALL}
clmgr manage node undo_changes
clmgr recover node <node>[,<node#2>,<node#n>,...]
clmgr online node <node>[,<node#2>,<node#n>,...] \
[ WHEN={now|restart|both} ] \
[ MANAGE={auto|manual} ] \
[ BROADCAST={false|true} ] \
[ CLINFO={false|true|consistent} ] \
[ FORCE={false|true} ] \
[ FIX={no|yes|interactively} ]
[ TIMEOUT=<seconds_to_wait_for_completion> ]
clmgr offline node <node>[,<node#2>,<node#n>,...] \
[ WHEN={now|restart|both} ] \
[ MANAGE={offline|move|unmanage} ] \
[ BROADCAST={true|false} ] \
[ TIMEOUT=<seconds_to_wait_for_completion> ]
NETWORK:
INTERFACE:
clmgr add interface <interface> \
NETWORK=<network> \
[ NODE=<node> ] \
[ TYPE={ether|infiniband} ] \
[ INTERFACE=<network_interface> ]
clmgr modify interface <interface> \
NETWORK=<network>
clmgr query interface [ <interface>[,<if#2>,<if#n>,...] ]
clmgr delete interface {<interface>[,<if#2>,<if#n>,...] | ALL}
RESOURCE GROUP:
clmgr add resource_group <resource_group> \
NODES=nodeA1,nodeA2,... \
[ SECONDARYNODES=nodeB2,nodeB1,... ] \
[ STARTUP={OHN|OFAN|OAAN|OUDP} ] \
[ FALLOVER={FNPN|FUDNP|BO} ] \
[ FALLBACK={NFB|FBHPN} ] \
[ NODE_PRIORITY_POLICY={default|mem|cpu| \
disk|least|most} ] \
[ NODE_PRIORITY_POLICY_SCRIPT=</path/to/script> ] \
[ NODE_PRIORITY_POLICY_TIMEOUT=### ] \
[ SITE_POLICY={ignore|primary|either|both} ] \
[ SERVICE_LABEL=service_ip#1[,service_ip#2,...] ] \
[ APPLICATIONS=appctlr#1[,appctlr#2,...] ] \
STARTUP:
OHN ----- Online Home Node (default value)
OFAN ---- Online on First Available Node
OAAN ---- Online on All Available Nodes (concurrent)
OUDP ---- Online Using Node Distribution Policy
FALLOVER:
FNPN ---- Fallover to Next Priority Node (default value)
FUDNP --- Fallover Using Dynamic Node Priority
BO ------ Bring Offline (On Error Node Only)
FALLBACK:
NFB ----- Never Fallback
FBHPN --- Fallback to Higher Priority Node (default value)
NODE_PRIORITY_POLICY:
NOTE: this policy may only be established if the FALLOVER
policy has been set to "FUDNP".
default - next node in the NODES list
mem ----- node with most available memory
disk ---- node with least disk activity
cpu ----- node with most available CPU cycles
least --- node where the dynamic node priority script
returns the lowest value
most ---- node where the dynamic node priority script
returns the highest value
SITE_POLICY:
ignore -- Ignore
primary - Prefer Primary Site
either -- Online On Either Site
both ---- Online On Both Sites
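With FUDNP, the "least" and "most" policies compare the values returned by the dynamic node priority script on each candidate node, and the selection reduces to picking the extreme value. The sketch below illustrates that decision; the "node value" input format is an illustration, not PowerHA's internal representation.

```shell
# pick_most: read "node value" pairs on stdin and print the node whose
# priority script returned the highest value. A "least"-style selection
# would use "head -n 1" in place of "tail -n 1".
pick_most() {
    sort -k2 -n | tail -n 1 | cut -d' ' -f1
}
```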
FALLBACK TIMER:
clmgr add fallback_timer <timer> \
[ YEAR=<####> ] \
[ MONTH=<{1..12 | Jan..Dec}> ] \
[ DAY_OF_MONTH=<{1..31}> ] \
[ DAY_OF_WEEK=<{0..6 | Sun..Sat}> ] \
HOUR=<{0..23}> \
MINUTE=<{0..59}>
clmgr modify fallback_timer <timer> \
[ YEAR=<{####}> ] \
[ MONTH=<{1..12 | Jan..Dec}> ] \
PERSISTENT IP/LABEL:
clmgr add persistent_ip <persistent_IP> \
NETWORK=<network> \
[ NODE=<node> ]
clmgr modify persistent_ip <persistent_label> \
[ NEWNAME=<new_persistent_label> ] \
[ NETWORK=<new_network> ] \
[ PREFIX=<new_prefix_length> ]
clmgr query persistent_ip [ <persistent_IP>[,<pIP#2>,<pIP#n>,...] ]
clmgr delete persistent_ip {<persistent_IP>[,<pIP#2>,<pIP#n>,...] |
ALL}
clmgr move persistent_ip <persistent_IP> \
INTERFACE=<new_interface>
SERVICE IP/LABEL:
clmgr add service_ip <service_ip> \
NETWORK=<network> \
[ {NETMASK=<###.###.###.###> | PREFIX=1..128} ] \
[ HWADDR=<new_hardware_address> ] \
[ SITE=<new_site> ]
clmgr modify service_ip <service_ip> \
[ NEWNAME=<new_service_ip> ] \
[ NETWORK=<new_network> ] \
[ {NETMASK=<###.###.###.###> | PREFIX=1..128} ] \
[ HWADDR=<new_hardware_address> ] \
[ SITE=<new_site> ]
clmgr query service_ip [ <service_ip>[,<service_ip#2>,...] ]
clmgr delete service_ip {<service_ip>[,<service_ip#2>,...] | ALL}
clmgr move service_ip <service_ip> \
INTERFACE=<new_interface>
APPLICATION CONTROLLER:
clmgr add application_controller <application_controller> \
STARTSCRIPT="/path/to/start/script" \
STOPSCRIPT="/path/to/stop/script" \
[ MONITORS=<monitor>[,<monitor#2>,<monitor#n>,...] ]
APPLICATION MONITOR:
clmgr add application_monitor <monitor> \
TYPE={Process|Custom} \
APPLICATIONS=<appctlr#1>[,<appctlr#2>,<appctlr#n>,...] \
MODE={continuous|startup|both} \
[ STABILIZATION="1 .. 3600" ] \
[ RESTARTCOUNT="0 .. 100" ] \
[ FAILUREACTION={notify|fallover} ] \
Process Arguments:
PROCESSES="pmon1,dbmon,..." \
OWNER="<processes_owner_name>" \
[ INSTANCECOUNT="1 .. 1024" ] \
[ RESTARTINTERVAL="1 .. 3600" ] \
[ NOTIFYMETHOD="</script/to/notify>" ] \
[ CLEANUPMETHOD="</script/to/cleanup>" ] \
[ RESTARTMETHOD="</script/to/restart>" ]
Custom Arguments:
MONITORMETHOD="/script/to/monitor" \
[ MONITORINTERVAL="1 .. 1024" ] \
[ HUNGSIGNAL="1 .. 63" ] \
[ RESTARTINTERVAL="1 .. 3600" ] \
[ NOTIFYMETHOD="</script/to/notify>" ] \
[ CLEANUPMETHOD="</script/to/cleanup>" ] \
[ RESTARTMETHOD="</script/to/restart>" ]
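A custom monitor's MONITORMETHOD is simply an executable whose exit status tells PowerHA whether the application is healthy (0) or failed (nonzero); RESTARTMETHOD, CLEANUPMETHOD, and NOTIFYMETHOD are then driven by the configured policies. A minimal sketch, assuming the application can be identified by its process name (pgrep is used here for illustration; an AIX monitor script might inspect ps output instead):

```shell
# check_app NAME: succeed (exit 0) if a process named NAME is running.
# A MONITORMETHOD script would call this and exit with its status;
# PowerHA then restarts, notifies, or falls over per the monitor policy.
check_app() {
    pgrep -x "$1" >/dev/null 2>&1
}
```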
DEPENDENCY:
# Acquisition/Release Order
clmgr add dependency \
TYPE={ACQUIRE|RELEASE} \
{ SERIAL="{<rg1>,<rg2>,...|ALL}" |
PARALLEL="{<rg1>,<rg2>,...|ALL}" }
clmgr modify dependency \
TYPE={ACQUIRE|RELEASE} \
{ SERIAL="{<rg1>,<rg2>,...|ALL}" |
PARALLEL="{<rg1>,<rg2>,...|ALL}" }
FILE COLLECTION:
clmgr add file_collection <file_collection> \
FILES="/path/to/file1,/path/to/file2,..." \
[ SYNC_WITH_CLUSTER={no|yes} ] \
[ SYNC_WHEN_CHANGED={no|yes} ] \
[ DESCRIPTION="<file_collection_description>" ]
clmgr modify file_collection <file_collection> \
[ NEWNAME="<new_file_collection_label>" ] \
[ ADD="/path/to/file1,/path/to/file2,..." ] \
[ DELETE={"/path/to/file1,/path/to/file2,..."|ALL} ] \
[ REPLACE={"/path/to/file1,/path/to/file2,..."|""} ] \
[ SYNC_WITH_CLUSTER={no|yes} ] \
[ SYNC_WHEN_CHANGED={no|yes} ] \
[ DESCRIPTION="<file_collection_description>" ]
clmgr query file_collection [ <file_collection>[,<fc#2>,<fc#n>,...]]
clmgr delete file_collection {<file_collection>[,<fc#2>,<fc#n>,...]|
ALL}
clmgr sync file_collection <file_collection>
SNAPSHOT:
clmgr add snapshot <snapshot> \
DESCRIPTION="<snapshot_description>" \
[ METHODS="method1,method2,..." ] \
[ SAVE_LOGS={false|true} ]
clmgr modify snapshot <snapshot> \
[ NEWNAME="<new_snapshot_label>" ] \
[ DESCRIPTION="<snapshot_description>" ]
clmgr query snapshot [ <snapshot>[,<snapshot#2>,<snapshot#n>,...] ]
clmgr view snapshot <snapshot> \
[ TAIL=<number_of_trailing_lines> ] \
METHOD:
clmgr add method <method_label> \
TYPE={snapshot|verify} \
FILE=<executable_file> \
[ DESCRIPTION=<description> ]
clmgr modify method <method_label> \
TYPE={snapshot|verify} \
[ NEWNAME=<new_method_label> ] \
[ DESCRIPTION=<new_description> ] \
[ FILE=<new_executable_file> ]
LOG:
clmgr modify logs ALL DIRECTORY="<new_logs_directory>"
clmgr modify log {<log>|ALL} \
[ DIRECTORY={"<new_log_directory>"|DEFAULT} ] \
[ FORMATTING={none|standard|low|high} ] \
[ TRACE_LEVEL={low|high} ] \
[ REMOTE_FS={true|false} ]
clmgr query log [ <log>[,<log#2>,<log#n>,...] ]
clmgr view log [ {<log>|EVENTS} ] \
[ TAIL=<number_of_trailing_lines> ] \
[ HEAD=<number_of_leading_lines> ] \
[ FILTER=<pattern>[,<pattern#2>,<pattern#n>,...] ] \
[ DELIMITER=<alternate_pattern_delimiter> ] \
[ CASE={insensitive|no|off|false} ]
clmgr manage logs collect \
[ DIRECTORY="<directory_for_collection>" ] \
[ NODES=<node>[,<node#2>,<node#n>,...] ] \
[ RSCT_LOGS={yes|no} ] \
VOLUME GROUP:
clmgr query volume_group
LOGICAL VOLUME:
clmgr query logical_volume
FILE_SYSTEM:
clmgr query file_system
PHYSICAL VOLUME:
clmgr query physical_volume \
[ <disk>[,<disk#2>,<disk#n>,...] ] \
[ NODES=<node>,<node#2>[,<node#n>,...] ] \
[ ALL={no|yes} ]
REPORT:
clmgr view report [<report>] \
[ FILE=<PATH_TO_NEW_FILE> ] \
[ TYPE={text|html} ]
Usage Examples
==============
clmgr query cluster
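The -c flag shown in the synopsis produces colon-delimited output, which makes query results easy to consume from scripts. The record below is illustrative sample text, not output captured from a real cluster; real field names and ordering come from the actual "clmgr -c query cluster" output.

```shell
# Parse a colon-delimited clmgr-style record: first line is the header,
# second line the values (sample data for illustration only).
sample='CLUSTER_NAME:STATE:VERSION
mycluster:STABLE:7.1.0.0'
cluster_name=$(printf '%s\n' "$sample" | awk -F: 'NR==2 {print $1}')
cluster_state=$(printf '%s\n' "$sample" | awk -F: 'NR==2 {print $2}')
```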
Suggested Reading
=================
IBM PowerHA SystemMirror for AIX Troubleshooting Guide
IBM PowerHA SystemMirror for AIX Planning Guide
IBM PowerHA SystemMirror for AIX Installation Guide
Prerequisite Information
========================
IBM PowerHA SystemMirror for AIX Concepts and Facilities Guide
Related Information
===================
IBM PowerHA SystemMirror for AIX Administration Guide
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the
topics in this document. Note that some might be available in softcopy only.
Best Practices for DB2 on AIX 6.1 for POWER Systems, SG24-7821
DS8000 Performance Monitoring and Tuning, SG24-7146
IBM AIX Version 7.1 Differences Guide, SG24-7910
IBM System Storage DS8700 Architecture and Implementation, SG24-8786
Implementing IBM Systems Director 6.1, SG24-7694
Personal Communications Version 4.3 for Windows 95, 98 and NT, SG24-4689
PowerHA for AIX Cookbook, SG24-7739
You can search for, view, download, or order these documents and other Redbooks,
Redpapers, web docs, drafts, and additional materials at the following website:
ibm.com/redbooks
Other publications
These publications are also relevant as further information sources:
Cluster Management, SC23-6779
PowerHA SystemMirror for IBM Systems Director, SC23-6763
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Administration Guide,
SC23-6750
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Concepts and Facilities
Guide, SC23-6751
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Installation Guide,
SC23-6755
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Master Glossary, SC23-6757
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Planning Guide,
SC23-6758-01
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Programming Client
Applications, SC23-6759
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist Developer’s
Guide, SC23-6753
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist for DB2 user’s
Guide, SC23-6752
Online resources
These websites are also relevant as further information sources:
IBM PowerHA SystemMirror for AIX
http://www.ibm.com/systems/power/software/availability/aix/index.html
PowerHA hardware support matrix
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105638
IBM PowerHA High Availability wiki
http://www.ibm.com/developerworks/wikis/display/WikiPtype/High%20Availability
Implementation Services for Power Systems for PowerHA for AIX
http://www.ibm.com/services/us/index.wss/offering/its/a1000032
IBM training classes for PowerHA SystemMirror for AIX
http://www.ibm.com/training
IBM PowerHA SystemMirror 7.1 for AIX

Learn how to plan for, install, and configure PowerHA with the Cluster Aware AIX component

See how to migrate to, monitor, test, and troubleshoot PowerHA 7.1

Explore the IBM Systems Director plug-in and disaster recovery

IBM PowerHA SystemMirror 7.1 for AIX is a major product announcement
for IBM in the high availability space for IBM Power Systems Servers. This
release now has a deeper integration between the IBM high availability
solution and IBM AIX. It features integration with the IBM Systems
Director, SAP Smart Assist and cache support, the IBM System Storage
DS8000 Global Mirror support, and support for Hitachi Storage.

This IBM Redbooks publication contains information about the IBM
PowerHA SystemMirror 7.1 release for AIX. This release includes
fundamental changes, in particular departures from how the product has
been managed in the past, which has necessitated this Redbooks
publication.

This Redbooks publication highlights the latest features of PowerHA
SystemMirror 7.1 and explains how to plan for, install, and configure
PowerHA with the Cluster Aware AIX component. It also introduces you to
PowerHA SystemMirror Smart Assist for DB2. This book guides you
through migration scenarios and demonstrates how to monitor, test, and
troubleshoot PowerHA 7.1. In addition, it shows how to use IBM Systems
Director for PowerHA 7.1 and how to install the IBM Systems Director
Server and PowerHA SystemMirror plug-in. Plus, it explains how to
perform disaster recovery using IBM DS8700 Global Mirror and Hitachi
TrueCopy and Universal Replicator.

This publication targets all technical professionals (consultants, IT
architects, support staff, and IT specialists) who are responsible for
delivering and implementing high availability solutions for their enterprise.

IBM Redbooks are developed by the IBM International Technical Support
Organization. Experts from IBM, Customers and Partners from around the
world create timely technical information based on realistic scenarios.
Specific recommendations are provided to help you implement IT solutions
more effectively in your environment.