
Front cover

IBM PowerHA SystemMirror 7.1 for AIX

Learn how to plan for, install, and configure PowerHA with the Cluster Aware AIX component

See how to migrate to, monitor, test, and troubleshoot PowerHA 7.1

Explore the IBM Systems Director plug-in and disaster recovery

Dino Quintero
Shawn Bodily
Brandon Boles
Bernhard Buehler
Rajesh Jeyapaul
SangHee Park
Minh Pham
Matthew Radford
Gus Schlachter
Stefan Velica
Fabiano Zimmermann

ibm.com/redbooks
International Technical Support Organization

IBM PowerHA SystemMirror 7.1 for AIX

March 2011

SG24-7845-00
Note: Before using this information and the product it supports, read the information in “Notices” on
page ix.

First Edition (March 2011)

This edition applies to the IBM PowerHA SystemMirror Version 7.1 and IBM AIX Version 6.1 TL6 and 7.1 as
the target.

© Copyright International Business Machines Corporation 2011. All rights reserved.


Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
Contents

Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .x

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
The team who wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

Chapter 1. PowerHA SystemMirror architecture foundation. . . . . . . . . . . . . . . . . . . . . . 1


1.1 Reliable Scalable Cluster Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Overview of the components for Reliable Scalable Cluster Technology. . . . . . . . . 2
1.1.2 Architecture changes for RSCT 3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 PowerHA and RSCT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Cluster Aware AIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.1 CAA daemons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.2 RSCT changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.3 The central repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.4 Cluster event management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Cluster communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.1 Communication interfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.2 Communication node status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3.3 Considerations for the heartbeat configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3.4 Deciding when a node is down: Round-trip time (rtt) . . . . . . . . . . . . . . . . . . . . . . 20
1.4 PowerHA 7.1 SystemMirror plug-in for IBM Systems Director . . . . . . . . . . . . . . . . . . . 21
1.4.1 Introduction to IBM Systems Director . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.2 Advantages of using IBM Systems Director . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.3 Basic architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

Chapter 2. Features of PowerHA SystemMirror 7.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23


2.1 Deprecated features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2 New features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Changes to the SMIT panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.1 SMIT tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.2 The smitty hacmp command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.3 The smitty clstart and smitty clstop commands. . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3.4 Cluster Standard Configuration menu. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3.5 Custom Cluster Configuration menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3.6 Cluster Snapshot menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.7 Configure Persistent Node IP Label/Address menu . . . . . . . . . . . . . . . . . . . . . . . 31
2.4 The rootvg system event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5 Resource management enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.5.1 Start After and Stop After resource group dependencies . . . . . . . . . . . . . . . . . . . 32
2.5.2 User-defined resource type. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.5.3 Dynamic node priority: Adaptive failover. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.6 CLUSTER_OVERRIDE environment variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.7 CAA disk fencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.8 PowerHA SystemMirror event flow differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.8.1 Startup processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38



2.8.2 Another node joins the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.8.3 Node down processing normal with takeover . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

Chapter 3. Planning a cluster implementation for high availability . . . . . . . . . . . . . . . 43


3.1 Software requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.1.1 Prerequisite for AIX BOS and RSCT components . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2 Hardware requirements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2.1 Supported hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2.2 Requirements for the multicast IP address, SAN, and repository disk . . . . . . . . . 45
3.3 Considerations before using PowerHA 7.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4 Migration planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5 Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5.1 Shared storage for the repository disk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5.2 Adapters supported for storage communication . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.5.3 Multipath driver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.5.4 System Storage Interoperation Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.6 Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.6.1 Multicast address . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.6.2 Network interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.6.3 Subnetting requirements for IPAT via aliasing . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.6.4 Host name and node name. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.6.5 Other network considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Chapter 4. Installing PowerHA SystemMirror 7.1 for AIX . . . . . . . . . . . . . . . . . . . . . . . 53


4.1 Hardware configuration of the test environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1.1 SAN zoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1.2 Shared storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.1.3 Configuring the FC adapters for SAN-based communication . . . . . . . . . . . . . . . . 57
4.2 Installing PowerHA file sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2.1 PowerHA software installation example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3 Volume group consideration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

Chapter 5. Configuring a PowerHA cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65


5.1 Cluster configuration using SMIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.1.1 SMIT menu changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.1.2 Overview of the test environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.1.3 Typical configuration of a cluster topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.1.4 Custom configuration of the cluster topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.1.5 Configuring resources and applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.1.6 Configuring Start After and Stop After resource group dependencies . . . . . . . . . 96
5.1.7 Creating a user-defined resource type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.1.8 Configuring the dynamic node priority (adaptive failover) . . . . . . . . . . . . . . . . . . 102
5.1.9 Removing a cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.2 Cluster configuration using the clmgr tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.2.1 The clmgr action commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.2.2 The clmgr object classes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.2.3 Examples of using the clmgr command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.2.4 Using help in clmgr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.2.5 Configuring a PowerHA cluster using the clmgr command. . . . . . . . . . . . . . . . . 112
5.2.6 Alternative output formats for the clmgr command . . . . . . . . . . . . . . . . . . . . . . . 130
5.2.7 Log file of the clmgr command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.2.8 Displaying the log file content by using the clmgr command . . . . . . . . . . . . . . . 132
5.3 PowerHA SystemMirror for IBM Systems Director . . . . . . . . . . . . . . . . . . . . . . . . . . . 133



Chapter 6. IBM PowerHA SystemMirror Smart Assist for DB2 . . . . . . . . . . . . . . . . . . 135
6.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.1.1 Installing the required file sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.1.2 Installing DB2 on both nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.1.3 Importing the shared volume group and file systems . . . . . . . . . . . . . . . . . . . . . 137
6.1.4 Creating the DB2 instance and database on the shared volume group . . . . . . . 137
6.1.5 Updating the /etc/services file on the secondary node . . . . . . . . . . . . . . . . . . . . 139
6.1.6 Configuring IBM PowerHA SystemMirror . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.2 Implementing a PowerHA SystemMirror cluster and Smart Assist for DB2 7.1 . . . . . 139
6.2.1 Preliminary steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.2.2 Starting Smart Assist for DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.2.3 Completing the configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

Chapter 7. Migrating to PowerHA 7.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151


7.1 Considerations before migrating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.2 Understanding the PowerHA 7.1 migration process . . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.2.1 Stages of migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.2.2 Premigration checking: The clmigcheck program . . . . . . . . . . . . . . . . . . . . . . . . 157
7.3 Snapshot migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.3.1 Overview of the migration process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
7.3.2 Performing a snapshot migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
7.3.3 Checklist for performing a snapshot migration . . . . . . . . . . . . . . . . . . . . . . . . . . 176
7.3.4 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
7.4 Rolling migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
7.4.1 Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
7.4.2 Performing a rolling migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
7.4.3 Checking your newly migrated cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
7.5 Offline migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
7.5.1 Planning the offline migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
7.5.2 Offline migration flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
7.5.3 Performing an offline migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster . . . . . . . . . . . . . 201


8.1 Collecting information before a cluster is configured . . . . . . . . . . . . . . . . . . . . . . . . . 202
8.2 Collecting information after a cluster is configured . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
8.3 Collecting information after a cluster is running . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
8.3.1 AIX commands and log files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
8.3.2 CAA commands and log files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
8.3.3 PowerHA 7.1 cluster monitoring tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8.3.4 PowerHA ODM classes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
8.3.5 PowerHA clmgr utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
8.3.6 IBM Systems Director web interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
8.3.7 IBM Systems Director CLI (smcli interface) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

Chapter 9. Testing the PowerHA 7.1 cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259


9.1 Testing the SAN-based heartbeat channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
9.2 Testing the repository disk heartbeat channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
9.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
9.2.2 Testing environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
9.3 Simulation of a network failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
9.3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
9.3.2 Testing environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
9.3.3 Testing a network failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
9.4 Testing the rootvg system event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286

9.4.1 The rootvg system event. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
9.4.2 Testing the loss of the rootvg volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
9.4.3 Loss of rootvg: What PowerHA logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
9.5 Simulation of a crash in the node with an active resource group . . . . . . . . . . . . . . . . 289
9.6 Simulations of CPU starvation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
9.7 Simulation of a Group Services failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
9.8 Testing a Start After resource group dependency . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
9.8.1 Testing the standard configuration of a Start After resource group dependency 298
9.8.2 Testing application startup with Startup Monitoring configured. . . . . . . . . . . . . . 298
9.9 Testing dynamic node priority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302

Chapter 10. Troubleshooting PowerHA 7.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305


10.1 Locating the log files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
10.1.1 CAA log files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
10.1.2 PowerHA log files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
10.2 Troubleshooting the migration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
10.2.1 The clmigcheck script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
10.2.2 The ‘Cluster still stuck in migration’ condition . . . . . . . . . . . . . . . . . . . . . . . . . . 308
10.2.3 Existing non-IP networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
10.3 Troubleshooting the installation and configuration . . . . . . . . . . . . . . . . . . . . . . . . . . 312
10.3.1 The clstat and cldump utilities and the SNMP. . . . . . . . . . . . . . . . . . . . . . . . . . 312
10.3.2 The /var/log/clcomd/clcomd.log file and the security keys . . . . . . . . . . . . . . . . 313
10.3.3 The ECM volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
10.3.4 Communication path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
10.4 Troubleshooting problems with CAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
10.4.1 Previously used repository disk for CAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
10.4.2 Repository disk replacement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
10.4.3 CAA cluster after the node restarts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
10.4.4 Creation of the CAA cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
10.4.5 Volume group name already in use . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
10.4.6 Changed PVID of the repository disk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
10.4.7 The ‘Cluster services are not active’ message . . . . . . . . . . . . . . . . . . . . . . . . . 323

Chapter 11. Installing IBM Systems Director and the PowerHA SystemMirror plug-in . . . 325
11.1 Installing IBM Systems Director Version 6.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.1.1 Hardware requirements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.1.2 Installing IBM Systems Director on AIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
11.1.3 Configuring and activating IBM Systems Director. . . . . . . . . . . . . . . . . . . . . . . 328
11.2 Installing the SystemMirror plug-in . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
11.2.1 Installing the SystemMirror server plug-in. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
11.2.2 Installing the SystemMirror agent plug-in in the cluster nodes . . . . . . . . . . . . . 330
11.3 Installing the clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
11.3.1 Installing the common agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
11.3.2 Installing the PowerHA SystemMirror agent . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

Chapter 12. Creating and managing a cluster using IBM Systems Director . . . . . . . 333
12.1 Creating a cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
12.1.1 Creating a cluster with the SystemMirror plug-in wizard . . . . . . . . . . . . . . . . . . 334
12.1.2 Creating a cluster with the SystemMirror plug-in CLI . . . . . . . . . . . . . . . . . . . . 339
12.2 Performing cluster management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
12.2.1 Performing cluster management with the SystemMirror plug-in GUI wizard. . . 341
12.2.2 Performing cluster management with the SystemMirror plug-in CLI. . . . . . . . . 347
12.3 Creating a resource group with the SystemMirror plug-in GUI wizard . . . . . . . . . . . 349



12.3.1 Creating a custom resource group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
12.3.2 Creating a predefined resource group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
12.3.3 Verifying the creation of a resource group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
12.4 Managing a resource group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
12.4.1 Resource group management using the SystemMirror plug-in wizard . . . . . . . 355
12.4.2 Managing a resource group with the SystemMirror plug-in CLI . . . . . . . . . . . . 359
12.5 Verifying and synchronizing a configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
12.5.1 Verifying and synchronizing a configuration with the GUI. . . . . . . . . . . . . . . . . 360
12.5.2 Verifying and synchronizing with the CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
12.6 Performing cluster monitoring with the SystemMirror plug-in . . . . . . . . . . . . . . . . . . 364
12.6.1 Monitoring cluster activities before starting a cluster . . . . . . . . . . . . . . . . . . . . 364
12.6.2 Monitoring an active cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
12.6.3 Recovering from cluster configuration issues . . . . . . . . . . . . . . . . . . . . . . . . . . 369

Chapter 13. Disaster recovery using DS8700 Global Mirror . . . . . . . . . . . . . . . . . . . . 371


13.1 Planning for Global Mirror . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
13.1.1 Software prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
13.1.2 Minimum DS8700 requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
13.1.3 Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
13.2 Installing the DSCLI client software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
13.3 Scenario description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
13.4 Configuring the Global Mirror resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
13.4.1 Checking the prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
13.4.2 Identifying the source and target volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
13.4.3 Configuring the Global Mirror relationships. . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
13.5 Configuring AIX volume groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
13.5.1 Configuring volume groups and file systems on primary site . . . . . . . . . . . . . . 381
13.5.2 Importing the volume groups in the remote site . . . . . . . . . . . . . . . . . . . . . . . . 383
13.6 Configuring the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
13.6.1 Configuring the cluster topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
13.6.2 Configuring cluster resources and resource group . . . . . . . . . . . . . . . . . . . . . . 388
13.7 Failover testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
13.7.1 Graceful site failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
13.7.2 Rolling site failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
13.7.3 Site re-integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
13.8 LVM administration of DS8000 Global Mirror replicated resources . . . . . . . . . . . . . 404
13.8.1 Adding a new Global Mirror pair to an existing volume group. . . . . . . . . . . . . . 404
13.8.2 Adding a Global Mirror pair into a new volume group . . . . . . . . . . . . . . . . . . . . 411

Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator . . 419
14.1 Planning for TrueCopy/HUR management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
14.1.1 Software prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
14.1.2 Minimum connectivity requirements for TrueCopy/HUR . . . . . . . . . . . . . . . . . . 420
14.1.3 Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
14.2 Overview of TrueCopy/HUR management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
14.2.1 Installing the Hitachi CCI software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
14.2.2 Overview of the CCI instance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
14.2.3 Creating and editing the horcm.conf files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
14.3 Scenario description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
14.4 Configuring the TrueCopy/HUR resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
14.4.1 Assigning LUNs to the hosts (host groups). . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
14.4.2 Creating replicated pairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
14.4.3 Configuring an AIX disk and dev_group association. . . . . . . . . . . . . . . . . . . . . 443

14.4.4 Defining TrueCopy/HUR managed replicated resource to PowerHA . . . . . . . . 451
14.5 Failover testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
14.5.1 Graceful site failover for the Austin site. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
14.5.2 Rolling site failure of the Austin site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
14.5.3 Site re-integration for the Austin site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
14.5.4 Graceful site failover for the Miami site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
14.5.5 Rolling site failure of the Miami site. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
14.5.6 Site re-integration for the Miami site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
14.6 LVM administration of TrueCopy/HUR replicated pairs. . . . . . . . . . . . . . . . . . . . . . . 463
14.6.1 Adding LUN pairs to an existing volume group . . . . . . . . . . . . . . . . . . . . . . . . . 463
14.6.2 Adding a new logical volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
14.6.3 Increasing the size of an existing file system . . . . . . . . . . . . . . . . . . . . . . . . . . 468
14.6.4 Adding a LUN pair to a new volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469

Appendix A. CAA cluster commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477


The lscluster command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
The mkcluster command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
The rmcluster command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
The chcluster command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
The clusterconf command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481

Appendix B. PowerHA SMIT tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483

Appendix C. PowerHA supported hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491


IBM Power Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
IBM POWER5 systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
IBM POWER6 systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
IBM POWER7 Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
IBM POWER Blade servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
IBM storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
Fibre Channel adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
Network-attached storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
Serial-attached SCSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
SCSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
Adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
Fibre Channel adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
SAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
Ethernet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
InfiniBand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
SCSI and iSCSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
PCI bus adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500

Appendix D. The clmgr man page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501

Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519


IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
Other publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
Online resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521



Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.



Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are
marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
AIX® Lotus® Redpaper™
DB2® Power Systems™ Redbooks (logo) ®
Domino® POWER5™ solidDB®
DS4000® POWER6® System i®
DS6000™ POWER7™ System p®
DS8000® POWER7 Systems™ System Storage®
Enterprise Storage Server® PowerHA™ Tivoli®
FileNet® PowerVM™ TotalStorage®
FlashCopy® POWER® WebSphere®
HACMP™ pureScale™ XIV®
IBM® Redbooks®

The following terms are trademarks of other companies:

Snapshot, NetApp, and the NetApp logo are trademarks or registered trademarks of NetApp, Inc. in the U.S.
and other countries.

Java, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other
countries, or both.

Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.



Preface

IBM® PowerHA™ SystemMirror 7.1 for AIX® is a major product announcement for IBM in the
high availability space for IBM Power Systems™ Servers. This release now has a deeper
integration between the IBM high availability solution and IBM AIX. It features integration with
the IBM Systems Director, SAP Smart Assist and cache support, the IBM System Storage®
DS8000® Global Mirror support, and support for Hitachi storage.

This IBM Redbooks® publication contains information about the IBM PowerHA SystemMirror
7.1 release for AIX. This release includes fundamental changes, in particular departures from
how the product has been managed in the past, which has necessitated this Redbooks
publication.

This Redbooks publication highlights the latest features of PowerHA SystemMirror 7.1 and
explains how to plan for, install, and configure PowerHA with the Cluster Aware AIX
component. It also introduces you to PowerHA SystemMirror Smart Assist for DB2®. This
book guides you through migration scenarios and demonstrates how to monitor, test, and
troubleshoot PowerHA 7.1. In addition, it shows how to use IBM Systems Director for
PowerHA 7.1 and how to install the IBM Systems Director Server and PowerHA SystemMirror
plug-in. Plus, it explains how to perform disaster recovery using DS8700 Global Mirror and
Hitachi TrueCopy and Universal Replicator.

This publication targets all technical professionals (consultants, IT architects, support staff,
and IT specialists) who are responsible for delivering and implementing high availability
solutions for their enterprise.

The team who wrote this book


This book was produced by a team of specialists from around the world working at the
International Technical Support Organization (ITSO), Poughkeepsie Center.

Dino Quintero is a Project Leader and IT generalist with the ITSO in Poughkeepsie, NY. His
areas of expertise include enterprise continuous availability planning and implementation,
enterprise systems management, virtualization, and clustering solutions. He is currently an
Open Group Master Certified IT Specialist - Server Systems. Dino holds a Master of
Computing Information Systems degree and a Bachelor of Science degree in Computer
Science from Marist College.

Shawn Bodily is a Certified Consulting IT Specialist for Advanced Technical Support
Americas in Dallas, Texas. He has worked for IBM for 12 years and has 14 years of AIX
experience, with 12 years specializing in High-Availability Cluster Multi-Processing
(HACMP™). He is certified in both versions 4 and 5 of HACMP and ATE. He has written and
presented on high availability and storage. Shawn has coauthored five other Redbooks
publications.

Brandon Boles is a Development Support Specialist for PowerHA/HACMP in Austin, Texas.
He has been with IBM for four years and has been doing support, programming, and
consulting with PowerHA and HACMP for 11 years. Brandon has been working with AIX since
version 3.2.5.



Bernhard Buehler is an IT Specialist for IBM in Germany. He is currently working for IBM
STG Lab Services in La Gaude, France. He has worked at IBM for 29 years and has 20 years
of experience in the AIX and availability field. His areas of expertise include AIX, PowerHA,
High Availability architecture, script programming, and AIX security. Bernhard has coauthored
several Redbooks publications and several courses in the IBM AIX curriculum.

Rajesh Jeyapaul is the technical lead for IBM Systems Director Power Server management.
His focus is on improving PowerHA SystemMirror, DB2 pureScale™, and the AIX Runtime
Expert plug-in for System Director. He has worked extensively with customers and
specialized in performance analysis under the IBM System p® and AIX environment. His
areas of expertise include IBM POWER® Virtualization, high availability, and system
management. He has coauthored DS8000 Performance Monitoring and Tuning, SG24-7146,
and Best Practices for DB2 on AIX 6.1 for POWER Systems, SG24-7821. Rajesh holds a
Master in Software Systems degree from the University of BITS, India, and a Master of
Business Administration (MBA) degree from the University of MKU, India.

SangHee Park is a Certified IT Specialist in IBM Korea. He is currently working for IBM
Global Technology Services in Maintenance and Technical Support. He has 5 years of
experience in Power Systems. His areas of expertise include AIX, PowerHA SystemMirror,
and PowerVM™ Virtualization. SangHee holds a bachelor's degree in aerospace and
mechanical engineering from Korea Aerospace University.

Minh Pham is currently a Development Support Specialist for PowerHA and HACMP in
Austin, Texas. She has worked for IBM for 10 years, including 6 years in System p
microprocessor development and 4 years in AIX development support. Her areas of expertise
include core and chip logic design for System p and AIX with PowerHA. Minh holds a
Bachelor of Science degree in Electrical Engineering from the University of Texas at Austin.

Matthew Radford is a Certified AIX Support Specialist in IBM UK. He is currently working for
IBM Global Technology Services in Maintenance and Technical Support. He has worked at
IBM for 13 years and is a member of the UKI Technical Council. His areas of expertise include
AIX and PowerHA. Matthew coauthored Personal Communications Version 4.3 for Windows
95, 98 and NT, SG24-4689. Matthew holds a Bachelor of Science degree in Information
Technology from the University of Glamorgan.

Gus Schlachter is a Development Support Specialist for PowerHA in Austin, TX. He has
worked with HACMP for over 15 years in support, development, and testing. Gus formerly
worked for CLAM/Availant and is an IBM-certified Instructor for HACMP.

Stefan Velica is an IT Specialist who is currently working for IBM Global Technologies
Services in Romania. He has five years of experience in Power Systems. He is a Certified
Specialist for IBM System p Administration, HACMP for AIX, High-end and Entry/Midrange
DS Series, and Storage Networking Solutions. His areas of expertise include IBM System
Storage, PowerVM, AIX, and PowerHA. Stefan holds a bachelor's degree in electronics and
telecommunications engineering from Politechnical Institute of Bucharest.

Fabiano Zimmermann is an AIX/SAN/TSM Subject Matter Expert for Nestlé in Phoenix,
Arizona. He has been working with AIX, High Availability and System Storage since 2000. A
former IBM employee, Fabiano has experience and expertise in the areas of Linux®, DB2,
and Oracle. Fabiano is a member of the L3 team that provides worldwide support for the
major Nestlé data centers. Fabiano holds a degree in computer science from Brazil.



Front row from left to right: Minh Pham, SangHee Park, Stefan Velica, Brandon Boles, and Fabiano
Zimmermann; back row from left to right: Gus Schlachter, Dino Quintero (project leader), Bernhard
Buehler, Shawn Bodily, Matt Radford, and Rajesh Jeyapaul

Thanks to the following people for their contributions to this project:

Bob Allison
Catherine Anderson
Chuck Coleman
Bill Martin
Darin Meyer
Keith O'Toole
Ashutosh Rai
Hitachi Data Systems

David Bennin
Ella Buslovich
Richard Conway
Octavian Lascu
ITSO, Poughkeepsie Center

Patrick Buah
Michael Coffey
Mark Gurevich
Felipe Knop
Paul Moyer
Skip Russell
Stephen Tovcimak
IBM Poughkeepsie

Eric Fried
Frank Garcia
Kam Lee
Gary Lowther
Deb McLemore
Ravi A. Shankar
Stephen Tee
Tom Weaver
David Zysk
IBM Austin

Nick Fernholz
Steven Finnes
Susan Jasinski
Robert G. Kovacs
William E. (Bill) Miller
Rohit Krishna Prasad
Ted Sullivan
IBM USA

Philippe Hermes
IBM France

Manohar R Bodke
Jes Kiran
Anantoju Srinivas
IBM India

Claudio Marcantoni
IBM Italy

Now you can become a published author, too!


Here's an opportunity to spotlight your skills, grow your career, and become a published
author—all at the same time! Join an ITSO residency project and help write a book in your
area of expertise, while honing your experience using leading-edge technologies. Your efforts
will help to increase product acceptance and customer satisfaction, as you expand your
network of technical contacts and relationships. Residencies run from two to six weeks in
length, and you can participate either in person or as a remote resident working from your
home base.

Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html

Comments welcome
Your comments are important to us!

We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
• Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
• Send your comments in an email to:
redbooks@us.ibm.com



• Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400

Stay connected to IBM Redbooks


• Find us on Facebook:
http://www.facebook.com/IBMRedbooks
• Follow us on Twitter:
http://twitter.com/ibmredbooks
• Look for us on LinkedIn:
http://www.linkedin.com/groups?home=&gid=2130806
• Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks
weekly newsletter:
https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
• Stay current on recent Redbooks publications with RSS Feeds:
http://www.redbooks.ibm.com/rss.html


Chapter 1. PowerHA SystemMirror architecture foundation
This chapter provides information about the new architecture of the IBM PowerHA
SystemMirror 7.1 for AIX, including the differences from previous versions.

This chapter includes the following topics:


• Reliable Scalable Cluster Technology
• Cluster Aware AIX
• Cluster communication
• PowerHA 7.1 SystemMirror plug-in for IBM Systems Director

For an introduction to high availability and IBM PowerHA SystemMirror 7.1, see the “IBM
PowerHA SystemMirror for AIX” page at:
http://www.ibm.com/systems/power/software/availability/aix/index.html



1.1 Reliable Scalable Cluster Technology
Reliable Scalable Cluster Technology (RSCT) is a set of software components that together
provide a comprehensive clustering environment for AIX, Linux, Solaris, and Microsoft®
Windows®. RSCT is the infrastructure used by various IBM products to provide clusters with
improved system availability, scalability, and ease of use.

This section provides an overview of RSCT, its components, and the communication paths
between these components. Several helpful IBM manuals, white papers, and Redbooks
publications are available about RSCT. This section focuses on the components that affect
PowerHA SystemMirror.

To find the most current documentation for RSCT, see the RSCT library in the IBM Cluster
Information Center at:
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.
cluster.rsct.doc%2Frsctbooks.html

1.1.1 Overview of the components for Reliable Scalable Cluster Technology


RSCT has the following main components:
• Topology Services
This component provides node and network failure detection.
• Group Services
This component provides cross-node or process coordination on some cluster
configurations. For a detailed description about how Group Services work, see IBM
Reliable Scalable Cluster Technology: Group Services Programming Guide, SA22-7888,
at:
http://publibfp.boulder.ibm.com/epubs/pdf/a2278889.pdf
• RSCT cluster security services
This component provides the security infrastructure that enables RSCT components to
authenticate the identity of other parties.
• Resource Monitoring and Control (RMC) subsystem
This subsystem is the scalable, reliable backbone of RSCT. It runs on a single machine or
on each node (operating system image) of a cluster. Also, it provides a common
abstraction for the resources of the individual system or the cluster of nodes. You can use
RMC for single system monitoring or for monitoring nodes in a cluster. However, in a
cluster, RMC provides global access to subsystems and resources throughout the cluster.
Therefore, it provides a single monitoring and management infrastructure for clusters.
• Resource managers
A resource manager is a software layer between a resource (a hardware or software entity
that provides services to some other component) and RMC. A resource manager maps
programmatic abstractions in RMC into the actual calls and commands of a resource.

For a more detailed description of the RSCT components, see the IBM Reliable Scalable
Cluster Technology: Administration Guide, SA22-7889, at the following web address:
http://publibfp.boulder.ibm.com/epubs/pdf/22788919.pdf
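
To see these components on a running node, you can use the standard RSCT command-line utilities. The following commands are a minimal sketch (the exact output depends on the installed RSCT level and on which resource managers are configured on your system):

# Check that the RMC subsystem (ctrmc) is active under the SRC
lssrc -s ctrmc

# Show the long status of RMC, including the resource managers it has loaded
lssrc -ls ctrmc

# List the resource classes that the installed resource managers provide
lsrsrc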



1.1.2 Architecture changes for RSCT 3.1
RSCT version 3.1 is the first version that supports Cluster Aware AIX (CAA). Although this
section provides a high-level introduction to the RSCT architecture changes that support CAA, you
can find more details about CAA in 1.2, “Cluster Aware AIX” on page 7.

As shown in Figure 1-1 on page 3, RSCT 3.1 can operate without CAA in “non-CAA” mode.
You use the non-CAA mode if you use one of the following products:
• PowerHA versions before PowerHA 7.1
• A mixed cluster with PowerHA 7.1 and older PowerHA versions
• Existing RSCT Peer Domains (RPD) that were created before RSCT 3.1 was installed
• A new RPD, when you specify during creation that the system must not use or create a
CAA cluster

Figure 1-1 shows both modes in which RSCT 3.1 can be used (with or without CAA). The left
part shows the non-CAA mode, which is equal to the older RSCT versions. The right part
shows the CAA-based mode. The difference between these modes is that Topology Services
has been replaced with CAA.

Important: On a given node, use only one RSCT version at a time.

Figure 1-1 RSCT 3.1: without CAA (left), the RSCT stack on AIX consists of the Resource Managers, Resource Monitoring and Control, Group Services (grpsvcs), and Topology Services; with CAA (right), Group Services (cthags) runs on top of CAA, which replaces Topology Services.

RSCT 3.1 is available for both AIX 6.1 and AIX 7.1. To use CAA with RSCT 3.1 on AIX 6.1,
you must have TL 6 or later installed.

CAA on AIX 6.1 TL 6: The use of CAA on AIX 6.1 TL 6 is enabled only for PowerHA 7.1.
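
As a minimal sketch of how to verify these prerequisites on a node before configuring CAA (the file set names shown here are the common RSCT packaging; adjust them to your installation), check the AIX technology level and the installed RSCT level:

# Display the AIX level, including the technology level (6100-06 or later on AIX 6.1)
oslevel -s

# Display the installed RSCT file sets; RSCT 3.1 or later is required for CAA support
lslpp -l rsct.core.rmc rsct.basic.rte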



Figure 1-2 shows a high-level architectural view of how IBM high availability (HA) applications
PowerHA, IBM Tivoli® System Automation for Multiplatforms, and Virtual I/O Server (VIOS)
Clustered Storage use the RSCT and CAA architecture.

Figure 1-2 HA applications using the RSCT and CAA architecture



1.1.3 PowerHA and RSCT
Figure 1-3 shows the non-CAA communication paths between PowerHA and RSCT.
Non-CAA mode is still used when you have a PowerHA version 6.1 or earlier, even if you are
using AIX 7.1.

The main communication goes from PowerHA to Group Services (grpsvcs), then to Topology
Services (topsvcs), and back to PowerHA. The communication path from PowerHA to RMC is
used for PowerHA Process Application Monitors. Another case where PowerHA uses RMC is
when a resource group is configured with the Dynamic Node Priority policy.

Figure 1-3 PowerHA using RSCT without CAA
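
For example, when a resource group uses the Dynamic Node Priority policy, PowerHA evaluates host utilization values that RMC exposes. The following query is only an illustration of that interface (IBM.Host is a standard RMC resource class; the exact attributes that PowerHA evaluates depend on the configured policy and the RSCT level):

# List the dynamic attributes of the IBM.Host resource class, such as paging space and processor idle time
lsrsrc -Ad IBM.Host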



Figure 1-4 shows the new CAA-based communication paths of PowerHA, RSCT, and CAA.
You use this architecture when you have PowerHA v7.1 or later. It is the same architecture for
AIX 6.1 TL 6 and AIX 7.1 or later. As in the previous architecture, the main communication
goes from PowerHA to Group Services. However, in Figure 1-4, Group Services
communicates with CAA.

Figure 1-4 RSCT with Cluster Aware AIX (CAA)

Example 1-1 lists the cluster processes on a running PowerHA 7.1 cluster.

Group Services subsystem name: Group Services now uses the subsystem name
cthags, which replaces grpsvcs. Group Services is now started by a different control
script and, therefore, under a different subsystem name, cthags.

Example 1-1 Output of lssrc


# lssrc -a | egrep "rsct|ha|svcs|caa|cluster" | grep -v _rm
cld caa 4980920 active
clcomd caa 4915400 active
clconfd caa 5243070 active
cthags cthags 4456672 active
ctrmc rsct 5767356 active
clstrmgrES cluster 10813688 active
solidhac caa 10420288 active
solid caa 5832836 active
clevmgrdES cluster 5177370 active
clinfoES cluster 11337972 active
ctcas rsct inoperative
topsvcs topsvcs inoperative
grpsvcs grpsvcs inoperative
grpglsm grpsvcs inoperative
emsvcs emsvcs inoperative
emaixos emsvcs inoperative

1.2 Cluster Aware AIX


Cluster Aware AIX introduces fundamental clustering capabilities into the base operating
system AIX. Such capabilities include the creation and definition of the set of nodes that
comprise the cluster. CAA provides the tools and monitoring capabilities for the detection of
node and interface health.

File sets: CAA is provided by the non-PowerHA file sets bos.cluster.rte, bos.ahafs, and
bos.cluster.solid. The file sets are on the AIX installation media or in AIX 6.1 TL6.

More information: For more information about CAA, see Cluster Management,
SC23-6779, and the IBM AIX Version 7.1 Differences Guide, SG24-7910.

CAA provides a set of tools and APIs to enable clustering on the AIX operating system. CAA
does not provide the application monitoring and resource failover capabilities that PowerHA
provides. PowerHA uses the CAA capabilities. Other applications and software programs can
use the APIs and command-line interfaces (CLIs) that CAA provides to make their
applications and services “Cluster Aware” on the AIX operating system.
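For example, the following CAA commands can be run from any cluster node to query the cluster (illustrative; see Example 1-2 on page 16 and Example 1-3 on page 19 for sample output):

lscluster -c     # show the cluster configuration, including the multicast address
lscluster -m     # show the state of the cluster nodes
lscluster -i     # show the state of the cluster interfaces
lscluster -d     # show the cluster storage (disk) interfaces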

Figure 1-2 on page 4 illustrates how applications can use CAA. The following products and
parties can use CAA technology:
 RSCT (3.1 and later)
 PowerHA (7.1 and later)
 VIOS (CAA support in a future release)
 Third-party ISVs, service providers, and software products

CAA provides the following features among others:


 Central repository
– Configuration
– Security
 Quorumless (CAA does not require a quorum to be up and operational.)
 Monitoring capabilities for custom actions
 Fencing aids
– Network
– Storage
– Applications

The following sections explain the concepts of the CAA central repository, RSCT changes,
and how PowerHA 7.1 uses CAA.



1.2.1 CAA daemons
When CAA is active in your cluster, you notice the daemon services running as shown in
Figure 1-5.

chile:/ # lssrc -g caa


Subsystem Group PID Status
clcomd caa 4849670 active
cld caa 7012500 active
solid caa 11010276 active
clconfd caa 7340038 active
solidhac caa 10027064 active
Figure 1-5 CAA services

CAA includes the following services:


clcomd This daemon is the cluster communications daemon, which has changed in
PowerHA 7.1. In previous versions of PowerHA, it was called clcomdES. The
location of the rhosts file that PowerHA uses has also changed. The rhosts file
used by the clcomd service is /etc/cluster/rhosts. The old clcomdES rhosts
file in the /usr/es/sbin/cluster/etc directory is no longer used.
cld The cld daemon runs on each node and determines whether the local node
must be the primary or the secondary solidDB® database server.
solid The solid subsystem provides the database engine, and solidHAC is used for
high availability of the IBM solidDB database. Both run on the primary and the
secondary database servers.

In a two-node cluster, the primary database is mounted on node 1


(/clrepos_private1), and the secondary database is mounted on node 2
(/clrepos_private2). These nodes have the solid and solidHAC subsystems
running.

In a three-node cluster configuration, the third node acts as a standby for the
other two nodes. The solid subsystem (solid and solidHAC) is not running,
and the file systems (/clrepos_private1 and /clrepos_private2) are not
mounted.

If a failure occurs on the primary or secondary nodes of the cluster, the third
node activates the solid subsystem. It mounts either the primary or secondary
file system, depending on the node that has failed. See 1.2.3, “The central
repository” on page 9, for information about file systems.
clconfd The clconfd subsystem runs on each node of the cluster. The clconfd daemon
wakes up every 10 minutes to synchronize any necessary cluster changes.
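For illustration, a minimal /etc/cluster/rhosts file for a two-node cluster contains one resolvable host name or IP address per line. After you edit the file, refresh clcomd so that the daemon rereads it (the host names here are examples only):

# cat /etc/cluster/rhosts
sydney
perth
# refresh -s clcomd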

1.2.2 RSCT changes


IBM PowerHA now uses CAA, instead of RSCT, to handle the cluster topology, including
heartbeating, configuration information, and live notification events. PowerHA still
communicates with RSCT Group Services (grpsvcs replaced by cthags), but PowerHA has
replaced the topsvcs function with the new CAA function. CAA reports the status of the
topology by using Autonomic Health Advisor File System (AHAFS) events, which are fed up
to cthags.

For information about the RSCT changes, see 1.1.2, “Architecture changes for RSCT 3.1” on
page 3.

1.2.3 The central repository


A major part of CAA is the central repository. The central repository is stored on a dedicated
storage area network (SAN) disk that is shared between all participating nodes. This
repository contains the following structures:
 Bootstrap repository (BSR)
 LV1, LV2, LV3 (private LVs)
 solidDB (primary location (/clrepos_private1) and secondary location
(/clrepos_private2))

CAA repository disk: The CAA repository disk is reserved for use by CAA only. Do not
attempt to change any of it. The information in this chapter is provided for information only
to help you understand the purpose of the new disk and file system structure.

Figure 1-6 shows an overview of the CAA repository disk and its structure.

Figure 1-6 Cluster repository disk structure

If you installed and configured PowerHA 7.1, your cluster repository disk is displayed as
varied on (active) in lspv output as shown in Figure 1-7 on page 10. In this figure, the disk
label has changed to caa_private0 to remind you that this disk is for private use by CAA only.

Figure 1-7 on page 10 also shows a volume group, called caavg_private, which must always
be varied on (active) when CAA is running. CAA is activated when PowerHA 7.1 is installed
and configured. If you are performing a migration or have an earlier level of PowerHA
installed, CAA is not active.

If you have a configured cluster and find that caavg_private is not varied on (active), your
CAA cluster has a potential problem. See Chapter 10, “Troubleshooting PowerHA 7.1” on
page 305, for guidance about recovery in this situation.
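A quick way to check this is to list only the varied-on volume groups (a simple illustration):

lsvg -o | grep caavg_private     # shows the volume group only if it is varied on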

chile:/ # lspv
hdisk1 000fe4114cf8d1ce None
caa_private0 000fe40163c54011 caavg_private active
hdisk3 000fe4114cf8d2ec None
hdisk4 000fe4114cf8d3a1 diskhb
hdisk5 000fe4114cf8d441 None
hdisk6 000fe4114cf8d4d5 None
hdisk7 000fe4114cf8d579 None
hdisk8 000fe4114cf8d608 ny_datavg
hdisk0 000fe40140a5516a rootvg active
Figure 1-7 lspv command showing the caa_private repository disk

You can view the structure of caavg_private from the standpoint of a Logical Volume
Manager (LVM) as shown in Figure 1-8. The lsvg command shows the structure of the file
system.

chile:/ # lsvg -l caavg_private


caavg_private:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
caalv_private1 boot 1 1 1 closed/syncd N/A
caalv_private2 boot 1 1 1 closed/syncd N/A
caalv_private3 boot 4 4 1 open/syncd N/A
fslv00 jfs2 4 4 1 open/syncd
/clrepos_private1
fslv01 jfs2 4 4 1 closed/syncd
/clrepos_private2
powerha_crlv boot 1 1 1 closed/syncd N/A
Figure 1-8 The lsvg output of CAA

This file system has a special reserved structure. CAA mounts some file systems for its own
use as shown in Figure 1-9 on page 11. The fslv00 file system contains the solidDB
database mounted as /clrepos_private1 because the node is the primary node of the
cluster. If you look at the output for the second node, you might have /clrepos_private2
mounted instead of /clrepos_private1. See 1.2.1, “CAA daemons” on page 8, for an
explanation of the solid subsystem.

Important: CAA creates the file systems for solidDB by using the default logical volume
names (fslv00 and fslv01). Ideally, do not use default logical volume names for existing
file systems on your cluster nodes, so that CAA can use fslv00 and fslv01 for solidDB. If
you do have file systems with default logical volume names outside of CAA, ensure that
both nodes use the same names. For example, if node A has fslv00, fslv01, and fslv02,
node B must have the same names.

Also, /aha, which is a special pseudo file system, is mounted in memory and used by
AHAFS. See “Autonomic Health Advisor File System” on page 11 for more information.



Important: Do not interfere with this volume group and its file systems. For example,
forcing a umount of /aha on a working cluster causes the node to halt.

For more information about CAA, see Cluster Management, SC23-6779, at the following web
address:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.clusteraware/c
lusteraware_pdf.pdf

1.2.4 Cluster event management


With PowerHA 7.1, event management is handled by using a new pseudo file-system
architecture called the Autonomic Health Advisor File System. With this pseudo file system,
applications can program the monitoring of events through application programming
interfaces (APIs) by writing to and reading from files in the file system.

Autonomic Health Advisor File System


The AHAFS is part of the AIX event infrastructure for AIX and AIX clusters and is what CAA
uses as its monitoring framework. The AHAFS file system is automatically mounted when you
create the cluster (Figure 1-9).

chile:/ # mount
node mounted mounted over vfs date options
-------- --------------- --------------- ------ ------------ ---------------
/dev/hd4 / jfs2 Sep 30 13:37 rw,log=/dev/hd8
/dev/hd2 /usr jfs2 Sep 30 13:37 rw,log=/dev/hd8
/dev/hd9var /var jfs2 Sep 30 13:37 rw,log=/dev/hd8
/dev/hd3 /tmp jfs2 Sep 30 13:37 rw,log=/dev/hd8
/dev/hd1 /home jfs2 Sep 30 13:38 rw,log=/dev/hd8
/dev/hd11admin /admin jfs2 Sep 30 13:38 rw,log=/dev/hd8
/proc /proc procfs Sep 30 13:38 rw
/dev/hd10opt /opt jfs2 Sep 30 13:38 rw,log=/dev/hd8
/dev/livedump /var/adm/ras/livedump jfs2 Sep 30 13:38
rw,log=/dev/hd8
/aha /aha ahafs Sep 30 13:46 rw
/dev/fslv00 /clrepos_private1 jfs2 Sep 30 13:52
rw,dio,log=INLINE
Figure 1-9 AHAFS file system mounted

Event handling entails the following process:


1. Create a monitor file based on the /aha directory.
2. Write the required information to the monitor file to represent the wait type (either a
select() call or a blocking read() call). Indicate when to trigger the event, such as a state
change of node down.
3. Wait in a select() call or a blocking read() call.
4. Read from the monitor file to obtain the event data. The event data is then fed to Group
Services.
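The canonical consumers of these events use the C interface (open the monitor file, write the wait-type string, and block in read() or select()). The following ksh93 fragment is only a rough sketch of the same flow, using the node-state monitor file that is listed in “The AHAFS files used in RSCT” on page 12:

MON=/aha/cluster/nodeState.monFactory/nodeStateEvent.mon
exec 9<> $MON                                    # step 1: create and open the monitor file
print -u9 "CHANGED=YES;WAIT_TYPE=WAIT_IN_READ"   # step 2: register the wait type
cat <&9                                          # steps 3 and 4: block until an event
                                                 # (such as NODE_DOWN) arrives and print it
exec 9<&-                                        # close the monitor file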

The event information is retrieved from CAA, and any changes are communicated by using
AHAFS events. RSCT Group Services uses the AHAFS services to obtain events on the
cluster. This information is provided by cluster query APIs and is fed to Group Services.
Figure 1-10 shows a list of event monitor directories.

drwxrwxrwt 1 root system 0 Oct 1 17:04 linkedCl.monFactory


drwxrwxrwt 1 root system 1 Oct 1 17:04 networkAdapterState.monFactory
drwxrwxrwt 1 root system 1 Oct 1 17:04 nodeAddress.monFactory
drwxrwxrwt 1 root system 0 Oct 1 17:04 nodeContact.monFactory
drwxrwxrwt 1 root system 1 Oct 1 17:04 nodeList.monFactory
drwxrwxrwt 1 root system 1 Oct 1 17:04 nodeState.monFactory
chile:/aha/cluster #

Figure 1-10 Directory listing of /aha/cluster

The AHAFS files used in RSCT


The following AHAFS event files are used in RSCT:
 Node state, such as NODE_UP or NODE_DOWN
/aha/cluster/nodeState.monFactory/nodeStateEvent.mon
 Node configuration, such as node added or deleted
/aha/cluster/nodeList.monFactory/nodeListEvent.mon
 Adapter state, such as ADAPTER_UP or ADAPTER_DOWN and interfaces added or deleted
/aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon
 Adapter configuration
/aha/cluster/nodeAddress.monFactory/nodeAddressEvent.mon
 Process exit (Group Services daemon), such as PROCESS_DOWN
/aha/cpu/processMon.monFactory/usr/sbin/rsct/bin/hagsd.mon

Example of a NODE_DOWN event


A NODE_DOWN event is written to the nodeStateEvent.mon file in the nodeState.monFactory
directory. A NODE_DOWN event from the nodeStateEvent.mon file is interpreted as “a given node
has failed.” In this situation, the High Availability Topology Services (HATS) API generates an
Hb_Death event on the node group.

Example of a network ADAPTER_DOWN event


If a network adapter failure occurs, an ADAPTER_DOWN event is generated in the
networkAdapterStateEvent.mon file. This event is interpreted as “a given network interface
has failed.” In this situation, the HATS API generates an Hb_Death event on the adapter group.

Example of Group Services daemon failure


When you get a PROCESS_DOWN event because of a failure in Group Services, the event is
generated in the hagsd.mon file. This event is treated as a NODE_DOWN event, which is similar to
pre-CAA behavior. No PROCESS_UP event exists because, when the new Group Services
daemon is started, it broadcasts a message to peer daemons.

Filtering duplicated or invalid events


AHAFS handles duplicate or invalid events. For example, if a NODE_DOWN event is generated for
a node that is already marked as down, the event is ignored. The same applies for “up” events
and adapter events. Node events for local nodes are also ignored.



1.3 Cluster communication
Cluster Aware AIX indicates which nodes are in the cluster and provides information about
these nodes including their state. A special “gossip” protocol is used over the multicast
address to determine node information and implement scalable reliable multicast. No
traditional heartbeat mechanism is employed. Gossip packets travel over all interfaces. The
communication interfaces can be traditional networking interfaces (such as an Ethernet) and
storage fabrics (SANs with Fibre Channel, SAS, and so on). The cluster repository disk can
also be used as a communication device.

Gossip protocol: The gossip protocol determines the node configuration and then
transmits the gossip packets over all available networking and storage communication
interfaces. If no storage communication interfaces are configured, only the traditional
networking interfaces are used. For more information, see “Cluster Aware concepts” at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.clusterawar
e/claware_concepts.htm

1.3.1 Communication interfaces


The highly available cluster has several communication mechanisms. This section explains
the following interface concepts:
 IP network interfaces
 SAN-based communication (SFWCOM) interface
 Central cluster repository-based communication (DPCOM) interface
 Output of the lscluster -i command
 The RESTRICTED and AIX_CONTROLLED interface state
 Point of contact

IP network interfaces
IBM PowerHA communicates over available IP interfaces using a multicast address. PowerHA
uses all IP interfaces that are configured with an address and are in an UP state as long as
they are reachable across the cluster.

PowerHA SystemMirror management interfaces: PowerHA SystemMirror and Cluster


Aware for AIX use all network interfaces that are available for cluster communication. All of
these interfaces are discovered by default and are used for health management and other
cluster communication. You can use the PowerHA SystemMirror management interfaces to
remove any interface that you do not want to be used for application availability. For
additional information, see “Cluster communication” topic in the AIX 7.1 Information Center
at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.
clusteraware/claware_comm_benifits.htm

Cluster communication requires the use of a multicast IP address. You can specify this
address when you create the cluster, or you can have one generated automatically when you
synchronize the initial cluster configuration.



Cluster topology configuration on the sydney node: The following PowerHA cluster
topology is configured by using smitty sysmirror on the sydney node:
NODE perth:
Network ether01
perthb2 192.168.201.136
perth 192.168.101.136
NODE sydney:
Network ether01
sydneyb2 192.168.201.135
sydney 192.168.101.135

A default multicast address of 228.168.101.135 is generated for the cluster. PowerHA


takes the IP address of the node and changes its most significant part to 228 as shown in
the following example:
x.y.z.t -> 228.y.z.t

An overlap of the multicast addresses might be generated by default in the case of two
clusters with interfaces in the same virtual LAN (VLAN). This occurs when their IP
addresses are similar to the following example:
x1.y.z.t
x2.y.z.t

The netmon.cf configuration file is not required with CAA and PowerHA 7.1.

The range 224.0.0.0–224.0.0.255 is reserved for local purposes, such as administrative and
maintenance tasks. The data that they receive is never forwarded by multicast routers.
Similarly, the range 239.0.0.0–239.255.255.255 is reserved for administrative scoping.
These special multicast groups are regularly published in the assigned numbers RFC
(http://tools.ietf.org/html/rfc3171).

If multicast traffic is present in the adjacent network, you must ask the network administrator
for multicast IP address allocation for your cluster. Also, ensure that the multicast traffic
generated by any of the cluster nodes is properly forwarded by the network infrastructure
toward the other cluster nodes. The Internet Group Management Protocol (IGMP) must be
enabled.
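For example, you can check IGMP activity and test multicast connectivity between two nodes with the mping utility (the address is illustrative; use the multicast address of your cluster):

netstat -s -p igmp                  # IGMP statistics on the local node
mping -v -r -a 228.168.101.135      # start a multicast receiver on one node
mping -v -s -a 228.168.101.135      # send multicast packets from another node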

Interface states
Network interfaces can have any of the following common states. You can see the interface
state in the output of the lscluster -i command, as shown in Example 1-2 on page 16.
UP The interface is up and active.
STALE The interface configuration data is stale, which happens when
communication has been lost, but was previously up at some point.
DOWN SOURCE HARDWARE RECEIVE / SOURCE HARDWARE TRANSMIT
The interface is down because of a failure to receive or transmit, which
can happen in the event of a cabling problem.
DOWN SOURCE SOFTWARE
The interface is down in AIX software only.

SAN-based communication (SFWCOM) interface


Redundant high-speed communication channels can be established between the hosts
through the SAN fabric. To use this communication path, you must complete additional setup


for the Fibre Channel (FC) adapters. Configure the server FC ports in the same zone of the
SAN fabric, and set their Target Mode Enable (tme) attribute to yes. Then enable the dynamic
tracking and fast fail. The SAS adapters do not require special setup. Based on this setup, the
CAA Storage Framework provides a SAN-based heartbeat. This heartbeat is an effective
replacement for all the non-IP heartbeat mechanisms used in earlier releases.

Enabling SAN fiber communication: To enable SAN fiber communication for cluster
communication, you must configure the Target Mode Enable attribute for FC adapters. See
Example 4-4 on page 57 for details.
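As a quick illustration of this setup (device names such as fcs0 and fscsi0 are examples), the attributes can be changed as follows; the -P flag defers the change until the devices are reconfigured or the node is rebooted:

chdev -l fcs0 -a tme=yes -P
chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P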

Configure your cluster in an environment that supports SAN fabric-based communication.


This approach provides another channel of redundancy to help reduce the risk of getting a
partitioned (split) cluster.

The Virtual SCSI (VSCSI) SAN heartbeat depends on VIOS 2.2.0.11-FP24 SP01.

Interface state
The SAN-based communication (SFWCOM) interface has one state available, the UP state.
The UP state indicates that the SFWCOM interface is active. You can see the interface state
in the output of the lscluster -i command as shown in Example 1-2 on page 16.

Unavailable SAN fiber communication: When SAN fiber communication is unavailable,


the SFWCOM section is not listed in the output of the lscluster -i command. A DOWN
state is not shown.

Central cluster repository-based communication (DPCOM) interface


Heartbeating and other cluster messaging are also achieved through the central repository
disk. The repository disk is used as another redundant path of communication between the
nodes. A portion of the repository disk is reserved for node-to-node heartbeat and message
communication. This form of communication is used when all other forms of communication
have failed. The CAA Storage Framework provides a heartbeat through the repository disk,
which is only used when IP or SAN heartbeating no longer works.

When the underlying hardware infrastructure is available, you can proceed with the PowerHA
cluster topology configuration. The heartbeat starts right after the first successful verify and
synchronize operation, when the CAA cluster is created and activated by PowerHA.

Interface states
The Central cluster repository-based communication (DPCOM) interface has the following
available states. You can see the interface state in the output of the lscluster -i command,
which is shown in Example 1-2.
UP AIX_CONTROLLED
Indicates that the interface is UP, but under AIX control. The user
cannot change the status of this interface.
UP RESTRICTED AIX_CONTROLLED
Indicates that the interface is UP and under AIX system control, but is
RESTRICTED from monitoring mode.
STALE The interface configuration data is stale. This state occurs when
communication is lost, but was up previously at some point.



Output of the lscluster -i command
Example 1-2 shows the output from the lscluster -i command. The output shows the
interfaces and the interface states as explained in the previous sections.

Example 1-2 The lscluster -i output for one node


lscluster -i
Network/Storage Interface Query

Cluster Name: au_cl


Cluster uuid: d77ac57e-cc1b-11df-92a4-00145ec5bf9a
Number of nodes reporting = 2
Number of nodes expected = 2
Node sydney
Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9a
Smoothed rrt across interface = 8
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 110 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255 netmas
k 255.255.252.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netm
ask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9b
Smoothed rrt across interface = 8
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 110 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 netmas
k 255.255.252.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netm
ask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 0
Mean Deviation in network rrt across interface = 0

Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node perth
Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d9
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255 netmas
k 255.255.252.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netm
ask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d8
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 netmas
k 255.255.252.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netm
ask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0

Smoothed rrt across interface = 0
Mean Deviation in network rrt across interface = 0
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED

The RESTRICTED and AIX_CONTROLLED interface state


When the network and storage interfaces in the cluster are active and available, the cluster
repository disk appears as restricted and controlled by AIX. (The restricted term identifies the
disk as “not currently used.”) In the output from the lscluster commands, the term dpcom is
used for the cluster repository disk as a communication device and is initially noted as UP
RESTRICTED AIX_CONTROLLED.

When the system determines that the node has lost the normal network or storage interfaces,
the system activates (unrestricts) the cluster repository disk interface (dpcom) and begins
using it for communications. At this point, the interface state changes to UP AIX_CONTROLLED
(unrestricted, but still system controlled).

Point of contact
The output of the lscluster -m command shows a reference to a point of contact as shown in
Example 1-3 on page 19. The local node is displayed as N/A, and the remote node is
displayed as en0 UP. CAA monitors the state and points of contact between the nodes for both
communication interfaces.

A point of contact indicates that a node has received a packet from the other node over the
interface. The point-of-contact status UP indicates that the packet flow is continuing. The
point-of-contact monitor tracks the number of UP points of contact for each communication
interface on the node. If this count reaches zero, the node is marked as reachable through
the cluster repository disk only.

1.3.2 Communication node status


The node communication status is indicated by the State of Node value in the lscluster -m
command output (Example 1-3 on page 19). The cluster node can have the following
communication states:
UP Indicates that the node is up.
UP NODE_LOCAL Indicates that the node is up and is the local node in the cluster.
UP NODE_LOCAL REACHABLE THROUGH REPOS DISK ONLY
Indicates that the local node is up, but that it is reachable through the
repository disk only.

When a node can only communicate by using the cluster repository
disk, the output from the lscluster command notes it as REACHABLE
THROUGH REPOS DISK ONLY.
When the normal network or storage interfaces become available
again, the system automatically detects the restoration of
communication interfaces, and again places dpcom in the restricted
state. See “The RESTRICTED and AIX_CONTROLLED interface
state” on page 18.
UP REACHABLE THROUGH REPOS DISK ONLY
Indicates that the node is up, but that it is reachable through the
repository disk only (this state is shown for a node other than the
local node).
DOWN Indicates that the node is down. If the node does not have access to
the cluster repository disk, the node is marked as down.

Example 1-3 The lscluster -m output


lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2

Node name: chile


Cluster shorthand id for node: 1
uuid for node: 7067c3fa-ca95-11df-869b-a2e310452004
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
newyork local 5f2f5d38-cd78-11df-b986-a2e310452003

Number of points_of_contact for node: 0


Point-of-contact interface & contact state
n/a

------------------------------

Node name: serbia


Cluster shorthand id for node: 2
uuid for node: 8a5e2768-ca95-11df-8775-a2e312537404
State of node: UP
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
newyork local 5f2f5d38-cd78-11df-b986-a2e310452003

Number of points_of_contact for node: 1


Point-of-contact interface & contact state
en0 UP



Interface up, point of contact down: This phrase means that an interface might be up but
a point-of-contact might be down. In this state, no packets are received from the other
node.

1.3.3 Considerations for the heartbeat configuration


In previous versions of PowerHA, you had to configure a non-IP heartbeat path, such as
disk-based heartbeating. PowerHA 7.1 no longer supports disk-based heartbeat monitoring.
CAA uses all available interfaces to perform heartbeat monitoring, including the repository
disk-based and SAN fiber heartbeat monitoring methods. Both types of heartbeat monitoring
are replacements for the previous non-IP heartbeat configuration. The cluster also performs
heartbeat monitoring, as it did before, across all available network interfaces.

Heartbeat monitoring is performed by sending and receiving gossip packets across the
network with the multicast protocol. CAA uses heartbeat monitoring to determine
communication problems that need to be reflected in the cluster information.

1.3.4 Deciding when a node is down: Round-trip time (rtt)


CAA monitors the interfaces of each node by using the multicast protocol and gossip packets.
Gossip packets are periodically sent from each node in the cluster for timing purposes. These
gossip packets are automatically replied to by the other nodes of the cluster. The packet
exchanges are used to calculate the round-trip time.

The round-trip time values are shown in the output of the lscluster -i and lscluster -m
commands. The smoothed rtt is the averaged round-trip time, and the mean deviation in
network rtt reflects how much it varies; both values are managed automatically by CAA.
Unlike previous versions of PowerHA and HACMP, no heartbeat tuning is necessary. See
Example 1-2 on page 16 and Figure 1-11 for more
information.

Smoothed rtt to node:7


Mean Deviation in network rtt to node: 3
Figure 1-11 Extract from the lscluster -m command output showing the rtt values

Statistical projections are directly employed to compute node-down events. By using normal
network dropped packet rates and the projected round-trip times with mean deviations, the
cluster can determine when a packet was lost or not sent. Each node monitors the time when
a response is due from other nodes in the cluster. If a node finds that another node is overdue, a
node down protocol is initiated in the cluster to determine if the node is down or if network
isolation has occurred.

This algorithm is self-adjusting to load and network conditions, providing a highly reliable and
scalable cluster. Expected round-trip times and variances rise quickly when load conditions
cause delays. Such delays cause the system to wait longer before setting a node down state.
Such a state provides for a high probability of valid state information. (Quantitative
probabilities of errors can be computed.) Conversely, expected round-trip times and variances
fall quickly when delays return to normal.

The cluster automatically adjusts to variances in latency and bandwidth characteristics of


various network and storage interfaces.



1.4 PowerHA 7.1 SystemMirror plug-in for IBM Systems
Director
PowerHA SystemMirror provides a plug-in to IBM Systems Director, giving you a graphical
user interface to manage a cluster. This topic includes the following sections:
 Introduction to IBM Systems Director
 Advantages of using IBM Systems Director
 Basic architecture

1.4.1 Introduction to IBM Systems Director


IBM Systems Director provides systems management personnel with a
single-point-of-control, helping to reduce IT management complexity and cost. With IBM
Systems Director, IT personnel can perform the following tasks:
 Optimize computing and network resources
 Quickly respond to business requirements with greater delivery flexibility
 Attain higher levels of services management with streamlined management of physical,
virtual, storage, and network resources

A key feature of IBM Systems Director is a consistent user interface with a focus on driving
common management tasks. IBM Systems Director provides a unified view of the total IT
environment, including servers, storage, and network. With this view, users can perform tasks
with a single tool, IBM Systems Director.

1.4.2 Advantages of using IBM Systems Director


IBM Systems Director offers the following advantages:
 A single, centralized view into all PowerHA SystemMirror clusters
– Centralized and secure access point
Everyone logs in to the same machine, simplifying security and providing an audit trail
of user activities.
– Single sign-on (SSO) capability
After the initial setup is done, using standard Director mechanisms, the password of
each individual node being managed no longer needs to be provided. Customers log in to
the Director server by using their account on that one machine only and have access to
all PowerHA clusters under management by that server.
 Two highly accessible interfaces
– Graphical interface
The GUI helps to explain and show relationships. It also guides customers through the
learning phase, improving their chances of success with Systems Director.
• Instant and nearly instant help for just about everything
• Maximum, interactive assistance with many tasks
• Maximum error checking
• SystemMirror enterprise health summary



– Textual interface
As with all IBM Systems Director plug-ins, the textual interface (also known as the CLI)
is available through the smcli utility of IBM Systems Director. The namespace, which is
optional, is sysmirror, for example, smcli sysmirror help.
• Maximum speed
• Centralized, cross-cluster scripting
 A common, IBM unified interface (learn once, manage many)
More IBM products are now plugging into Systems Director. Although each individual
plug-in is different, the common framework around each one remains the same, reducing
the education burden of customers. Another benefit is in the synergies that might be used
by having multiple products all sharing a common data store on the IBM Systems Director
server.

To learn more about the advantages of IBM Systems Director, see the PowerHA 7.1
presentation by Peter Schenke at:
http://www-05.ibm.com/ch/events/systems/pdf/6_PowerHA_7_1_News.pdf

1.4.3 Basic architecture


Figure 1-12 shows the basic architecture of IBM Systems Director for PowerHA. IBM Systems
Director is used to quickly and easily scan subnets to find and load AIX systems. When these
systems are unlocked (when the login ID and password are provided), and if PowerHA is
installed on any of these systems, they are automatically discovered and loaded by the
plug-ins.

Figure 1-12 Basic architecture of IBM Systems Director for PowerHA: a three-tier design for scalability, with the user interface (web-based and command-line), the Director management server (the central point of control, agent manager, and discovery of clusters and resources; supported on AIX, Linux, and Windows), and the Director agent on each AIX/PowerHA node (installed automatically on AIX 7.1 and AIX 6.1 TL06), all communicating securely



Chapter 2. Features of PowerHA SystemMirror 7.1
This chapter explains which previously supported features of PowerHA SystemMirror have
been removed. It also provides information about the new features in PowerHA SystemMirror
Standard Edition 7.1 for AIX.

This chapter includes the following topics:


 Deprecated features
 New features
 Changes to the SMIT panel
 The rootvg system event
 Resource management enhancements
 CLUSTER_OVERRIDE environment variable
 CAA disk fencing
 PowerHA SystemMirror event flow differences



2.1 Deprecated features
PowerHA SystemMirror 7.1 has removed support for the following previously available
features:
 IP address takeover (IPAT) via IP replacement
 Locally administered address (LAA) for hardware MAC address takeover (HWAT)
 Heartbeat over IP aliases
 clcomdES with the /usr/es/sbin/cluster/etc/rhosts file is replaced by the Cluster
Aware AIX (CAA) clcomd with the /etc/cluster/rhosts file
 The following IP network types:
– ATM
– FDDI
– Token Ring
 The following point-to-point (non-IP) network types:
– RS232
– TMSCSI
– TMSSA
– Disk heartbeat (diskhb)
– Multi-node disk heartbeat (mndhb)
 Two-node configuration assistant
 WebSMIT (replaced with the IBM Systems Director plug-in)
 Site support in this version
– Cross-site Logical Volume Manager (LVM) mirroring (available in PowerHA 7.1 SP3)
 IPV6 support in this version

IP address takeover via IP aliasing is now the only supported IPAT option. Heartbeating
through the CAA repository disk and SAN-based (FC) heartbeating, as described in the
following section, have replaced all point-to-point (non-IP) network types.

2.2 New features


The new version of PowerHA uses much simpler heartbeat management. This method uses
multicasting, which reduces the burden on the customer to define aliases for heartbeat
monitoring. By default, it supports dual communication paths for most data center
deployments by using both the IP network and the SAN connections (available in 7.1 SP3 and
later). These communication paths are provided through CAA and the central repository disk.

PowerHA SystemMirror 7.1 introduces the following features:


 SMIT panel enhancements
 The rootvg system event
 Systems Director plug-in
 Resource management enhancements
– StartAfter
– StopAfter
 User-defined resource type



 Dynamic node priority: Adaptive failover
 Additional disk fencing by CAA
 New Smart Assists for the following products:
– SAP NetWeaver 7.0 (2004s) SR3
– IBM FileNet® 4.5.1
– IBM Tivoli Storage Manager 6.1
– IBM Lotus® Domino® Server
– SAP MaxDB v7.6 and 7.7
 The clmgr tool
The clmgr tool is the new command-line user interface (CLI) with which an administrator
can use a uniform interface to deploy and maintain clusters. For more information, see 5.2,
“Cluster configuration using the clmgr tool” on page 104.
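As a brief illustration of the clmgr interface (the command forms are examples; see 5.2 for the full syntax):

clmgr query cluster     # display cluster-wide settings
clmgr query node        # list the cluster nodes
clmgr online cluster    # start cluster services on all nodes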

2.3 Changes to the SMIT panel


PowerHA SystemMirror 7.1 includes several changes to the SMIT panel since the release of
PowerHA 6.1. This topic focuses on the most used items on the panel and not the technical
changes behind these items. These changes can help experienced system administrators to
quickly find the paths to the functions they need to implement in their new clusters.

In PowerHA SystemMirror 7.1, the SMIT panel has the following key changes:
 Separation of menus by function
 Addition of the Custom Cluster Configuration menu
 Removal of Extended Distance menus from the base product
 Removal of unsupported dialogs or menus
 Changes to some terminology
 New dialog for specifying repository and cluster IP address
 Many changes in topology and resource menus

2.3.1 SMIT tree


The SMIT tree offers several changes that make it easier for system administrators to find the
task they want to perform. For an overview of these changes, see Appendix B, “PowerHA
SMIT tree” on page 483. To access a list of the SMIT tree and available fast paths, use the
smitty path: smitty hacmp → Can't find what you are looking for ?.



2.3.2 The smitty hacmp command
Figure 2-1 shows the SMIT screens that you see when you use the smitty hacmp command
or the path: smitty → Communications Applications and Services → PowerHA
SystemMirror. It compares PowerHA 5.5, PowerHA 6.1, and PowerHA SystemMirror 7.1.

Figure 2-1 The screens shown after running the smitty hacmp command

In PowerHA SystemMirror 7.1, the smitty sysmirror (or smit sysmirror) command provides
a new fast path to the PowerHA start menu in SMIT. The old fast path (smitty hacmp) is still
valid.



Figure 2-2 shows, in more detail, where some of the main functions moved to. Minor changes
have been made to the following paths, which are not covered in this Redbooks publication:
 System Management (C-SPOC)
 Problem Determination Tools
 Can’t find what you are looking for ?
 Not sure where to start ?

The “Initialization and Standard Configuration” path has been split into two paths: Cluster
Nodes and Networks and Cluster Applications and Resources. For more details about these
paths, see 2.3.4, “Cluster Standard Configuration menu” on page 29. Some features for the
Extended Configuration menu have moved to the Custom Cluster Configuration menu. For
more details about custom configuration, see 2.3.5, “Custom Cluster Configuration menu” on
page 30.

Figure 2-2 PowerHA SMIT start panel (smitty sysmirror)



2.3.3 The smitty clstart and smitty clstop commands
The SMIT screens to start and stop a cluster did not change, and the fast path is still the
same. Figure 2-3 shows the Start Cluster Services panels for PowerHA versions 5.5, 6.1, and
7.1.

Although the SMIT path did not change, some of the wording has changed. For example, the
word “HACMP” was replaced with “Cluster Services.” The path with the new wording is smitty
hacmp → System Management (C-SPOC) → PowerHA SystemMirror Services, and then
you select either the “Start Cluster Services” or “Stop Cluster Services” menu.

Figure 2-3 The screens that are shown when running the smitty clstart command



2.3.4 Cluster Standard Configuration menu
In previous versions, the “Cluster Standard Configuration” menu was called the “Initialization
and Standard Configuration” menu. This menu is now split into the following menu options as
indicated in Figure 2-2 on page 27:
 Cluster Nodes and Networks
 Cluster Applications and Resources

This version has a more logical flow. The topology configuration and management part is in
the “Cluster Nodes and Networks” menu. The resources configuration and management part
is in the “Cluster Applications and Resources” menu.

Figure 2-4 shows some tasks and where they have moved to. The dotted line shows where
Smart Assist was relocated. The Two-Node Cluster Configuration Assistant no longer exists.

Figure 2-4 Cluster standard configuration



2.3.5 Custom Cluster Configuration menu
The “Custom Cluster Configuration” menu is similar to the “Extended Configuration” menu in
the previous release. Unlike the “Extended Configuration” menu, which contains entries that
were duplicated from the standard menu path, the “Custom Cluster Configuration” menu in
PowerHA SystemMirror 7.1 does not contain these duplicate entries. Figure 2-5 shows an
overview of where some of the functions have moved to. The Custom Cluster Configuration
menu is shown in the upper-right corner, and the main PowerHA SMIT menu is shown in the
lower-right corner.

Figure 2-5 Custom Cluster Configuration menu



2.3.6 Cluster Snapshot menu
The content of the Cluster Snapshot menu did not change compared to PowerHA 6.1
(Figure 2-6). However, the path to this menu has changed to smitty sysmirror → Cluster
Nodes and Networks → Manage the Cluster → Snapshot Configuration.

Snapshot Configuration

Move cursor to desired item and press Enter.

Create a Snapshot of the Cluster Configuration


Change/Show a Snapshot of the Cluster Configuration
Remove a Snapshot of the Cluster Configuration
Restore the Cluster Configuration From a Snapshot
Configure a Custom Snapshot Method
Figure 2-6 Snapshot Configuration menu

2.3.7 Configure Persistent Node IP Label/Address menu


The content of the SMIT panel to add or change a persistent IP address did not change
compared to PowerHA 6.1 (Figure 2-7). However, the path to it changed to smitty hacmp →
Cluster Nodes and Networks → Manage Nodes → Configure Persistent Node IP
Label/Addresses.

Configure Persistent Node IP Label/Addresses

Move cursor to desired item and press Enter.

Add a Persistent Node IP Label/Address


Change/Show a Persistent Node IP Label/Address
Remove a Persistent Node IP Label/Address
Figure 2-7 Configure Persistent Node IP Label/Addresses menu

2.4 The rootvg system event


PowerHA SystemMirror 7.1 introduces system events. These events are handled by a new
subsystem called clevmgrdES. The rootvg system event allows for the monitoring of loss of
access to the rootvg volume group. By default, in the case of loss of access, the event logs an
entry in the system error log and reboots the system. If required, you can change this option
in the SMIT menu to log only an event entry and not to reboot the system. For further details
about this event and a test example, see 9.4.1, “The rootvg system event” on page 286.



2.5 Resource management enhancements
PowerHA SystemMirror 7.1 offers the following new resource and resource group
configuration choices. They provide more flexibility in administering resource groups across
the various nodes in the cluster.
 Start After and Stop After resource group dependencies
 User-defined resource type
 Adaptive failover

2.5.1 Start After and Stop After resource group dependencies


The previous version of PowerHA has the following types of resource group dependency
runtime policies:
 Parent-child
 Online on the Same Node
 Online on Different Nodes
 Online On Same Site Location

These policies are insufficient for supporting some complex applications. For example, the
FileNet application server must be started only after its associated database is started. It
does not need to be stopped if the database is brought down for some time and then started.

The following dependencies have been added to PowerHA:


 Start After dependency
 Stop After dependency

The Start After and Stop After dependencies use source and target resource group
terminology. The source resource group depends on the target resource group as shown in
Figure 2-8.

Figure 2-8 Start After resource group dependency: the source resource group (app_rg) is started after the target resource group (db_rg)



For Start After dependency, the target resource group must be online on any node in the
cluster before a source (dependent) resource group can be activated on a node. Resource
groups can be released in parallel and without any dependency.

Similarly, for Stop After dependency, the target resource group must be offline on any node in
the cluster before a source (dependent) resource group can be brought offline on a node.
Resource groups are acquired in parallel and without any dependency.

A resource group can serve as both a target and a source resource group, depending on
which end of a given dependency link it is placed. You can specify three levels of
dependencies for resource groups. You cannot specify circular dependencies between
resource groups.

A Start After dependency applies only at the time of resource group acquisition. During a
resource group release, these resource groups do not have any dependencies. A Start After
source resource group cannot be acquired on a node until its target resource group is fully
functional. If the target resource group does not become fully functional, the source resource
group goes into an OFFLINE DUE TO TARGET OFFLINE state. If you notice that a resource group
is in this state, you might need to troubleshoot which resources need to be brought online
manually to resolve the resource group dependency.

When a resource group in a Start After target role falls over from one node to another, the
resource groups that depend on it are unaffected.

After the Start After source resource group is online, any operation (such as bring offline or
move resource group) on the target resource group does not affect the source resource
group. A manual resource group move or bring resource group online on the source resource
group is not allowed if the target resource group is offline.

A Stop After dependency applies only at the time of a resource group release. During
resource group acquisition, these resource groups have no dependency between them. A
Stop After source resource group cannot be released on a node until its target resource group
is offline.

When a resource group in a Stop After source role falls over from one node to another, its
related target resource group is released as a first step. Then the source (dependent)
resource group is released. Next, both resource groups are acquired in parallel, assuming
that no Start After or parent-child dependency exists between these resource groups.

A manual resource group move or bring resource group offline on the Stop After source
resource group is not allowed if the target resource group is online.

Summary: In summary, the source Start After and Stop After target resource groups have
the following dependencies:
 Source Start After target: The source is brought online after the target resource group.
 Source Stop After target: The source is brought offline after the target resource group.



A parent-child dependency can be seen as being composed of two parts with the newly
introduced Start After and Stop After dependencies. Figure 2-9 shows this logical
equivalence.

Figure 2-9 Comparing Start After, Stop After, and parent-child resource group (rg) dependencies

If you configure a Start After dependency between two resource groups in your cluster, the
applications in these resource groups are started in the configured sequence. To ensure that
this process goes smoothly, configure application monitors and use a Startup Monitoring
mode for the application included in the target resource group.

For a configuration example, see 5.1.6, “Configuring Start After and Stop After resource
group dependencies” on page 96.

2.5.2 User-defined resource type


With PowerHA, you can add your own resource types and specify management scripts to
configure how and where PowerHA processes the resource type. You can then configure a
user-defined resource instance for use in a resource group.

A user-defined resource type is one that you can define for a customized resource that you
can add to a resource group. A user-defined resource type contains several attributes that
describe the properties of the instances of the resource type.

When you create a user-defined resource type, you must choose its processing order among
the existing resource types. PowerHA SystemMirror processes the user-defined resources at
the beginning of the resource acquisition order if you choose the FIRST value. If you choose
any other value, for example, VOLUME_GROUP, the user-defined resources are acquired after the
volume groups are varied on, and they are released before the volume groups are varied off.
You can choose the existing resource type from a pick list in the SMIT menu.



Figure 2-10 shows the existing resource type and acquisition or release order. A user-defined
resource can be any of the following types:
 FIRST
 WPAR
 SERVICEIP
 TAPE (DISKS)
 VOLUME_GROUP
 FILE_SYSTEM
 APPLICATION

Figure 2-10 Processing order of the resource types (DISK, FILE SYSTEM, user-defined resource, SERVICE IP, and APPLICATION), showing the acquisition order and the reverse release order

2.5.3 Dynamic node priority: Adaptive failover


The framework for dynamic node priority is already present in the previous versions of
PowerHA. This framework determines the takeover node at the time of a failure according to
one of the following policies:
 cl_highest_free_mem
 cl_highest_idle_cpu
 cl_lowest_disk_busy

The cluster manager queries the Resource Monitoring and Control (RMC) subsystem every
3 minutes to obtain the current value of these attributes on each node. Then the cluster
manager distributes them cluster-wide. For an architecture overview of PowerHA and RSCT,
see 1.1.3, “PowerHA and RSCT” on page 5.



The dynamic node priority feature is enhanced in PowerHA SystemMirror 7.1 to support the
following policies:
 cl_lowest_nonzero_udscript_rc
 cl_highest_udscript_rc

The return code of a user-defined script is used in determining the destination node.

When you select one of the criteria, you must also provide values for the DNP script path and
DNP timeout attributes for a resource group. PowerHA executes the supplied script and
collects the return codes from all nodes. If you choose the cl_highest_udscript_rc policy,
collected values are sorted. The node that returned the highest value is selected as the
candidate takeover node. Similarly, if you choose the cl_lowest_nonzero_udscript_rc
policy, collected values are sorted. The node that returned the lowest nonzero positive value
is selected as the candidate takeover node. If the return value of the script from all nodes is the
same or zero, the default node priority is considered. PowerHA verifies the script existence
and the execution permissions during verification.

Time-out value: When you select a time-out value, ensure that it is within the time period
for running and completing a script. If you do not specify a time-out value, a default value
equal to the config_too_long time is specified.

For information about configuring the dynamic node priority, see 5.1.8, “Configuring the
dynamic node priority (adaptive failover)” on page 102.
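The following hypothetical script is a minimal sketch of such a user-defined DNP script for the cl_highest_udscript_rc policy; the metric, the vmstat parsing, and the scaling are assumptions that you must adapt. Because script return codes are limited to 0-255, the metric is scaled down before it is returned:

#!/bin/ksh
# Hypothetical DNP script: favor the node with the most free memory pages.
free=$(vmstat -v | awk '/free pages/ {print $1}')   # free pages on this node
(( rc = free / 100000 ))                            # scale into the 0-255 range
(( rc > 255 )) && rc=255
exit $rc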

2.6 CLUSTER_OVERRIDE environment variable


In PowerHA SystemMirror 7.1, the use of several AIX commands on cluster resources can
potentially impair the integrity of the cluster configuration. PowerHA SystemMirror 7.1
provides C-SPOC versions of these functions, which are safer to use in the cluster
environment. The CLUSTER_OVERRIDE environment variable controls whether these commands
can be used outside of C-SPOC. By default, it is set to allow the use of these commands
outside of C-SPOC.

To restrict people from using these commands in the command line, you can change the
default value from yes to no:
1. Locate the following line in the /etc/environment file:
CLUSTER_OVERRIDE=yes
2. Change the line to the following line:
CLUSTER_OVERRIDE=no

The following commands are affected by this variable:


 chfs
 crfs
 chgroup
 chlv
 chpasswd
 chuser
 chvg
 extendlv
 extendvg
 importvg
 mirrorvg



 mkgroup
 mklv
 mklvcopy
 mkuser
 mkvg
 reducevg

If the CLUSTER_OVERRIDE variable has the value no, you see an error message similar to the
one shown in Example 2-1.

Example 2-1 Error message when using CLUSTER_OVERRIDE=no


# chfs -a size=+1 /home
The command must be issued using C-SPOC or the override environment variable must
be set.

In this case, use the equivalent C-SPOC CLI called cli_chfs. See the C-SPOC man page for
more details.
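For example, the following sketch grows a shared file system through C-SPOC instead of calling chfs directly. The C-SPOC CLI commands are typically located in /usr/es/sbin/cluster/cspoc; verify the path on your system.

/usr/es/sbin/cluster/cspoc/cli_chfs -a size=+1 /home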

Deleting the CLUSTER_OVERRIDE variable: You also see the message shown in
Example 2-1 if you delete the CLUSTER_OVERRIDE variable in your /etc/environment file.

2.7 CAA disk fencing


CAA introduces another level of disk fencing beyond what PowerHA and gsclvmd provide by
using enhanced concurrent volume groups (ECVGs). In previous releases of PowerHA, when
ECVGs are used in fast disk takeover mode, the volume group is in full read/write (active)
mode on the node that owns the resource group. Any standby candidate node has the volume
group varied on in read-only (passive) mode.

The passive state allows only read access to a volume group special file and the first 4 KB of
a logical volume. Write access through standard LVM is not allowed. However, low-level
commands, such as dd, can bypass LVM and write directly to the disk.

The new CAA disk fencing feature prevents writes to the disk device from any other node, so
that even a low-level operation, such as dd, cannot succeed. However, fencing applies only to
systems that are members of the CAA cluster. Therefore, it is still important to zone the
storage appropriately so that only cluster nodes have the disks configured.

The PowerHA SystemMirror 7.1 announcement letter explains this fencing feature as a
storage framework that is embedded in the operating system to aid in storage device
management. As part of the framework, fencing of disks or disk groups is supported. Fencing
shuts off write access to the shared disks from any entity on the node (irrespective of the
privileges associated with the entity trying to access the disk). PowerHA SystemMirror
exploits fencing to implement strict controls so that the shared disks are accessed solely
from one of the nodes that share the disk. Fencing ensures that, when the workload moves to
another node to continue operations, write access to the disks on the departing node is
turned off.



2.8 PowerHA SystemMirror event flow differences
This section describes the event flow that occurs when the PowerHA SystemMirror cluster
starts and when nodes join or leave the cluster.

2.8.1 Startup processing

In this example, a resource group must be started on a node. The application start server is
not run until the necessary resources are acquired. Figure 2-11 illustrates the steps that are
performed to bring the resource group online when the first node starts cluster services.

[Figure 2-11 is a diagram of the first node starting cluster services. The cluster manager
(clstrmgrES) event manager drives the following event sequence:

1) rg_move_acquire
   process_resources (NONE)
   for each RG:
     process_resources (ACQUIRE)
     process_resources (SERVICE_LABELS)
       acquire_service_addr
       acquire_aconn_service en0 net_ether_01
     process_resources (DISKS)
     process_resources (VGS)
     process_resources (LOGREDO)
     process_resources (FILESYSTEMS)
     process_resources (SYNC_VGS)
     process_resources (TELINIT)
     process_resources (NONE)
   < Event Summary >

2) rg_move_complete
   for each RG:
     process_resources (APPLICATIONS)
       start_server app01
     process_resources (ONLINE)
     process_resources (NONE)
   < Event Summary >]

Figure 2-11 First node starting the cluster services

TE_RG_MOVE_ACQUIRE is the SystemMirror event that is listed in the debug file. The
/usr/es/sbin/cluster/events/rg_online.rp recovery program is listed in the HACMP rules
Object Data Manager (ODM) file (Example 2-2).

Example 2-2 The rg_online.rp file


all "rg_move_fence" 0 NULL
barrier
#
all "rg_move_acquire" 0 NULL
#
barrier
#
all "rg_move_complete" 0 NULL

The following section explains what happens when a subsequent node joins the cluster.



2.8.2 Another node joins the cluster
When another node starts, it must first join the cluster. If a resource group needs to fall
back, rg_move_release is run. If no resource group fallback is needed, rg_move_release is
skipped. The numbers in Figure 2-12 indicate the order of the steps; steps with the same
number are processed in parallel. Example 2-3 shows the messages from the process flow.

Example 2-3 Debug file showing the process of another node joining the cluster
Debug file:
[TE_JOIN_NODE_DEP]
[TE_RG_MOVE_ACQUIRE]
[TE_JOIN_NODE_DEP_COMPLETE]

cluster.log file node1:


Nov 23 00:35:06 AIX: EVENT START: node_up node2
Nov 23 00:35:06 AIX: EVENT COMPLETED: node_up node2 0
Nov 23 00:35:11 AIX: EVENT START: rg_move_fence node1 2
Nov 23 00:35:11 AIX: EVENT COMPLETED: rg_move_fence node1 2 0
Nov 23 00:35:11 AIX: EVENT START: rg_move_acquire node1 2
Nov 23 00:35:11 AIX: EVENT START: rg_move node1 2 ACQUIRE
Nov 23 00:35:11 AIX: EVENT COMPLETED: rg_move node1 2 ACQUIRE 0
Nov 23 00:35:11 AIX: EVENT COMPLETED: rg_move_acquire node1 2 0
Nov 23 00:35:15 AIX: EVENT START: rg_move_complete node1 2
Nov 23 00:35:15 AIX: EVENT COMPLETED: rg_move_complete node1 2 0
Nov 23 00:35:18 AIX: EVENT START: node_up_complete node2
Nov 23 00:35:18 AIX: EVENT COMPLETED: node_up_complete node2 0

cluster.log file node2


Nov 23 00:35:06 AIX: EVENT START: node_up node2
Nov 23 00:35:08 AIX: EVENT COMPLETED: node_up node2 0
Nov 23 00:35:11 AIX: EVENT START: rg_move_fence node1 2
Nov 23 00:35:11 AIX: EVENT COMPLETED: rg_move_fence node1 2 0
Nov 23 00:35:11 AIX: EVENT START: rg_move_acquire node1 2
Nov 23 00:35:11 AIX: EVENT START: rg_move node1 2 ACQUIRE
Nov 23 00:35:11 AIX: EVENT START: acquire_service_addr
Nov 23 00:35:13 AIX: EVENT START: acquire_aconn_service en2 appsvc_
Nov 23 00:35:13 AIX: EVENT COMPLETED: acquire_aconn_service en2 app
Nov 23 00:35:13 AIX: EVENT COMPLETED: acquire_service_addr 0
Nov 23 00:35:15 AIX: EVENT COMPLETED: rg_move node1 2 ACQUIRE 0
Nov 23 00:35:15 AIX: EVENT COMPLETED: rg_move_acquire node1 2 0
Nov 23 00:35:15 AIX: EVENT START: rg_move_complete node1 2
Nov 23 00:35:15 AIX: EVENT START: start_server appBctrl
Nov 23 00:35:16 AIX: EVENT COMPLETED: start_server appBctrl 0
Nov 23 00:35:16 AIX: EVENT COMPLETED: rg_move_complete node1 2 0
Nov 23 00:35:18 AIX: EVENT START: node_up_complete node2
Nov 23 00:35:18 AIX: EVENT COMPLETED: node_up_complete node2 0



Figure 2-12 shows the process flow when another node joins the cluster.

[Figure 2-12 is a diagram of a node starting cluster services while the other node is already
running. The cluster managers (clstrmgrES) on the two nodes exchange event messages and
process the following sequence:

1) rg_move_release
   Run if a resource group must fall back to the joining (higher priority) node; the release
   processing is the same as when a node leaves the cluster. If no fallback is required,
   rg_move_release is not done.

2) rg_move_acquire
   On the joining node: the same acquisition sequence as when the first node starts the
   cluster (see Figure 2-11).

3) rg_move_complete
   On the joining node, for each RG:
     process_resources (APPLICATIONS)
       start_server app02
     process_resources (ONLINE)
     process_resources (NONE)
   < Event Summary >]

Figure 2-12 Another node joining the cluster

The next section explains what happens when a node leaves the cluster voluntarily.



2.8.3 Node down processing normal with takeover
In this example, a resource group is on the departing node and must be moved to one of the
remaining nodes.

Node failure
The situation is slightly different if the node on the right fails suddenly. Because a failed
node is not in a position to run any events, the calls to process_resources that are listed
under the right node in Figure 2-13 are not run.

[Figure 2-13 is a diagram of one node stopping cluster services while the other node remains
running. The cluster managers (clstrmgrES) on the two nodes exchange event messages and
process the following sequence:

1) rg_move_release
   On the stopping node, for each RG:
     process_resources (RELEASE)
     process_resources (APPLICATIONS)
       stop_server app02
     process_resources (FILESYSTEMS)
     process_resources (VGS)
     process_resources (SERVICE_LABELS)
       release_service_addr
   < Event Summary >
   On the remaining node: nothing

2) rg_move_acquire
   On the remaining node: acquire the service address and the disks
   On the stopping node: nothing

3) rg_move_complete
   On the remaining node: start the application server]

Figure 2-13 Node leaving the cluster (stopped)

Example 2-4 shows details about the process flow from the clstrmgr.debug file.

Example 2-4 clstrmgr.debug file


clstrmgr.debug file:
[TE_FAIL_NODE_DEP]
[TE_RG_MOVE_RELEASE]
[TE_RG_MOVE_ACQUIRE]
[TE_FAIL_NODE_DEP_COMPLETE]
cluster.log file node1
Nov 23 06:24:21 AIX: EVENT COMPLETED: rg_move node1 1 RELEASE 0
Nov 23 06:24:21 AIX: EVENT COMPLETED: rg_move_release node1 1 0
Nov 23 06:24:32 AIX: EVENT START: rg_move_fence node1 1
Nov 23 06:24:32 AIX: EVENT COMPLETED: rg_move_fence node1 1 0
Nov 23 06:24:34 AIX: EVENT START: rg_move_fence node1 2
Nov 23 06:24:34 AIX: EVENT COMPLETED: rg_move_fence node1 2 0
Nov 23 06:24:35 AIX: EVENT START: rg_move_acquire node1 2
Nov 23 06:24:35 AIX: EVENT START: rg_move node1 2 ACQUIRE
Nov 23 06:24:35 AIX: EVENT START: acquire_service_addr
Nov 23 06:24:36 AIX: EVENT START: acquire_aconn_service en2 appsvc_



Nov 23 06:24:36 AIX: EVENT COMPLETED: acquire_aconn_service en2 app
Nov 23 06:24:36 AIX: EVENT COMPLETED: acquire_service_addr 0
Nov 23 06:24:36 AIX: EVENT START: acquire_takeover_addr
Nov 23 06:24:38 AIX: EVENT COMPLETED: acquire_takeover_addr 0
Nov 23 06:24:41 AIX: EVENT COMPLETED: rg_move node1 2 ACQUIRE 0
Nov 23 06:24:41 AIX: EVENT COMPLETED: rg_move_acquire node1 2 0
Nov 23 06:24:41 AIX: EVENT START: rg_move_complete node1 2
Nov 23 06:24:41 AIX: EVENT START: start_server appActrl
Nov 23 06:24:42 AIX: EVENT START: start_server appBctrl
Nov 23 06:24:42 AIX: EVENT COMPLETED: start_server appBctrl 0
Nov 23 06:24:49 AIX: EVENT COMPLETED: start_server appActrl 0
Nov 23 06:24:49 AIX: EVENT COMPLETED: rg_move_complete node1 2 0
Nov 23 06:24:51 AIX: EVENT START: node_down_complete node2
Nov 23 06:24:51 AIX: EVENT COMPLETED: node_down_complete node2 0

cluster.log node2
Nov 23 06:24:21 AIX: EVENT START: rg_move_release node1 1
Nov 23 06:24:21 AIX: EVENT START: rg_move node1 1 RELEASE
Nov 23 06:24:21 AIX: EVENT START: stop_server appActrl
Nov 23 06:24:21 AIX: EVENT START: stop_server appBctrl
Nov 23 06:24:22 AIX: EVENT COMPLETED: stop_server appBctrl 0
Nov 23 06:24:24 AIX: EVENT COMPLETED: stop_server appActrl 0
Nov 23 06:24:27 AIX: EVENT START: release_service_addr
Nov 23 06:24:28 AIX: EVENT COMPLETED: release_service_addr 0
Nov 23 06:24:29 AIX: EVENT START: release_takeover_addr
Nov 23 06:24:30 AIX: EVENT COMPLETED: release_takeover_addr 0
Nov 23 06:24:30 AIX: EVENT COMPLETED: rg_move node1 1 RELEASE 0
Nov 23 06:24:30 AIX: EVENT COMPLETED: rg_move_release node1 1 0
Nov 23 06:24:32 AIX: EVENT START: rg_move_fence node1 1
Nov 23 06:24:32 AIX: EVENT COMPLETED: rg_move_fence node1 1 0
Nov 23 06:24:34 AIX: EVENT START: rg_move_fence node1 2
Nov 23 06:24:35 AIX: EVENT COMPLETED: rg_move_fence node1 2 0
Nov 23 06:24:35 AIX: EVENT START: rg_move_acquire node1 2
Nov 23 06:24:35 AIX: EVENT START: rg_move node1 2 ACQUIRE
Nov 23 06:24:35 AIX: EVENT COMPLETED: rg_move node1 2 ACQUIRE 0
Nov 23 06:24:35 AIX: EVENT COMPLETED: rg_move_acquire node1 2 0
Nov 23 06:24:41 AIX: EVENT START: rg_move_complete node1 2
Nov 23 06:24:41 AIX: EVENT COMPLETED: rg_move_complete node1 2 0
Nov 23 06:24:51 AIX: EVENT START: node_down_complete node2
Nov 23 06:24:52 AIX: EVENT COMPLETED: node_down_complete node2 0



Chapter 3. Planning a cluster implementation for high availability

This chapter provides guidance for planning a cluster implementation for high availability with
IBM PowerHA SystemMirror 7.1 for AIX. It explains the software, hardware, and storage
requirements with a focus on PowerHA 7.1.

For more details about planning, consider the following publications:


 PowerHA for AIX Cookbook, SG24-7739
 PowerHA SystemMirror Version 7.1 for AIX Planning Guide, SC23-6758-01

This chapter includes the following topics:


 Software requirements
 Hardware requirements
 Considerations before using PowerHA 7.1
 Migration planning
 Storage
 Network



3.1 Software requirements
Because PowerHA 7.1 for AIX uses Cluster Aware AIX (CAA) functionality, the following
minimum versions of AIX and Reliable Scalable Cluster Technology (RSCT) are required:
 AIX 6.1 TL6 or AIX 7.1
 RSCT 3.1

CAA cluster: PowerHA SystemMirror creates the CAA cluster automatically. You do not
manage the CAA configuration or state directly, but you can use the CAA cluster commands to
view its status.
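For example, the following CAA commands display the cluster state after the cluster is created (the output depends on your configuration):

lscluster -m    # list the cluster nodes and their state
lscluster -i    # list the cluster network interfaces
lscluster -d    # list the cluster storage, including the repository disk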

Download and install the latest service packs for AIX and PowerHA from IBM Fix Central at:
http://www.ibm.com/support/fixcentral

3.1.1 Prerequisite for AIX BOS and RSCT components


The following Base Operating System (BOS) components for AIX are required for PowerHA:
 bos.adt.lib
 bos.adt.libm
 bos.adt.syscalls
 bos.ahafs
 bos.clvm.enh
 bos.cluster
 bos.data
 bos.net.tcp.client
 bos.net.tcp.server
 bos.rte.SRC
 bos.rte.libc
 bos.rte.libcfg
 bos.rte.libcur
 bos.rte.libpthreads
 bos.rte.lvm
 bos.rte.odm
 cas.agent (required for the IBM Systems Director plug-in)

The following file sets on the AIX base media are required:
 rsct.basic.rte
 rsct.compat.basic.hacmp
 rsct.compat.clients.hacmp

The appropriate versions of RSCT for the supported AIX releases are also supplied with the
PowerHA installation media.

3.2 Hardware requirements


The nodes of your cluster can be hosted on any hardware system on which installation of AIX
6.1 TL6 or AIX 7.1 is supported. They can be hosted as a full system partition or inside a
logical partition (LPAR).



The right design methodology can help eliminate network and disk single points of failure
(SPOF) by using redundant configurations. Have at least two network adapters connected to
different Ethernet switches in the same virtual LAN (VLAN). EtherChannel is supported with
PowerHA. Employ dual-fabric SAN connections to the storage subsystems using at least two
Fibre Channel (FC) adapters and appropriate multipath drivers. Use Redundant Array of
Independent Disks (RAID) technology to protect data from any disk failure.

This topic describes the hardware that is supported.

3.2.1 Supported hardware


Your hardware, including the firmware and the AIX multipath driver, must be in a supported
configuration. For more information about hardware, see Appendix C, “PowerHA supported
hardware” on page 491.

More information: For a list of the supported FC adapters, see “Setting up cluster storage
communication” in the AIX 7.1 Information Center at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.
clusteraware/claware_comm_setup.htm

See the readme files that are provided with the base PowerHA file sets and the latest service
pack. See also the PowerHA SystemMirror 7.1 for AIX Standard Edition Information Center
at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.doc/doc/base/
powerha.htm


3.2.2 Requirements for the multicast IP address, SAN, and repository disk
Cluster communication requires the use of a multicast IP address. You can specify this
address when you create the cluster, or you can have one generated automatically. The
ranges 224.0.0.0–224.0.0.255 and 239.0.0.0–239.255.255.255 are reserved for
administrative and maintenance purposes. If multicast traffic is already present in the
adjacent network, ask the network administrator to allocate a multicast IP address for the
cluster. Also, ensure that the multicast traffic that is generated by each cluster node is
properly forwarded by the network infrastructure to every other cluster node.
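You can verify that multicast traffic flows between the nodes with the AIX mping test utility, as in the following sketch. The multicast address shown is only an example; use the address that is planned for your cluster.

# On the receiving node
mping -r -v -a 228.10.10.1
# On the sending node
mping -s -v -a 228.10.10.1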

If you use SAN-based heartbeat, you must have zoning set up to ensure connectivity between
the host FC adapters. You must also activate the Target Mode Enabled (tme) attribute on the
involved FC adapters.

Hardware redundancy at the storage subsystem level is mandatory for the Cluster Repository
disk. Logical Volume Manager (LVM) mirroring of the repository disk is not supported. The disk



must be at least 1 GB in size and not exceed 10 GB. For more information about supported
hardware for the cluster repository disk, see 3.5.1, “Shared storage for the repository disk” on
page 48.

CAA support: Currently, CAA supports only Fibre Channel and SAS disks for the repository
disk, as described in the “Cluster communication” topic in the AIX 7.1 Information Center
at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.
clusteraware/claware_comm_benifits.htm

3.3 Considerations before using PowerHA 7.1


You must be aware of the following considerations before planning to use PowerHA 7.1:
 You cannot change the host name in a configured cluster.
After the cluster is synchronized, you are unable to change the host name of any of the
cluster nodes. Therefore, changing the host name is not supported.
 You cannot change the cluster name in a configured cluster.
After the cluster is synchronized, you are unable to change the name of the cluster. If you
want to change the cluster name, you must completely remove and recreate the cluster.
 You cannot change the repository location or cluster IP address in a configured cluster.
After the cluster is synchronized, you are unable to change the repository disk or cluster
multicast IP address. To change the repository disk or the cluster multicast IP address,
you must completely remove and recreate the cluster.
 No IPv6 support is available; this is a restriction of the CAA implementation.

3.4 Migration planning


Before migrating your cluster, you must be aware of the following considerations:
 The required software
– AIX
– Virtual I/O Server (VIOS)
 Multicast address
 Repository disk
 FC heartbeat support
 All non-IP networks support removed
– RS232
– TMSCSI
– TMSSA
– Disk heartbeat (DISKHB)
– Multinode disk heartbeat (MNDHB)
 IP networks support removed
– Asynchronous transfer mode (ATM)
– Fiber Distributed Data Interface (FDDI)
– Token ring



 IP Address Takeover (IPAT) via replacement support removed
 Heartbeat over alias support removed
 Site support not available in this version
 IPv6 support not available in this version

You can migrate from High-Availability Cluster Multi-Processing (HACMP) or PowerHA


versions 5.4.1, 5.5, and 6.1 only. If you are running a version earlier than HACMP 5.4.1, you
must upgrade to a newer version first.

TL6: AIX must be at a minimum version of AIX 6.1 TL6 (6.1.6.0) on all nodes before
migration. Use of AIX 6.1 TL6 SP2 or later is preferred.

Most migration scenarios require a two-part upgrade. First, you migrate AIX to the minimum
version of AIX 6.1 TL6 on all nodes. You must reboot each node after upgrading AIX. Second,
you migrate to PowerHA 7.1 by using the offline, rolling, or snapshot scenario as explained in
Chapter 7, “Migrating to PowerHA 7.1” on page 151.
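As a pre-migration check, the following sketch can be run on every node. The clmigcheck location shown (/usr/sbin) is the usual one, but verify it on your system; the command reports configuration items, such as non-IP networks, that must be removed before the migration.

oslevel -s          # confirm AIX 6.1 TL6 (6100-06) or later on every node
/usr/sbin/clmigcheck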

In addition, keep in mind the following considerations:


 Multicast address
A multicast address is required for communication between the nodes (used by CAA).
During the migration, you can specify this address or allow CAA to automatically generate
one for you.
Discuss the multicast address with your network administrator to ensure that such
addresses are allowed on your network. Consider firewalls and routers that might not have
this support enabled.
 CAA repository disk
A shared disk that is zoned in and available to all nodes in the cluster is required. This disk
is reserved for use by CAA only.
 VIOS support
You can configure a PowerHA 7.1 cluster on LPARs that are using resources provided by a
VIOS. However, the support of your CAA repository disk has restrictions.

Support for vSCSI: CAA repository disk support for virtual SCSI (vSCSI) is officially
introduced in AIX 6.1 TL6 SP2 and AIX 7.1 SP2. You can create a vSCSI disk
repository at AIX 6.1 TL6 base levels, but not at SP1. Alternatively, direct SAN
connection logical unit numbers (LUNs) or N_Port ID Virtualization (NPIV) LUNs are
supported with all versions.

 SAN heartbeat support


One of the new features of PowerHA 7.1 is the ability to use the SAN fabric for another
communications route between hosts. This feature is implemented through CAA and
replaces Non-IP support in previous versions.

Adapters for SAN heartbeat: This feature requires 4 GB or 8 GB adapters, which


must be direct attach or virtualized. If the adapters are virtualized as vSCSI through
VIOS or by using NPIV, VIOS 2.2.0.11-FP24 SP01 is required.



 Heartbeat support for non-IP configurations (such as disk heartbeat)
Disk-based heartbeat, MNDHB, RS232, TMSCSI, and TMSSA are no longer supported
configurations with PowerHA 7.1. When you migrate, be aware that you cannot keep these
configurations. When the migration is completed, these definitions are removed from the
Object Data Manager (ODM).
As an alternative, PowerHA 7.1 uses SAN-based heartbeat, which is configured
automatically when you migrate.
 Removal of existing network hardware support
FDDI, ATM, and token ring are no longer supported. You must remove this hardware
before you begin the migration.
 IPAT via IP replacement
IPAT via IP replacement for address takeover is no longer supported. You must remove
this configuration before you begin the migration.
 Heartbeat over aliases
Configurations using heartbeat over aliases are no longer supported. You must remove
this configuration before you begin the migration.
 PowerHA SystemMirror for AIX Enterprise Edition (PowerHA/XD) configurations
The latest version of PowerHA/XD is 6.1. You cannot migrate this version to PowerHA 7.1.

3.5 Storage
This section provides details about storage planning considerations for high availability of
your cluster implementation.

3.5.1 Shared storage for the repository disk


You must dedicate a shared disk with a minimum size of 1 GB as a central repository for the
cluster configuration data of CAA. For this disk, configure intrinsic data redundancy by using
hardware RAID features of the external storage subsystems.

For additional information about the shared disk, see the PowerHA SystemMirror Version 7.1
for AIX Standard Edition Concepts and Facilities Guide, SC23-6751. See also the PowerHA
SystemMirror Version 7.1 announcement information or the PowerHA SystemMirror Version
7.1 for AIX Standard Edition Planning Guide, SC23-6758-01, for a complete list of supported
devices.

The following disks are supported (through Multiple Path I/O (MPIO)) for the repository disk:
 All FC disks that configure as MPIO
 IBM DS8000, DS3000, DS4000®, DS5000, XIV®, ESS800, SAN Volume Controller (SVC)
 EMC: Symmetrix, DMX, CLARiiON
 HDS: 99XX, 96XX, OPEN series
 IBM System Storage N series/NetApp®: All models of N series and all NetApp models
common to N series
 VIOS vSCSI
 All IBM serial-attached SCSI (SAS) disks that configure as MPIO
 SAS storage



The following storage types are known to work with MPIO but do not have a service
agreement:
 HP
 SUN
 Compellent
 3PAR
 LSI
 Texas Memory Systems
 Fujitsu
 Toshiba

Support for third-party multipathing software: At the time of writing, some third-party
multipathing software was not supported.

3.5.2 Adapters supported for storage communication


At the time of this writing, only the 4 GB and 8 GB FC adapters are supported. Also the
daughter card for IBM System p blades and Emulex FC adapters are supported. See
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Planning Guide, SC23-6758-01,
for additional information.

The following FC and SAS adapters are supported for connection to the repository disk:
 4 GB Single-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 1905; CCIN 1910)
 4 GB Single-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 5758; CCIN 280D)
 4 GB Single-Port Fibre Channel PCI-X Adapter (FC 5773; CCIN 5773)
 4 GB Dual-Port Fibre Channel PCI-X Adapter (FC 5774; CCIN 5774)
 4 Gb Dual-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 1910; CCIN 1910)
 4 Gb Dual-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 5759; CCIN 5759)
 8 Gb PCI Express Dual Port Fibre Channel Adapter (FC 5735; CCIN 577D)
 8 Gb PCI Express Dual Port Fibre Channel Adapter 1Xe Blade (FC 2B3A; CCIN 2607)
 3 Gb Dual-Port SAS Adapter PCI-X DDR External (FC 5900 and 5912; CCIN 572A)

More information: For the most current list of supported storage adapters for shared disks
other than the repository disk, contact your IBM representative. Also see the “IBM
PowerHA SystemMirror for AIX” web page at:
http://www.ibm.com/systems/power/software/availability/aix/index.html

The PowerHA software supports the following disk technologies as shared external disks in a
highly available cluster:
 SCSI drives, including RAID subsystems
 FC adapters and disk subsystems
 Data path devices (VPATH): SDD 1.6.2.0, or later
 Virtual SCSI (vSCSI) disks

Support for vSCSI: CAA repository disk support for vSCSI is officially introduced in
AIX 6.1 TL6 SP2 and AIX 7.1 SP2. You can create a vSCSI disk repository at AIX 6.1
TL6 base levels, but not at SP1. Alternatively, direct SAN connection LUNs or NPIV
LUNs are supported with all versions.

You can combine these technologies within a cluster. Before choosing a disk technology,
review the considerations for configuring each technology as described in the following
section.



3.5.3 Multipath driver
AIX 7.1 does not support the IBM Subsystem Device Driver (SDD) for TotalStorage® Enterprise
Storage Server®, the IBM System Storage DS8000, and the IBM System Storage SAN Volume
Controller. Instead, you can use the IBM Subsystem Device Driver Path Control Module
(SDDPCM) or native AIX MPIO Path Control Module (PCM) for multipath support on AIX 7.1.

AIX MPIO is an architecture that uses PCMs. The following PCMs are all supported:
 SDDPCM
 HDLM PCM
 AIXPCM

SDDPCM supports only the DS6000™, DS8000, SVC, and some models of the DS4000. HDLM PCM
supports only Hitachi storage devices. AIXPCM supports all storage devices that System p
servers and VIOS support, which includes storage devices from over 25 storage vendors.

Support for third-party multipath drivers: At the time of writing, other third-party
multipath drivers (such as EMC PowerPath, and Veritas) are not supported. This limitation
is planned to be resolved in a future release.

See the “Support Matrix for Subsystem Device Driver, Subsystem Device Driver Path Control
Module, and Subsystem Device Driver Device Specific Module” at:
http://www.ibm.com/support/docview.wss?rs=540&uid=ssg1S7001350

For mixed cases, also check whether the coexistence of different multipath drivers that use
different FC ports on the same system is supported. For example, the cluster repository disk
might be on a storage subsystem or FC adapter other than the one that is used for the shared
data disks.
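A quick way to check which paths and which path control module manage a given disk is shown in the following sketch (hdisk1 is an example device name):

lspath -l hdisk1                 # list the paths for the disk
lsattr -El hdisk1 | grep -i pcm  # show the PCM that owns the disk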

3.5.4 System Storage Interoperation Center


To check the compatibility of your particular storage and SAN infrastructure with PowerHA,
see the System Storage Interoperation Center (SSIC) site at:
http://www.ibm.com/systems/support/storage/config/ssic

3.6 Network
The networking requirements for PowerHA SystemMirror 7.1 differ from all previous versions.
This section focuses specifically on the differences of the following requirements:
 Multicast address
 Network interfaces
 Subnetting requirements for IPAT via aliasing
 Host name and node name
 Other network considerations
– Single adapter networks
– Virtual Ethernet (VIOS)

IPv6: IPv6 is not supported in PowerHA SystemMirror 7.1.

For additional information, and details about common features between versions, see the
PowerHA for AIX Cookbook, SG24-7739.



3.6.1 Multicast address
The CAA functionality in PowerHA SystemMirror 7.1 employs multicast addressing for
heartbeating. Therefore, the network infrastructure must handle and allow the use of multicast
addresses. If multicast traffic is present in the adjacent network, you must ask the network
administrator for a multicast IP address allocation. Also, ensure that the multicast traffic
generated by each of the cluster nodes is properly forwarded by the network infrastructure
toward any other cluster node.

3.6.2 Network interfaces


Because PowerHA SystemMirror uses CAA, all common network interfaces (Ethernet, InfiniBand,
or both) between the cluster nodes are used for communication. You cannot limit which
interfaces are used by or configured in the cluster.

In previous versions, the network Failure Detection Rate (FDR) policy was tunable. This is no
longer the case in PowerHA SystemMirror 7.1.

3.6.3 Subnetting requirements for IPAT via aliasing


In terms of subnetting requirements, IPAT via aliasing is now the only IPAT option available.
IPAT via aliasing has the following subnet requirements:
 All base IP addresses on a node must be on separate subnets.
 All service IP addresses must be on a separate subnet from any of the base subnets.
 The service IP addresses can all be in the same or different subnets.
 The persistent IP address can be in the same or a different subnet from the service IP
address.

If the networks are a single adapter configuration, both the base and service IP addresses
are allowed on the same subnet.
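The following hypothetical address layout for a two-node, dual-adapter cluster satisfies these rules (all addresses are placeholders):

# node1: en0 10.1.1.1/24 (base)    en1 10.1.2.1/24 (base)
# node2: en0 10.1.1.2/24 (base)    en1 10.1.2.2/24 (base)
# service IP:    192.168.100.10/24  (separate subnet, aliased by PowerHA)
# persistent IP: 192.168.100.11/24  (can share the service IP subnet)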

3.6.4 Host name and node name


In PowerHA SystemMirror 7.1, the cluster node name and the AIX host name must be the same.

3.6.5 Other network considerations


Other network considerations for using PowerHA SystemMirror 7.1 include single adapter
networks and virtual Ethernet.

Single adapter networks


Through the use of EtherChannel, Shared Ethernet Adapters (SEA), or both at the VIOS
level, it is common today to have redundant interfaces act as one logical interface to the AIX
client or cluster node. In these configurations, historically users configured a netmon.cf file to
ping additional external interfaces or addresses. The netmon.cf configuration file is no longer
required.

Virtual Ethernet
In previous versions, when using virtual Ethernet, users configured a special formatted
netmon.cf file to ping additional external interfaces or addresses by using specific outbound
interfaces. The netmon.cf configuration file no longer applies.



Chapter 4. Installing PowerHA SystemMirror 7.1 for AIX

This chapter explains how to install the IBM PowerHA SystemMirror 7.1 for AIX Standard
Edition software.

This chapter includes the following topics:


 Hardware configuration of the test environment
 Installing PowerHA file sets
 Volume group consideration



4.1 Hardware configuration of the test environment
Figure 4-1 shows a hardware overview of the test environment to demonstrate the installation
and configuration procedures in this chapter. It consists of two IBM Power 570 logical
partitions (LPARs), both SAN-attached to a DS4800 storage subsystem and connected to a
common LAN segment.

Figure 4-1 PowerHA Lab environment

4.1.1 SAN zoning


In the test environment, the conventional SAN zoning is configured between each host and
the storage subsystem to allow for the host attachment of the shared disks.

For the cluster SAN-based communication channel, two extra zones are created as shown in
Example 4-1. One zone includes the fcs0 ports of each server, and the other zone includes
the fcs1 ports of each server.

Example 4-1 Host-to-host zoning for SAN-based channel


sydney:/ # for i in 0 1; do lscfg -vpl fcs$i|grep "Network Address";done
Network Address.............10000000C974C16E
Network Address.............10000000C974C16F

perth:/ # for i in 0 1; do lscfg -vpl fcs$i|grep "Network Address";done


Network Address.............10000000C97720D8
Network Address.............10000000C97720D9



Fabric1:
zone: Syndey_fcs0__Perth_fcs0
10:00:00:00:c9:74:c1:6e
10:00:00:00:c9:77:20:d8

Fabric2:
zone: Syndey_fcs1__Perth_fcs1
10:00:00:00:c9:74:c1:6f
10:00:00:00:c9:77:20:d9

This dual zone setup provides redundancy for the SAN communication channel at the Cluster
Aware AIX (CAA) storage framework level. The dotted lines in Figure 4-2 represent the
initiator-to-initiator zones added on top of the conventional ones, connecting host ports to
storage ports.

Figure 4-2 Host-to-host zoning

4.1.2 Shared storage


Three Redundant Array of Independent Disks (RAID) logical drives are configured on the
DS4800 storage subsystem and are presented to both AIX nodes. One logical drive hosts the
cluster repository disk. On the other two drives, the shared storage space is configured for
application data.



Example 4-2 shows that each disk is available through two paths on different Fibre Channel
(FC) adapters.

Example 4-2 FC path setup on AIX nodes


sydney:/ # for i in hdisk1 hdisk2 hdisk3 ; do lspath -l $i;done
Enabled hdisk1 fscsi0
Enabled hdisk1 fscsi1
Enabled hdisk2 fscsi0
Enabled hdisk2 fscsi1
Enabled hdisk3 fscsi0
Enabled hdisk3 fscsi1

perth:/ # for i in hdisk1 hdisk2 hdisk3 ; do lspath -l $i;done


Enabled hdisk1 fscsi0
Enabled hdisk1 fscsi1
Enabled hdisk2 fscsi0
Enabled hdisk2 fscsi1
Defined hdisk3 fscsi0
Enabled hdisk3 fscsi1

The multipath driver being used is the AIX native MPIO. In Example 4-3, the mpio_get_config
command shows identical LUNs on both nodes, as expected.

Example 4-3 MPIO shared LUNs on AIX nodes


sydney:/ # mpio_get_config -Av
Frame id 0:
Storage Subsystem worldwide name: 60ab800114632000048ed17e
Controller count: 2
Partition count: 1
Partition 0:
Storage Subsystem Name = 'ITSO_DS4800'
hdisk LUN # Ownership User Label
hdisk1 7 B (preferred) PW-0201-L7
hdisk2 8 A (preferred) PW-0201-L8
hdisk3 9 B (preferred) PW-0201-L9

perth:/ # mpio_get_config -Av


Frame id 0:
Storage Subsystem worldwide name: 60ab800114632000048ed17e
Controller count: 2
Partition count: 1
Partition 0:
Storage Subsystem Name = 'ITSO_DS4800'
hdisk LUN # Ownership User Label
hdisk1 7 B (preferred) PW-0201-L7
hdisk2 8 A (preferred) PW-0201-L8
hdisk3 9 B (preferred) PW-0201-L9



4.1.3 Configuring the FC adapters for SAN-based communication
To properly configure the FC adapters for the cluster SAN-based communication, follow these
steps:

X in fcsX: In the following steps, the X in fcsX represents the number of the FC adapters.
You must complete this procedure for each FC adapter that is involved in cluster
SAN-based communication.

1. Unconfigure fcsX:
rmdev -Rl fcsX

fcsX device busy: If the fcsX device is busy when you use the rmdev command, enter
the following commands:
chdev -P -l fcsX -a tme=yes
chdev -P -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail

Then restart the system.

2. Change tme attribute value to yes in the fcsX definition:


chdev -l fcsX -a tme=yes
3. Enable the dynamic tracking and the fast-fail error recovery policy on the corresponding
fscsiX device:
chdev -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail
4. Configure fcsX port and its associated Storage Framework Communication device:
cfgmgr -l fcsX;cfgmgr -l sfwcommX
5. Verify the configuration changes by running the following commands:
lsdev -C | grep -e fcsX -e sfwcommX
lsattr -El fcsX | grep tme
lsattr -El fscsiX | grep -e dyntrk -e fc_err_recov

Example 4-4 illustrates the procedure for port fcs0 on node sydney.

Example 4-4 SAN-based communication channel setup


sydney:/ # lsdev -l fcs0
fcs0 Available 00-00 8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)

sydney:/ # lsattr -El fcs0|grep tme


tme no Target Mode Enabled True

sydney:/ # rmdev -Rl fcs0


fcnet1 Defined
sfwcomm0 Defined
fscsi0 Defined
fcs0 Defined

sydney:/ # chdev -l fcs0 -a tme=yes


fcs0 changed

sydney:/ # chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail



fscsi0 changed

sydney:/ # cfgmgr -l fcs0;cfgmgr -l sfwcomm0

sydney:/ # lsdev -C|grep -e fcs0 -e sfwcomm0


fcs0 Available 01-00 8Gb PCI Express Dual Port FC Adapter
(df1000f114108a03)
sfwcomm0 Available 01-00-02-FF Fiber Channel Storage Framework Comm

sydney:/ # lsattr -El fcs0|grep tme


tme yes Target Mode Enabled True

sydney:/ # lsattr -El fscsi0|grep -e dyntrk -e fc_err_recov


dyntrk yes Dynamic Tracking of FC Devices True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True

4.2 Installing PowerHA file sets


At a minimum, you must have the following PowerHA runtime executable files:
 cluster.es.client
 cluster.es.server
 cluster.es.cspoc

Depending on the functionality required for your environment, additional file sets might be
selected for installation.

Migration consideration: Installation on top of a previous release is considered a


migration. Additional steps are required for migration including running the clmigcheck
command. For more information about migration, see Chapter 7, “Migrating to PowerHA
7.1” on page 151.

PowerHA SystemMirror 7.1 for AIX Standard Edition includes the Smart Assists images. For
more details about the Smart Assists functionality and new features, see 2.2, “New features”
on page 24.

The PowerHA for IBM Systems Director agent file set comes with the base installation media.
To learn more about PowerHA SystemMirror for IBM Systems Director, see 5.3, “PowerHA
SystemMirror for IBM Systems Director” on page 133.

You can install the required packages in the following ways:


 From a CD
 From a hard disk to which the software has been copied
 From a Network Installation Management (NIM) server

Installation from a CD is more appropriate for small environments. Use NFS to export the
installation images to remote nodes so that you avoid repeated CD handling or image copy
operations.
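If you are not using NIM, a minimal sketch of a manual installation from an NFS-mounted image directory follows. The mount point and the image directory are examples only; select the file sets that your environment requires.

mount nimres1:/nimrepo/lpp_source/HA71 /mnt
installp -agXYd /mnt cluster.es.server cluster.es.client \
    cluster.es.cspoc cluster.license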

The following section provides an example of how to use a NIM server to install the PowerHA
software.



4.2.1 PowerHA software installation example
This section guides you through an example of installing the PowerHA software. This example
runs on the server configuration shown in 4.1, “Hardware configuration of the test
environment” on page 54.

Installing the AIX BOS components and RSCT


Some of the prerequisite file sets might already be present, and others might be missing
because of previous installations, updates, and removals. To begin, a consistent AIX image
must be installed. The test environment started with a “New and Complete Overwrite”
installation of AIX 6.1.6.1 from a NIM server. Example 4-5 shows how to check the AIX version and
the consistency of the installation.

Example 4-5 Initial AIX image


sydney:/ # oslevel -s
6100-06-01-1043
sydney:/ # lppchk -v
sydney:/ #

In Example 4-6, the lslpp command lists the prerequisites that are already installed and the
ones that are missing in a single output.

Example 4-6 Checking the installed and missing prerequisites


sydney:/ # lslpp -L bos.adt.lib bos.adt.libm bos.adt.syscalls bos.clvm.enh \
> bos.cluster.rte bos.cluster.solid bos.data bos.ahafs bos.net.tcp.client \
> bos.net.tcp.server bos.rte.SRC bos.rte.libc bos.rte.libcfg \
> bos.rte.libcur bos.rte.libpthreads bos.rte.lvm bos.rte.odm \
> bos.rte.libcur bos.rte.libpthreads bos.rte.lvm bos.rte.odm \
> rsct.basic.rte rsct.compat.basic.hacmp rsct.compat.clients.hacmp
Fileset Level State Type Description (Uninstaller)
----------------------------------------------------------------------------
bos.adt.lib 6.1.2.0 C F Base Application Development
Libraries
lslpp: Fileset bos.adt.libm not installed.
lslpp: Fileset bos.adt.syscalls not installed.
bos.cluster.rte 6.1.6.1 C F Cluster Aware AIX
bos.cluster.solid 6.1.6.1 C F POWER HA Business Resiliency
solidDB
lslpp: Fileset bos.clvm.enh not installed.
lslpp: Fileset bos.data not installed.
bos.net.tcp.client 6.1.6.1 C F TCP/IP Client Support
bos.net.tcp.server 6.1.6.0 C F TCP/IP Server
bos.rte.SRC 6.1.6.0 C F System Resource Controller
bos.rte.libc 6.1.6.1 C F libc Library
bos.rte.libcfg 6.1.6.0 C F libcfg Library
bos.rte.libcur 6.1.6.0 C F libcurses Library
bos.rte.libpthreads 6.1.6.0 C F pthreads Library
bos.rte.lvm 6.1.6.0 C F Logical Volume Manager
bos.rte.odm 6.1.6.0 C F Object Data Manager
rsct.basic.rte 3.1.0.1 C F RSCT Basic Function
rsct.compat.basic.hacmp 3.1.0.1 C F RSCT Event Management Basic
Function (HACMP/ES Support)



rsct.compat.clients.hacmp 3.1.0.0 C F RSCT Event Management Client
Function (HACMP/ES Support)

Figure 4-3 shows selection of the appropriate lpp_source on the NIM server, aix6161, by
following the path smitty nim → Install and Update Software → Install Software. You
select all of the required file sets on the next panel.

Install and Update Software

Move cursor to desired item and press Enter.

Install Software
Update Installed Software to Latest Level (Update All)
Install Software Bundle
Update Software by Fix (APAR)
••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
• Select the LPP_SOURCE containing the install images •
• •
• Move cursor to desired item and press Enter. •
• •
• aix7100g resources lpp_source •
• aix7101 resources lpp_source •
• aix6161 resources lpp_source •
• ha71sp1 resources lpp_source •
• aix6060 resources lpp_source •
• aix6160-SP1-only resources lpp_source •
• •
• F1=Help F2=Refresh F3=Cancel •
• Esc+8=Image Esc+0=Exit Enter=Do •
F1• /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 4-3 Installing the prerequisites: Selecting lpp_source



Figure 4-4 shows one of the selected file sets, bos.clvm. Although it is not required for
another file set, bos.clvm is mandatory for PowerHA 7.1 because only enhanced concurrent
volume groups (ECVGs) are supported. See 10.3.3, “The ECM volume group” on page 313,
for more details.

Ty••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Pr• Software to Install •
• •
[T• Move cursor to desired item and press Esc+7. Use arrow keys to scroll. •
* • ONE OR MORE items can be selected. •
* • Press Enter AFTER making all selections. •
• •
• [MORE...2286] •
• + 6.1.6.1 POWER HA Business Resiliency solidDB •
• + 6.1.6.0 POWER HA Business Resiliency solidDB •
• •
• > bos.clvm ALL •
• + 6.1.6.0 Enhanced Concurrent Logical Volume Manager •
• •
• bos.compat ALL •
• + 6.1.6.0 AIX 3.2 Compatibility Commands •
• [MORE...4498] •
[M• •
• F1=Help F2=Refresh F3=Cancel •
F1• Esc+7=Select Esc+8=Image Esc+0=Exit •
Es• Enter=Do /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 4-4 Installing the prerequisites: Selecting the file sets

After installing from the NIM server, ensure that each node remains at the initial version of AIX
and RSCT, and check the software consistency, as shown in Example 4-7.

Example 4-7 Post-installation check of the prerequisites


sydney:/ # oslevel -s
6100-06-01-1043

sydney:/ # lppchk -v

sydney:/ # lslpp -L rsct.basic.rte rsct.compat.basic.hacmp \


> rsct.compat.clients.hacmp
Fileset Level State Type Description (Uninstaller)
----------------------------------------------------------------------------
rsct.basic.rte 3.1.0.1 C F RSCT Basic Function
rsct.compat.basic.hacmp 3.1.0.1 C F RSCT Event Management Basic
Function (HACMP/ES Support)
rsct.compat.clients.hacmp 3.1.0.0 C F RSCT Event Management Client
Function (HACMP/ES Support)



Installing the PowerHA file sets
To prepare an lpp_source that contains the required base and updated file sets, follow these
steps:
1. Copy the file set from the media to a directory on the NIM server by using the smit
bffcreate command.
2. Apply the latest service pack in the same directory by using the smit bffcreate
command.
3. Create an lpp_source resource that points to the directory on the NIM server.
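For step 3, a sketch of the corresponding NIM command follows; the resource name and location match the test environment and are examples only:

nim -o define -t lpp_source -a server=master \
    -a location=/nimrepo/lpp_source/HA71 ha71sp1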

Example 4-8 lists the contents of the lpp_source. As mentioned previously, both the Smart
Assist file sets and PowerHA for IBM Systems Director agent file set come with the base
media.

Example 4-8 The contents of lpp_source in PowerHA SystemMirror


nimres1:/ # lsnim -l ha71sp1
ha71sp1:
class = resources
type = lpp_source
arch = power
Rstate = ready for use
prev_state = unavailable for use
location = /nimrepo/lpp_source/HA71
alloc_count = 0
server = master

nimres1:/ # ls /nimrepo/lpp_source/HA71
.toc
cluster.adt.es
cluster.doc.en_US.assist
cluster.doc.en_US.assist.db2.html.7.1.0.1.bff
cluster.doc.en_US.assist.oracle.html.7.1.0.1.bff
cluster.doc.en_US.assist.websphere.html.7.1.0.1.bff
cluster.doc.en_US.es
cluster.doc.en_US.es.html.7.1.0.1.bff
cluster.doc.en_US.glvm.html.7.1.0.1.bff
cluster.es.assist
cluster.es.assist.common.7.1.0.1.bff
cluster.es.assist.db2.7.1.0.1.bff
cluster.es.assist.domino.7.1.0.1.bff
cluster.es.assist.ihs.7.1.0.1.bff
cluster.es.assist.sap.7.1.0.1.bff
cluster.es.cfs
cluster.es.cfs.rte.7.1.0.1.bff
cluster.es.client
cluster.es.client.clcomd.7.1.0.1.bff
cluster.es.client.lib.7.1.0.1.bff
cluster.es.client.rte.7.1.0.1.bff
cluster.es.cspoc
cluster.es.director.agent
cluster.es.migcheck
cluster.es.nfs
cluster.es.server
cluster.es.server.diag.7.1.0.1.bff
cluster.es.server.events.7.1.0.1.bff



cluster.es.server.rte.7.1.0.1.bff
cluster.es.server.utils.7.1.0.1.bff
cluster.es.worksheets
cluster.license
cluster.man.en_US.es.data
cluster.msg.en_US.assist
cluster.msg.en_US.es
rsct.basic_3.1.0.0
rsct.compat.basic_3.1.0.0
rsct.compat.clients_3.1.0.0
rsct.core_3.1.0.0
rsct.exp_3.1.0.0
rsct.opt.fence_3.1.0.0
rsct.opt.stackdump_3.1.0.0
rsct.opt.storagerm_3.1.0.0
rsct.sdk_3.1.0.0

Example 4-9 shows the file sets that were selected for the test environment and installed from
the lpp_source that was prepared previously. Each node requires a PowerHA license.
Therefore, you must install the license file set.

Example 4-9 List of installed PowerHA file sets


sydney:/ # lslpp -L cluster.*
Fileset Level State Type Description (Uninstaller)
----------------------------------------------------------------------------
Infrastructure
cluster.es.client.lib 7.1.0.1 C F PowerHA SystemMirror Client
Libraries
cluster.es.client.rte 7.1.0.1 C F PowerHA SystemMirror Client
Runtime
cluster.es.client.utils 7.1.0.0 C F PowerHA SystemMirror Client
Utilities
cluster.es.client.wsm 7.1.0.0 C F Web based Smit
cluster.es.cspoc.cmds 7.1.0.0 C F CSPOC Commands
cluster.es.cspoc.dsh 7.1.0.0 C F CSPOC dsh
cluster.es.cspoc.rte 7.1.0.0 C F CSPOC Runtime Commands
cluster.es.migcheck 7.1.0.0 C F PowerHA SystemMirror Migration
support
cluster.es.server.cfgast 7.1.0.0 C F Two-Node Configuration
Assistant
cluster.es.server.diag 7.1.0.1 C F Server Diags
cluster.es.server.events 7.1.0.1 C F Server Events
cluster.es.server.rte 7.1.0.1 C F Base Server Runtime
cluster.es.server.testtool
7.1.0.0 C F Cluster Test Tool
cluster.es.server.utils 7.1.0.1 C F Server Utilities
cluster.license 7.1.0.0 C F PowerHA SystemMirror
Electronic License
cluster.man.en_US.es.data 7.1.0.0 C F Man Pages - U.S. English
cluster.msg.en_US.assist 7.1.0.0 C F PowerHA SystemMirror Smart
Assist Messages - U.S. English
cluster.msg.en_US.es.client
7.1.0.0 C F PowerHA SystemMirror Client
Messages - U.S. English
cluster.msg.en_US.es.server



7.1.0.0 C F Recovery Driver Messages -
U.S. English

Then verify the installed software as shown in Example 4-10. The prompt returned by the lppchk
command, with no other output, confirms the consistency of the installed file sets.

Example 4-10 Verifying the installed PowerHA filesets consistency


sydney:/ # lppchk -v
sydney:/ # lppchk -c cluster.*
sydney:/ #

4.3 Volume group consideration


PowerHA 7.1 supports only the use of enhanced concurrent volume groups. If you try to add
an existing non-concurrent volume group to a PowerHA resource group, the operation fails with
the error message shown in Figure 4-5 if the volume group is not already imported on the
other node.

Auto Discover/Import of Volume Groups was set to true.


Gathering cluster information, which may take a few minutes.

claddres: test_vg is not a shareable volume group.


Could not perform all imports.
No ODM values were changed.
<01> Importing Volume group: test_vg onto node: chile: FAIL
Verification to be performed on the following:
Cluster Topology
Cluster Resources
Figure 4-5 Error message when adding a volume group

To work around the problem shown in Figure 4-5, manually import the volume group on the
other node by using the following command:
importvg -L test_vg hdiskx

After the volume group is added to the other node, the synchronization and verification are
then completed.

Volume group conversion: The volume group is automatically converted to an enhanced


concurrent volume group during the first startup of the PowerHA cluster.




Chapter 5. Configuring a PowerHA cluster


To configure a PowerHA cluster, you can choose from the following options.
 SMIT
SMIT is the most commonly used way to manage and configure a cluster. The SMIT
menus are available after the cluster file sets are installed. The learning cycle for using
SMIT is shorter than the learning cycle for using the command-line interface (CLI). For
more information about using SMIT to configure a cluster, see 5.1, “Cluster configuration
using SMIT” on page 66.
 PowerHA SystemMirror plug-in for IBM Systems Director
This option is for users who already use IBM Systems Director and want to use it to manage
and configure PowerHA clusters. You might choose this option if you work with large
environments and want central management of all clusters.
You can choose from two methods, as explained in the following sections, to configure a
cluster using IBM Systems Director:
– 12.1.1, “Creating a cluster with the SystemMirror plug-in wizard” on page 334
– 12.1.2, “Creating a cluster with the SystemMirror plug-in CLI” on page 339
 The clmgr CLI
You can use the clmgr utility for configuration tasks. However, its purpose is to provide a
uniform scripting interface for deployments in larger environments and to perform
day-to-day cluster management. For more information about using this tool, see 5.2,
“Cluster configuration using the clmgr tool” on page 104.

You can perform most administration tasks with any of these options. The option that you
choose depends on which one you prefer and which one meets the requirements of your
environment.

This chapter includes the following topics:


 Cluster configuration using SMIT
 Cluster configuration using the clmgr tool
 PowerHA SystemMirror for IBM Systems Director



5.1 Cluster configuration using SMIT
This topic includes the following sections:
 SMIT menu changes
 Overview of the test environment
 Typical configuration of a cluster topology
 Custom configuration of the cluster topology
 Configuring resources and applications
 Configuring Start After and Stop After resource group dependencies
 Creating a user-defined resource type
 Configuring the dynamic node priority (adaptive failover)
 Removing a cluster

5.1.1 SMIT menu changes


The SMIT menus for PowerHA SystemMirror 7.1 are restructured to simplify configuration
and administration by grouping menus by function.

Locating available options: If you are familiar with the SMIT paths from an earlier
version, and need to locate a specific feature, use the “Can’t find what you are looking for
?” feature from the main SMIT menu to list and search the available options.

To enter the top-level menu, use the new fast path, smitty sysmirror. The fast path on earlier
versions, smitty hacmp, still works. From the main menu, the highlighted options shown in
Figure 5-1 are available to help with topology and resources configuration. Most of the tools
necessary to configure cluster components are under “Cluster Nodes and Networks” and
“Cluster Applications and Resources.” Some terminology has changed, and the interface is
simplified for easier navigation and management.

PowerHA SystemMirror

Move cursor to desired item and press Enter.

Cluster Nodes and Networks


Cluster Applications and Resources

System Management (C-SPOC)


Problem Determination Tools
Custom Cluster Configuration

Can't find what you are looking for ?


Not sure where to start ?
Figure 5-1 Top-level SMIT menu

Because topology monitoring has been transferred to CAA, its management has been
simplified. Support for non-TCP/IP heartbeat has been transferred to CAA and is no longer a
separate configurable option. Instead of multiple menu options and dialogs for configuring
non-TCP/IP heartbeating devices, a single option is available plus a window (Figure 5-2) to
specify the CAA cluster repository disk and the multicast IP address.



Up-front help information and navigation aids, similar to the last two items in the top-level
menu in Figure 5-1 (Can't find what you are looking for ? and Not sure where to start ?), are
now available in some of the basic panels. See the last menu option in Figure 5-2 (What are a
repository disk and cluster IP address ?) for an example. The context-sensitive help (F1 key)
in earlier versions is still available.

Initial Cluster Setup (Typical)

Move cursor to desired item and press Enter.

Setup a Cluster, Nodes and Networks


Define Repository Disk and Cluster IP Address

What are a repository disk and cluster IP address ?

F1=Help F2=Refresh F3=Cancel Esc+8=Image


Esc+9=Shell Esc+0=Exit Enter=Do
Figure 5-2 Help information

The top resource menus keep only the commonly used options, and the less frequently used
menus are deeper in the hierarchy, under a new Custom Cluster Configuration menu. This
menu includes various customizable and advanced options, similar to the “Extended
Configuration” menu in earlier versions. See 2.3, “Changes to the SMIT panel” on page 25,
for a layout that compares equivalent menu screens in earlier versions with the new screens.

The Verify and Synchronize functions now have a simplified form in most of the typical menus,
while the earlier customizable version is available in more advanced contexts.

Application server versus application controller: Earlier versions used the term
application server to refer to the scripts that are used to start and stop applications under
SystemMirror control. In version 7.1, these scripts are referred to as application
controllers.

A System Events dialog is now available in addition to the user-defined events and pre- and
post-event commands for predefined events from earlier versions. For more information about
this dialog, see 9.4, “Testing the rootvg system event” on page 286.

SSA disks are no longer supported in AIX 6.1, and the RSCT role has been diminished.
Therefore, some related menu options have been removed. See Chapter 2, “Features of
PowerHA SystemMirror 7.1” on page 23, for more details about the new and obsolete
features.

For a topology configuration, SMIT provides two possible approaches that resemble the
previous Standard and Extended configuration paths: typical configuration and custom
configuration.

Typical configuration
The smitty sysmirror → Cluster Nodes and Networks → Initial Cluster Setup (Typical)
configuration path provides the means to configure the basic components of a cluster in a few
steps. Discovery and selection of configuration information is automated, and default values
are provided whenever possible. If you need to use specific values instead of the default



paths that are provided, you can change them later or use the custom configuration path
instead.

Custom configuration
Custom cluster configuration options are not typically required or used by most customers.
However, they provide extended flexibility in configuration and management options. These
options are under the Custom Cluster Configuration option in the top-level panel. If you want
complete control over which components are added to the cluster, and you want to create them
piece by piece, you can configure the cluster topology with the SMIT menus by following the
path Custom Cluster Configuration → Initial Cluster Setup (Custom). With this path, you can
also set your own node and network names instead of the default ones. Alternatively, you can
choose only specific network interfaces to support the clustered applications. (By default,
all interfaces that have IP addresses configured are used.)

Resources configuration
The Cluster Applications and Resources menu in the top-level panel groups the commonly
used options for configuring resources, resource groups, and application controllers.

Other resource options that are not required in most typical configurations are under the
Custom Cluster Configuration menu. They provide dialogs and options to perform the
following tasks:
 Configure custom disk, volume group, and file system methods for cluster resources
 Customize resource recovery and service IP label distribution policy
 Customize events

Most of the resources menus and dialogs are similar to their counterparts in earlier versions.
For more information, see the existing documentation about the previous releases listed in
“Related publications” on page 519.

5.1.2 Overview of the test environment


The cluster used in the test environment is a mutual-takeover, dual-node implementation with
two resource groups, one on each node. Figure 5-3 on page 69 shows the cluster
configuration on top of the hardware infrastructure introduced in 4.1, “Hardware configuration
of the test environment” on page 54.



Figure 5-3 Mutual-takeover, dual-node cluster

By using this setup, we can present various aspects of a typical production implementation,
such as topology redundancy or more complex resource configuration. As an example, we
configure SAN-based heartbeating and introduce the new Start After and Stop After resource
group dependencies.

5.1.3 Typical configuration of a cluster topology


This section explains step-by-step how to configure a basic PowerHA cluster topology using
the typical cluster configuration path. For an example of using the custom cluster
configuration path, see 5.1.4, “Custom configuration of the cluster topology” on page 78.

Prerequisite: Before reading this section, you must have configured all your networks and
storage devices as explained in 3.2, “Hardware requirements” on page 44.

The /etc/cluster/rhosts file must be populated with all cluster IP addresses before
using PowerHA SystemMirror. This process was done automatically in earlier versions, but is
now a required, manual process. The addresses that you enter in this file must include the
addresses that resolve to the host name of the cluster nodes. If you update this file, you must
refresh the clcomd subsystem with the refresh -s clcomd command.
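For example, on the two-node test cluster, the file on each node can simply contain the IP
labels of both nodes, followed by a refresh of the clcomd subsystem. The following commands
are an illustrative sketch; use the labels or addresses of your own nodes:

sydney:/ # cat /etc/cluster/rhosts
sydney
perth

sydney:/ # refresh -s clcomd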



In previous releases of PowerHA, the host name was not required to resolve to an IP
address. As stated in the PowerHA SystemMirror 7.1 release notes, the host name must now
resolve to an IP address.
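A quick way to check the resolution on each node is to run the host command against the
local host name. The following sketch assumes the node sydney with the base address used
in our test environment; the output format can vary slightly:

sydney:/ # host $(hostname)
sydney is 192.168.101.135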

Important: Previous releases used the clcomdES subsystem, which read information from
the /usr/es/sbin/cluster/etc/rhosts file. The clcomdES subsystem is no longer
used. Therefore, you must configure the clcomd subsystem as explained in this section.

Also, ensure that you have one unused shared disk available for the cluster repository.
Example 5-1 shows the lspv command output on the systems sydney and perth. The first
part shows the output from the node sydney, and the second part shows the output from
perth.

Example 5-1 lspv command output before configuring PowerHA


sydney:/ # lspv
hdisk0 00c1f170488a4626 rootvg active
hdisk1 00c1f170fd6b4d9d dbvg
hdisk2 00c1f170fd6b50a5 appvg
hdisk3 00c1f170fd6b5126 None

---------------------------------------------------------------------------
perth:/ # lspv
hdisk0 00c1f1707c6092fe rootvg active
hdisk1 00c1f170fd6b4d9d dbvg
hdisk2 00c1f170fd6b50a5 appvg
hdisk3 00c1f170fd6b5126 None

Node names: The sydney and perth node names have no implication on extended
distance capabilities. The city names are used only as node names.

Defining a cluster
To define a cluster, follow these steps:
1. Use the smitty sysmirror or smitty hacmp fast path.
2. In the PowerHA SystemMirror menu (Figure 5-4), select the Cluster Nodes and
Networks option.

PowerHA SystemMirror

Move cursor to desired item and press Enter.

Cluster Nodes and Networks


Cluster Applications and Resources

System Management (C-SPOC)


Problem Determination Tools
Custom Cluster Configuration

Can't find what you are looking for ?


Not sure where to start ?
Figure 5-4 Menu that is displayed after entering smitty sysmirror



3. In the Cluster Nodes and Networks menu (Figure 5-5), select the Initial Cluster Setup
(Typical) option.

Cluster Nodes and Networks

Move cursor to desired item and press Enter.

Initial Cluster Setup (Typical)

Manage the Cluster


Manage Nodes
Manage Networks and Network Interfaces

Discover Network Interfaces and Disks

Verify and Synchronize Cluster Configuration


Figure 5-5 Cluster Nodes and Networks menu

4. In the Initial Cluster Setup (Typical) menu (Figure 5-6), select the Setup a Cluster, Nodes
and Networks option.

Initial Cluster Setup (Typical)

Move cursor to desired item and press Enter.

Setup a Cluster, Nodes and Networks


Define Repository Disk and Cluster IP Address

What are a repository disk and cluster IP address ?


Figure 5-6 Initial cluster setup (typical)

5. From the Setup a Cluster, Nodes, and Networks panel (Figure 5-7 on page 72), complete
the following steps:
a. Specify the repository disk and the multicast IP address.
The cluster name is based on the host name of the system. You can use this default or
replace it with a name you want to use. In the test environment, the cluster is named
australia.
b. In the New Nodes field, define the IP label that you want to use to communicate to the
other systems. In this example, we plan to build a two-node cluster where the two
systems are named sydney and perth. If you want to create a cluster with more than
two nodes, you can specify more than one system by using the F4 key. The advantage
is that you do not get typographical errors, and you can verify that the /etc/hosts file
contains your network addresses.
The Currently Configured Node(s) field lists all the configured nodes or lists the host
name of the system you are working on if nothing is configured so far.
c. Press Enter.



Setup Cluster, Nodes and Networks (Typical)

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Cluster Name [australia]
New Nodes (via selected communication paths) [perth] +
Currently Configured Node(s) sydney
Figure 5-7 Setup a Cluster, Nodes and Networks panel

The COMMAND STATUS panel (Figure 5-8) indicates that the cluster creation completed
successfully.

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

[TOP]
Cluster Name: australia_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: None
Cluster IP Address:
There are 2 node(s) and 1 network(s) defined
NODE perth:
Network net_ether_01
perth 192.168.101.136
NODE sydney:
Network net_ether_01
sydney 192.168.101.135

No resource groups defined


clharvest_vg: Initializing....
Gathering cluster information, which may take a few minutes...
clharvest_vg: Processing...
Storing the following information in file
/usr/es/sbin/cluster/etc/config/clvg_config

perth:
[MORE...93]
Figure 5-8 Cluster creation completed successfully



If you receive an error message similar to the example in Figure 5-9, you might have missed a
step. For example, you might not have added the host names to the /etc/cluster/rhosts
file or forgot to run the refresh -s clcomd command. Alternatively, you might have to
change the host name in the /etc/cluster/rhosts file to a full domain-based host
name.

Reminder: After you change the /etc/cluster/rhosts file, enter the refresh -s
clcomd command.

COMMAND STATUS

Command: failed stdout: yes stderr: no

Before command completion, additional instructions may appear below.

Warning: There is no cluster found.


cllsclstr: No cluster defined
cllsclstr: Error reading configuration
Figure 5-9 Failure to set up the initial cluster

When you look at the output in more detail, you can see that the system adds your entries
to the cluster configuration and runs a discovery on the systems. The output also lists the
shared disks that were discovered.

Configuring the repository disk and cluster multicast IP address


After you configure the cluster, configure the repository disk and the cluster multicast IP
address.
1. Go back to the Initial Cluster Setup (Typical) panel (Figure 5-6 on page 71). You can use
the path smitty sysmirror  Cluster Nodes and Networks  Initial Cluster Setup
(Typical) or the smitty cm_setup_menu fast path.
2. In the Initial Cluster Setup (Typical) panel, select the Define Repository Disk and Cluster
IP Address option.



3. In the Define Repository and Cluster IP Address panel (Figure 5-10), complete these
steps:
a. Press the F4 key to select the disk that you want to use as the repository disk for CAA.
As shown in Example 5-1 on page 70, only one unused shared disk, hdisk3, remains.
b. Leave the Cluster IP Address field empty. The system generates an appropriate
address for you.
The cluster IP address is a multicast address that is used for internal cluster
communication and monitoring. Specify an address manually only if you have an
explicit reason to do so. For more information about the cluster multicast IP address,
see “Requirements for the multicast IP address, SAN, and repository disk” on page 45.

Multicast address not specified: If you did not specify a multicast address, you
can see the one that AIX chose for you in the output of the cltopinfo command.

c. Press Enter.

Define Repository and Cluster IP Address

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Cluster Name australia
* Repository Disk [None] +
Cluster IP Address []

+--------------------------------------------------------------------------+
| Repository Disk |
| |
| Move cursor to desired item and press Enter. |
| |
| hdisk3 |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 5-10 Define Repository and Cluster IP Address panel



Then the COMMAND STATUS panel (Figure 5-11) opens.

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

[TOP]
Cluster Name: australia
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk3
Cluster IP Address:
There are 2 node(s) and 1 network(s) defined
NODE perth:
Network net_ether_01
perth 192.168.101.136
NODE sydney:
Network net_ether_01
sydney 192.168.101.135

No resource groups defined

Current cluster configuration:

[BOTTOM]
Figure 5-11 COMMAND STATUS showing OK for adding a repository disk

This process only updates the information in the cluster configuration. If you use the lspv
command on any node in the cluster, each node still shows the same output as listed in
Example 5-1 on page 70. When the cluster is synchronized the first time, both the CAA
cluster and repository disk are created.
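As an illustration only, after the first successful synchronization, the lspv output on the node
sydney might look similar to the following lines. The repository disk is renamed and the
caavg_private volume group appears; the exact disk name can differ in your environment:

sydney:/ # lspv
hdisk0           00c1f170488a4626   rootvg          active
hdisk1           00c1f170fd6b4d9d   dbvg
hdisk2           00c1f170fd6b50a5   appvg
caa_private0     00c1f170fd6b5126   caavg_private   active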

Creating a cluster with host names in the FQDN format


In the test environments, we create working clusters with both short and fully qualified
domain name (FQDN) host names. To use the FQDN, you must follow this guidance:
 The /etc/hosts file has the FQDN entry first, right after the IP address, and then the short
host name as an alias for each label. In this case, the FQDN name is used by CAA
because CAA always uses the host name for its node names, regardless of whether the
host name is short or FQDN.
 Define the PowerHA node names with the short names because dots are not accepted as
part of a node name.
 As long as the /etc/hosts file contains the FQDN entry first, and then the short name as
an alias, the host name can be either FQDN or short in your configuration.
 As long as the /etc/hosts file contains the FQDN entry first, and then the short name as
an alias, the /etc/cluster/rhosts file can contain only the short name. This file is only
used for the first synchronization of the cluster, when the Object Data Manager (ODM)
classes are still not populated with the communication paths for the nodes. This file serves
the same function as the /usr/es/sbin/cluster/etc/rhosts file in previous PowerHA and
HACMP versions.
 When you are defining the interfaces to PowerHA, choose either the short or long name
from the pick lists in SMIT. PowerHA always uses the short name at the end. The same
guidance applies for service or persistent addresses.
 Logical partition (LPAR) names continue to be the short ones, even if you use FQDN for
host names.

Example 5-2 shows a configuration that uses host names in the FQDN format.

Example 5-2 Configuration using host names in the FQDN format


seoul.itso.ibm.com:/ # clcmd cat /etc/hosts
-------------------------------
NODE seoul.itso.ibm.com
-------------------------------
127.0.0.1 loopback localhost # loopback (lo0) name/address
::1 loopback localhost # IPv6 loopback (lo0) name/address
192.168.101.143 seoul-b1.itso.ibm.com seoul-b1 # Base IP label 1
192.168.101.144 busan-b1.itso.ibm.com busan-b1 # Base IP label 1
192.168.201.143 seoul-b2.itso.ibm.com seoul-b2 # Base IP label 2
192.168.201.144 busan-b2.itso.ibm.com busan-b2 # Base IP label 2
10.168.101.43 seoul.itso.ibm.com seoul # Persistent IP
10.168.101.44 busan.itso.ibm.com busan # Persistent IP
10.168.101.143 poksap-db.itso.ibm.com poksap-db # Service IP label
10.168.101.144 poksap-en.itso.ibm.com poksap-en # Service IP label
10.168.101.145 poksap-er.itso.ibm.com poksap-er # Service IP label

-------------------------------
NODE busan.itso.ibm.com
-------------------------------
127.0.0.1 loopback localhost # loopback (lo0) name/address
::1 loopback localhost # IPv6 loopback (lo0) name/address
192.168.101.143 seoul-b1.itso.ibm.com seoul-b1 # Base IP label 1
192.168.101.144 busan-b1.itso.ibm.com busan-b1 # Base IP label 1
192.168.201.143 seoul-b2.itso.ibm.com seoul-b2 # Base IP label 2
192.168.201.144 busan-b2.itso.ibm.com busan-b2 # Base IP label 2
10.168.101.43 seoul.itso.ibm.com seoul # Persistent IP
10.168.101.44 busan.itso.ibm.com busan # Persistent IP
10.168.101.143 poksap-db.itso.ibm.com poksap-db # Service IP label
10.168.101.144 poksap-en.itso.ibm.com poksap-en # Service IP label
10.168.101.145 poksap-er.itso.ibm.com poksap-er # Service IP label

seoul.itso.ibm.com:/ # clcmd hostname


-------------------------------
NODE seoul.itso.ibm.com
-------------------------------
seoul.itso.ibm.com
-------------------------------
NODE busan.itso.ibm.com
-------------------------------
busan.itso.ibm.com

seoul.itso.ibm.com:/ # clcmd cat /etc/cluster/rhosts


-------------------------------
NODE seoul.itso.ibm.com
-------------------------------
seoul
busan
-------------------------------
NODE busan.itso.ibm.com
-------------------------------
seoul
busan



seoul.itso.ibm.com:/ # clcmd lsattr -El inet0
-------------------------------
NODE seoul.itso.ibm.com
-------------------------------
authm 65536 Authentication Methods True
bootup_option no Use BSD-style Network Configuration True
gateway Gateway True
hostname seoul.itso.ibm.com Host Name True
rout6 IPv6 Route True
route net,,0,192.168.100.60 Route True
-------------------------------
NODE busan.itso.ibm.com
-------------------------------
authm 65536 Authentication Methods True
bootup_option no Use BSD-style Network Configuration True
gateway Gateway True
hostname busan.itso.ibm.com Host Name True
rout6 IPv6 Route True
route net,,0,192.168.100.60 Route True

seoul.itso.ibm.com:/ # cllsif
Adapter Type Network Net Type Attribute Node IP Address Hardware Address Interface
Name Global Name Netmask Alias for HB Prefix Length
busan-b1 boot net_ether_01 ether public busan 192.168.101.144 en0 255.255.255.0
24
busan-b2 boot net_ether_01 ether public busan 192.168.201.144 en2 255.255.255.0
24
poksap-er service net_ether_01 ether public busan 10.168.101.145
255.255.255.0 24
poksap-en service net_ether_01 ether public busan 10.168.101.144
255.255.255.0 24
poksap-db service net_ether_01 ether public busan 10.168.101.143
255.255.255.0 24
seoul-b1 boot net_ether_01 ether public seoul 192.168.101.143 en0 255.255.255.0
24
seoul-b2 boot net_ether_01 ether public seoul 192.168.201.143 en2 255.255.255.0
24
poksap-er service net_ether_01 ether public seoul 10.168.101.145
255.255.255.0 24
poksap-en service net_ether_01 ether public seoul 10.168.101.144
255.255.255.0 24
poksap-db service net_ether_01 ether public seoul 10.168.101.143
255.255.255.0 24

seoul.itso.ibm.com:/ # cllsnode
Node busan
Interfaces to network net_ether_01
Communication Interface: Name busan-b1, Attribute public, IP address 192.168.101.144
Communication Interface: Name busan-b2, Attribute public, IP address 192.168.201.144
Communication Interface: Name poksap-er, Attribute public, IP address 10.168.101.145
Communication Interface: Name poksap-en, Attribute public, IP address 10.168.101.144
Communication Interface: Name poksap-db, Attribute public, IP address 10.168.101.143

Node seoul
Interfaces to network net_ether_01
Communication Interface: Name seoul-b1, Attribute public, IP address 192.168.101.143
Communication Interface: Name seoul-b2, Attribute public, IP address 192.168.201.143
Communication Interface: Name poksap-er, Attribute public, IP address 10.168.101.145
Communication Interface: Name poksap-en, Attribute public, IP address 10.168.101.144
Communication Interface: Name poksap-db, Attribute public, IP address 10.168.101.143

# LPAR names
seoul.itso.ibm.com:/ # clcmd uname -n
-------------------------------
NODE seoul.itso.ibm.com



-------------------------------
seoul
-------------------------------
NODE busan.itso.ibm.com
-------------------------------
busan

seoul.itso.ibm.com:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
sapdb ONLINE seoul
OFFLINE busan

sapen ONLINE seoul


OFFLINE busan

saper ONLINE busan


OFFLINE seoul

# The output below shows that CAA always use the hostname for its node names
# The Power HA nodenames are: seoul, busan
seoul.itso.ibm.com:/ # lscluster -c
Cluster query for cluster korea returns:
Cluster uuid: 02d20290-d578-11df-871d-a24e50543103
Number of nodes in cluster = 2
Cluster id for node busan.itso.ibm.com is 1
Primary IP address for node busan.itso.ibm.com is 10.168.101.44
Cluster id for node seoul.itso.ibm.com is 2
Primary IP address for node seoul.itso.ibm.com is 10.168.101.43
Number of disks in cluster = 2
for disk cldisk2 UUID = 428e30e8-657d-8053-d70e-c2f4b75999e2 cluster_major = 0 cluster_minor = 2
for disk cldisk1 UUID = fe1e9f03-005b-3191-a3ee-4834944fcdeb cluster_major = 0 cluster_minor = 1
Multicast address for cluster is 228.168.101.43

5.1.4 Custom configuration of the cluster topology


For the custom configuration path example, we use the test environment from 4.1, “Hardware
configuration of the test environment” on page 54.

As a preliminary step, add the base IP aliases to the /etc/cluster/rhosts file on each node and
refresh the CAA clcomd daemon. Example 5-3 illustrates this step on the node sydney.

Example 5-3 Populating the /etc/cluster/rhosts file


sydney:/ # cat /etc/cluster/rhosts
sydney
perth
sydneyb2
perthb2

sydney:/ # stopsrc -s clcomd;startsrc -s clcomd


0513-044 The clcomd Subsystem was requested to stop.
0513-059 The clcomd Subsystem has been started. Subsystem PID is 4980906.



Performing a custom configuration
To perform a custom configuration, follow these steps:
1. Access the Initial Cluster Setup (Custom) panel (Figure 5-12) by following the path smitty
sysmirror  Custom Cluster Configuration  Cluster Nodes and Networks  Initial
Cluster Setup (Custom). This task shows how to use each option on this menu.

Initial Cluster Setup (Custom)

Move cursor to desired item and press Enter.

Cluster
Nodes
Networks
Network Interfaces
Define Repository Disk and Cluster IP Address
Figure 5-12 initial Cluster Setup (Custom) panel for a custom configuration

2. Define the cluster:


a. From the Initial Cluster Setup (Custom) panel (Figure 5-12), follow the path Cluster 
Add/Change/Show a Cluster.
b. In the Add/Change/Show a Cluster panel (Figure 5-13), define the cluster name,
australia.

Add/Change/Show a Cluster

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Cluster Name [australia]
Figure 5-13 Adding a cluster

3. Add the nodes:


a. From the Initial Cluster Setup (Custom) panel, select the path Nodes  Add a Node.
b. In the Add a Node panel (Figure 5-14), specify the first node, sydney, and the path that
is taken to initiate communication with the node. The cluster Node Name might be
different from the host name of the node.
c. Add the second node, perth, in the same way as you did for the sydney node.

Add a Node

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

Entry Fields]
* Node Name [sydney]
Communication Path to Node [sydney] +
Figure 5-14 Add a Node panel



4. Add a network:
a. From the Initial Cluster Setup (Custom) panel, follow the path Networks  Add a
Network.
b. In the Add a Network panel (Figure 5-15), for Network Type, select ether.
c. Define a PowerHA logical network, ether01, and specify its netmask. This logical
network is later populated with the corresponding base and service IP labels. You can
define more networks if needed.

Add a Network

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Network Name [ether01]
* Network Type ether
* Netmask(IPv4)/Prefix Length(IPv6) [255.255.252.0]
Figure 5-15 Add a Network panel

5. Add the network interfaces:


a. From the Initial Cluster Setup (Custom) panel, follow the path Network Interfaces 
Add a Network Interface.
b. Select the logical network and populate it with the appropriate interfaces. In the
example shown in Figure 5-16, we select the only defined ether01 network, and add
the interface sydneyb2 on the sydney node. Add in all the other interfaces in the same
way.

Tip: You might find it useful to remember the following points:


 The sydneyb1 and perthb1 addresses are defined in the same subnet.
 The sydneyb2 and perthb2 addresses are defined in another subnet.
 All interfaces must have the same network mask.

Add a Network Interface

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* IP Label/Address [sydneyb2] +
* Network Type ether
* Network Name ether01
* Node Name [sydney] +
Network Interface []
Figure 5-16 Add a Network Interface panel



6. Define the repository disk and cluster IP address:
a. From the Initial Cluster Setup (Custom) panel, select the Define Repository Disk and
Cluster IP Address option.
b. Choose the physical disk that is used as a central repository of the cluster configuration
and specify the multicast IP address to be associated with this cluster. In the example
shown in Figure 5-17, we let the cluster automatically generate a default value for the
multicast IP address.

Define Repository and Cluster IP Address

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Cluster Name australia
* Repository Disk [hdisk1] +
Cluster IP Address []
Figure 5-17 Define Repository Disk and Cluster IP Address panel

Verifying and synchronizing the custom configuration


With the cluster topology defined, you can verify and synchronize the cluster for the first time.
When the first Verify and Synchronize Cluster Configuration action is successful, the
underlying CAA cluster is activated, and the heartbeat messages begin. We use the
customizable version of the Verify and Synchronize Cluster Configuration command.
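If you prefer the command line, the clmgr utility that was introduced with PowerHA
SystemMirror 7.1 offers an equivalent operation. The following sketch assumes the verify and
sync actions of clmgr and is run as root on one node:

sydney:/ # clmgr verify cluster
sydney:/ # clmgr sync cluster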

Figure 5-18 shows an example where the Automatically correct errors found during
verification? option changed from the default value of No to Yes.

PowerHA SystemMirror Verification and Synchronization

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Verify, Synchronize or Both [Both] +
* Include custom verification library checks [Yes] +
* Automatically correct errors found during [Yes] +
verification?

* Force synchronization if verification fails? [No] +


* Verify changes only? [No] +
* Logging [Standard] +
Figure 5-18 Verifying and synchronizing the cluster configuration (advanced)



Upon successful synchronization, check the PowerHA topology and the CAA cluster
configuration by using the cltopinfo and lscluster -c commands on any node. Example 5-4
shows usage of the PowerHA cltopinfo command. It also shows how the topology that was
configured on the node sydney looks on the node perth after synchronization.

Example 5-4 PowerHA cluster topology


perth:/ # cltopinfo
Cluster Name: australia
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: caa_private0
Cluster IP Address:
There are 2 node(s) and 1 network(s) defined
NODE perth:
Network ether01
perthb2 192.168.201.136
perth 192.168.101.136
NODE sydney:
Network ether01
sydneyb2 192.168.201.135
sydney 192.168.101.135

No resource groups defined

Example 5-5 shows a summary configuration of the CAA cluster created during the
synchronization phase.

Example 5-5 CAA cluster summary configuration


perth:/ # lscluster -c
Cluster query for cluster australia returns:
Cluster uuid: d77ac57e-cc1b-11df-92a4-00145ec5bf9a
Number of nodes in cluster = 2
Cluster id for node perth is 1
Primary IP address for node perth is 192.168.101.136
Cluster id for node sydney is 2
Primary IP address for node sydney is 192.168.101.135
Number of disks in cluster = 0
Multicast address for cluster is 228.168.101.135

For more details about the CAA cluster status, see the following section.

Initial CAA cluster status


Check the status of the CAA cluster by using the lscluster command. As shown in Example 5-6,
the lscluster -m command lists the node and point-of-contact status information. A
point-of-contact status indicates that a node has received communication packets across this
interface from another node.

Example 5-6 CAA cluster node status


sydney:/ # lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2



Node name: perth
Cluster shorthand id for node: 1
uuid for node: 15bef17c-cbcf-11df-951c-00145e5e3182
State of node: UP
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
australia local 98f28ffa-cfde-11df-9a82-00145ec5bf9a

Number of points_of_contact for node: 3


Point-of-contact interface & contact state
sfwcom UP
en2 UP
en1 UP

------------------------------

Node name: sydney


Cluster shorthand id for node: 2
uuid for node: f6a81944-cbce-11df-87b6-00145ec5bf9a
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
australia local 98f28ffa-cfde-11df-9a82-00145ec5bf9a

Number of points_of_contact for node: 0


Point-of-contact interface & contact state
n/a
sydney:/ #

Example 5-7 shows detailed interface information provided by the lscluster -i command. It
shows information about the network interfaces and the other two logical interfaces that are
used for cluster communication:
sfwcom The node connection to the SAN-based communication channel.
dpcom The node connection to the repository disk.

Example 5-7 CAA cluster interface status


sydney:/ # lscluster -i
Network/Storage Interface Query

Cluster Name: australia


Cluster uuid: d77ac57e-cc1b-11df-92a4-00145ec5bf9a
Number of nodes reporting = 2
Number of nodes expected = 2
Node sydney
Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a
Number of interfaces discovered = 4



Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9a
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 5
Probe interval for interface = 120 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255 netmask
255.255.252.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9b
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 5
Probe interval for interface = 120 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 netmask
255.255.252.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 0
Mean Deviation in network rrt across interface = 0
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node perth
Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182
Number of interfaces discovered = 4



Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d9
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255 netmask
255.255.252.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d8
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 netmask
255.255.252.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 0
Mean Deviation in network rrt across interface = 0
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED



5.1.5 Configuring resources and applications
This section continues to build up the cluster by configuring its resources, resource groups,
and application controllers. The goal is to prepare the setup that is needed to introduce the
new Start After and Stop After resource group dependencies in PowerHA 7.1. For a
configuration example for these dependencies, see 5.1.6, “Configuring Start After and Stop
After resource group dependencies” on page 96.

Adding storage resources and resource groups from C-SPOC


To add storage resources and resource groups from C-SPOC, follow these steps:
1. Use the smitty cl_lvm fast path or follow the path smitty sysmirror  System
Management (C-SPOC)  Storage to configure storage resources.
2. Create two volume groups, dbvg and appvg. In the Storage panel (Figure 5-19), select the
path Volume Groups  Create a Volume Group (smitty cl_createvg fast path).

Storage

Move cursor to desired item and press Enter.

Volume Groups
Logical Volumes
File Systems

Physical Volumes
Figure 5-19 C-SPOC storage panel

Using C-SPOC is the preferred method for creating a volume group because the volume
group is automatically configured on all of the selected nodes. Since the release of PowerHA 6.1,
most operations on volume groups, logical volumes, and file systems no longer require
these objects to be in a resource group. Smart menus check for configuration and state
problems and prevent invalid operations before they can be initiated.



3. In the Volume Groups panel, in the Node Names dialog (Figure 5-20), select the nodes for
configuring the volume groups.

Volume Groups

Move cursor to desired item and press Enter.

List All Volume Groups


Create a Volume Group
Create a Volume Group with Data Path Devices

Set Characteristics of a Volume Group


Enable a Volume Group for Fast Disk Takeover or Concurrent Access
••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
• Node Names •
• •
• Move cursor to desired item and press Esc+7. •
• ONE OR MORE items can be selected. •
• Press Enter AFTER making all selections. •
• •
• > perth •
• > sydney •
• •
• F1=Help F2=Refresh F3=Cancel •
• Esc+7=Select Esc+8=Image Esc+0=Exit •
F1• Enter=Do /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 5-20 Nodes selection



In the Volume Groups panel (Figure 5-21), only the physical shared disks that are
accessible on the selected nodes are displayed (Physical Volume Names menu).
4. In the Physical Volume Names menu (inset in Figure 5-21), select the physical volumes, and
then select the volume group type.

Volume Groups

Move cursor to desired item and press Enter.

List All Volume Groups


Create a Volume Group
Create a Volume Group with Data Path Devices

Set Characteristics of a Volume Group


Enable a Volume Group for Fast Disk Takeover or Concurrent Access
••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
• Physical Volume Names •
• •
• Move cursor to desired item and press Esc+7. •
• ONE OR MORE items can be selected. •
• Press Enter AFTER making all selections. •
• •
• 00c1f170674f3d6b ( hdisk1 on all selected nodes ) •
• 00c1f1706751bc0d ( hdisk2 on all selected nodes ) •
• •
• F1=Help F2=Refresh F3=Cancel •
• Esc+7=Select Esc+8=Image Esc+0=Exit •
F1• Enter=Do /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 5-21 Shared disk selection

PVID: This step automatically creates physical volume IDs (PVIDs) for the unused (no
PVID) shared disks. A shared disk might have different names on selected nodes, but
the PVID is the same.



5. In the Create a Volume Group panel (Figure 5-22), specify the volume group name and
the resource group name.
Use the Resource Group Name field to include the volume group into an existing resource
group or automatically create a resource group to hold this volume group. After the
resource group is created, synchronize the configuration for this change to take effect
across the cluster.

Create a Volume Group

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[TOP] [Entry Fields]


Node Names perth,sydney
Resource Group Name [dbrg]
+
PVID 00c1f170674f3d6b
VOLUME GROUP name [dbvg]
Physical partition SIZE in megabytes 4 +
Volume group MAJOR NUMBER [37] #
Enable Cross-Site LVM Mirroring Verification false +
Enable Fast Disk Takeover or Concurrent Access Fast Disk Takeover +
Volume Group Type Original
CRITICAL volume group? no +
Figure 5-22 Creating a volume group in C-SPOC



6. Alternatively, you can leave the Resource Group Name field empty and create or associate
the resource group later. A volume group that is known on multiple nodes but is not yet part
of a resource group is displayed in pick lists as <Not in a Resource Group>. Figure 5-23
shows an example of a pick list.

Logical Volumes

Move cursor to desired item and press Enter.

List All Logical Volumes by Volume Group


Add a Logical Volume
Show Characteristics of a Logical Volume
Set Characteristics of a Logical Volume
Change a Logical Volume
Remove a Logical Volume
••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
• Select the Volume Group that will hold the new Logical Volume •
• •
• Move cursor to desired item and press Enter. •
• •
• #Volume Group Resource Group Node List •
• appvg <Not in a Resource Group> perth,sydney •
• caavg_private <Not in a Resource Group> perth,sydney •
• dbvg dbrg perth,sydney •
• •
• F1=Help F2=Refresh F3=Cancel •
• Esc+8=Image Esc+0=Exit Enter=Do •
F1• /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 5-23 Adding a logical volume in C-SPOC

7. In the C-SPOC Storage panel (Figure 5-19 on page 86), define the logical volumes and
file systems by selecting the Logical Volumes and File Systems options. The
intermediate and final panels for these actions are similar to those panels in previous
releases.
You can list the file systems that you created by following the path C-SPOC Storage 
File Systems  List All File Systems by Volume Group. The COMMAND STATUS
panel (Figure 5-24) shows the list of file systems for this example.

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

#File System Volume Group Resource Group Node List


/appmp appvg <None> sydney,perth
/clrepos_private1 caavg_private <None> sydney,perth
/clrepos_private2 caavg_private <None> sydney,perth
/dbmp dbvg dbrg sydney,perth
Figure 5-24 Listing of file systems in C-SPOC



Resources and resource groups
By following the path smitty sysmirror  Cluster Applications and Resources, you see
the Cluster Applications and Resources menu (Figure 5-25) for resources and resource group
management.

Cluster Applications and Resources

Move cursor to desired item and press Enter.

Make Applications Highly Available (Use Smart Assists)


Resources
Resource Groups

Verify and Synchronize Cluster Configuration


Figure 5-25 Cluster Applications and Resources menu

Smart Assists: The “Make Applications Highly Available (Use Smart Assists)” function
leads to a menu of all installed Smart Assists. If you do not see the Smart Assist that you
need, verify that the corresponding Smart Assist file set is installed.

Configuring application controllers


To configure the application controllers, follow these steps:
1. From the Cluster Applications and Resources menu, select Resources.
2. In the Resources menu (Figure 5-26), select the Configure User Applications (Scripts
and Monitors) option to configure the application scripts.
Alternatively, use the smitty cm_user_apps fast path or smitty sysmirror  Cluster
Applications and Resources  Resources  Configure User Applications (Scripts
and Monitors).

Resources

Move cursor to desired item and press Enter.

Configure User Applications (Scripts and Monitors)


Configure Service IP Labels/Addresses
Configure Tape Resources

Verify and Synchronize Cluster Configuration


Figure 5-26 Resources menu



3. In the Configure User Applications (Scripts and Monitors) panel (Figure 5-27), select the
Application Controller Scripts option.

Configure User Applications (Scripts and Monitors)

Move cursor to desired item and press Enter.

Application Controller Scripts


Application Monitors
Configure Application for Dynamic LPAR and CoD Resources

Show Cluster Applications


Figure 5-27 Configure user applications (scripts and monitors)

4. In the Application Controller Scripts panel (Figure 5-28), select the Add Application
Controller Scripts option.

Application Controller Scripts

Move cursor to desired item and press Enter.

Add Application Controller Scripts


Change/Show Application Controller Scripts
Remove Application Controller Scripts

What is an "Application Controller" anyway ?


Figure 5-28 Application controller scripts



5. In the Add Application Controller Scripts panel (Figure 5-29), which looks similar to the
panels in previous versions, follow these steps:
a. In the Application Controller Name field, type the name that you want to use as a label
for your application. In this example, we use the name dbac.
b. As in previous versions, in the Start Script field, provide the location of your application
start script.
c. In the Stop Script field, specify the location of your stop script. In this example, we
specify /HA71/db_start.sh as the start script and /HA71/db_stop.sh as the stop script.
d. Optional: To monitor your application, in the Application Monitor Name(s) field, select
one or more application monitors. However, you must define the application monitors
before you can use them here. For an example, see “Configuring application
monitoring for the target resource group” on page 98.

Add Application Controller Scripts

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Application Controller Name [dbac]
* Start Script [/HA71/db_start.sh]
* Stop Script [/HA71/db_stop.sh]
Application Monitor Name(s) +
Figure 5-29 Adding application controller scripts
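PowerHA does not dictate the contents of the start and stop scripts; they must only exist at the
same path on every node, be executable, and exit with 0 on success. The following minimal
sketch shows what the /HA71/db_start.sh and /HA71/db_stop.sh scripts might look like for a
generic database; the db_ctl command is a placeholder for your own application commands:

#!/bin/ksh
# /HA71/db_start.sh - start the database and return its exit code to PowerHA
/usr/local/bin/db_ctl start
exit $?

#!/bin/ksh
# /HA71/db_stop.sh - stop the database and return its exit code to PowerHA
/usr/local/bin/db_ctl stop
exit $?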

The configuration of the applications is completed. The next step is to configure the service IP
addresses.

Configuring IP service addresses


To configure the IP service addresses, follow these steps:
1. Return to the Resource panel (Figure 5-26 on page 91) by using the
smitty cm_resources_menu fast path or smitty sysmirror  Cluster Applications and
Resources  Resources.
2. In the Resource panel, select the Configure Service IP Labels/Addresses option.
3. In the Configure Service IP Labels/Addresses menu (Figure 5-30), select the Add a
Service IP Label/Address option.

Configure Service IP Labels/Addresses

Move cursor to desired item and press Enter.

Add a Service IP Label/Address


Change/ Show a Service IP Label/Address
Remove Service IP Label(s)/Address(es)
Configure Service IP Label/Address Distribution Preferences
Figure 5-30 Configure Service IP Labels/Addresses menu



4. In the Network Name subpanel (Figure 5-31), select the network to which you want to add
the Service IP Address. In this example, only one network is defined.

Configure Service IP Labels/Addresses

Move cursor to desired item and press Enter.

Add a Service IP Label/Address


Change/ Show a Service IP Label/Address
Remove Service IP Label(s)/Address(es)
Configure Service IP Label/Address Distribution Preferences

+--------------------------------------------------------------------------+
| Network Name |
| |
| Move cursor to desired item and press Enter. |
| |
| ether01 (192.168.100.0/22 192.168.200.0/22) |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 5-31 Network Name subpanel for the Add a Service IP Label/Address option

5. In the Add a Service IP Label/Address panel, which now looks as shown in Figure 5-32,
select the service address that you want to add in the IP Label/Address field.

Service address defined: As in previous versions, the service address must be
defined in the /etc/hosts file. Otherwise, you cannot select it by using the F4 key.
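For example, the sydneys label can be added to /etc/hosts on every node with a line similar
to the following one. The address shown here is only illustrative; use the service IP address
that is planned for your environment:

10.168.101.135   sydneys   # Service IP label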

You can use the Netmask(IPv4)/Prefix Length(IPv6) field to define the netmask. With IPv4,
you can leave this field empty. The Network Name field is prefilled.

Add a Service IP Label/Address

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* IP Label/Address sydneys +
Netmask(IPv4)/Prefix Length(IPv6) []
* Network Name ether01
Figure 5-32 Details of the Add a Service IP Label/Address panel

You have now finished configuring the resources. In this example, you defined one service IP
address. If you need to add more service IP addresses, repeat the steps as indicated in this
section.

As explained in the following section, the next step is to configure the resource groups.



Configuring resource groups
To configure the resource groups, follow these steps:
1. Go to the Cluster Applications and Resources panel (Figure 5-25 on page 91).
Alternatively, use the smitty cm_apps_resources fast path or smitty sysmirror  Cluster
Applications and Resources.
2. In the Cluster Applications and Resources panel, select Resource Groups.
3. In the Resource Groups menu (Figure 5-33), add a resource group by selecting the Add a
Resource Group option.

Resource Groups

Move cursor to desired item and press Enter.

Add a Resource Group


Change/Show Nodes and Policies for a Resource Group
Change/Show Resources and Attributes for a Resource Group
Remove a Resource Group
Configure Resource Group Run-Time Policies
Show All Resources by Node or Resource Group

Verify and Synchronize Cluster Configuration

What is a "Resource Group" anyway ?


Figure 5-33 Resource Groups menu

4. In the Add a Resource Group panel (Figure 5-34), as in previous versions of PowerHA,
specify the resource group name, the participating nodes, and the policies.

Add a Resource Group

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Resource Group Name [dbrg]
* Participating Nodes (Default Node Priority) [sydney perth] +

Startup Policy Online On Home Node O> +


Fallover Policy Fallover To Next Prio> +
Fallback Policy Fallback To Higher Pr> +
Figure 5-34 Add a Resource Group panel



5. Configure the resources into the resource group. If you need more than one resource
group, repeat the previous step to add a resource group.
a. To configure the resources to the resource group, go back to the Resource Groups
panel (Figure 5-33 on page 95), and select the Change/Show Resources and
Attributes for a Resource Group.
b. In the Change/Show Resources and Attributes for a Resource Group panel
(Figure 5-35), define the resources for the resource group.

Change/Show All Resources and Attributes for a Resource Group

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[TOP] [Entry Fields]


Resource Group Name dbrg
Participating Nodes (Default Node Priority) sydney perth

Startup Policy Online On Home Node O>


Fallover Policy Fallover To Next Prio>
Fallback Policy Fallback To Higher Pr>
Fallback Timer Policy (empty is immediate) [] +

Service IP Labels/Addresses [sydneys] +


Application Controllers [dbac] +

Volume Groups [dbvg] +


Use forced varyon of volume groups, if necessary false +
[MORE...24]
Figure 5-35 Change/Show All Resources and Attributes for a Resource Group panel

You have now finished configuring the resource group.

Next, you synchronize the cluster nodes. If the Verify and Synchronize Cluster Configuration
task is successfully completed, you can start your cluster. However, you might first want to
see if the CAA cluster was successfully created by using the lscluster -c command.

5.1.6 Configuring Start After and Stop After resource group dependencies
In this section, you configure a Start After resource group dependency and similarly create a
Stop After resource group dependency. For more information about Start After and Stop After
resource group dependencies, see 2.5.1, “Start After and Stop After resource group
dependencies” on page 32.



You can manage Start After dependencies between resource groups by following the path
smitty sysmirror  Cluster Applications and Resources  Resource Groups 
Configure Resource Group Run-Time Policies  Configure Dependencies between
Resource Groups  Configure Start After Resource Group Dependency. Figure 5-36
shows the Configure Start After Resource Group Dependency menu.

Configure Start After Resource Group Dependency

Move cursor to desired item and press Enter.

Add Start After Resource Group Dependency


Change/Show Start After Resource Group Dependency
Remove Start After Resource Group Dependency
Display Start After Resource Group Dependencies
Figure 5-36 Configuring Start After Resource Group dependency menu

To add a new dependency, in the Configure Start After Resource Group Dependency menu,
select the Add Start After Resource Group Dependency option. In this example, we
already configured the dbrg and apprg resource groups. The apprg resource group is defined
as the source (dependent) resource group as shown in Figure 5-37.

Configure Start After Resource Group Dependency

Move cursor to desired item and press Enter.

Add Start After Resource Group Dependency


Change/Show Start After Resource Group Dependency
Remove Start After Resource Group Dependency
Display Start After Resource Group Dependencies

••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
• Select the Source Resource Group •
• •
• Move cursor to desired item and press Enter. •
• •
• apprg •
• dbrg •
• •
• F1=Help F2=Refresh F3=Cancel •
• Esc+8=Image Esc+0=Exit Enter=Do •
F1• /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 5-37 Selecting the source resource group of a Start After dependency



Figure 5-38 shows dbrg resource group defined as the target resource group.

Configure Start After Resource Group Dependency

Move cursor to desired item and press Enter.

Add Start After Resource Group Dependency


Change/Show Start After Resource Group Dependency
Remove Start After Resource Group Dependency
Display Start After Resource Group Dependencies

••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
• Select the Target Resource Group •
• •
• Move cursor to desired item and press Esc+7. •
• ONE OR MORE items can be selected. •
• Press Enter AFTER making all selections. •
• •
• dbrg •
• •
• F1=Help F2=Refresh F3=Cancel •
• Esc+7=Select Esc+8=Image Esc+0=Exit •
F1• Enter=Do /=Find n=Find Next •
Es••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Figure 5-38 Selecting the target resource group of a Start After dependency

Example 5-8 shows the result.

Example 5-8 Start After dependency configured


sydney:/ # clrgdependency -t'START_AFTER' -sl
#Source Target
apprg dbrg
sydney:/ #
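A Stop After dependency is created in the same way from the Configure Stop After Resource
Group Dependency menu. Assuming that clrgdependency uses STOP_AFTER as the
corresponding type keyword, you can list those dependencies with a similar command:

sydney:/ # clrgdependency -t'STOP_AFTER' -sl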

Configuring application monitoring for the target resource group


The Start After dependency guarantees only that the source resource group is started after
the target resource group is started. You might need the application in your source resource
group (source startup script) to start only after a full and successful start of the application in
your target resource group (that is, after the target startup script returns 0). In this case, you
must configure startup monitoring for your target application. The dummy scripts in
Example 5-9 show the configuration of the test cluster.

Example 5-9 Dummy scripts for target and source applications


sydney:/HA71 # ls -l
total 48
-rwxr--r-- 1 root system 226 Oct 12 07:00 app_mon.sh
-rwxr--r-- 1 root system 283 Oct 12 07:06 app_start.sh
-rwxr--r-- 1 root system 233 Oct 12 07:03 app_stop.sh
-rwxr--r-- 1 root system 201 Oct 12 06:03 db_mon.sh
-rwxr--r-- 1 root system 274 Oct 12 07:24 db_start.sh
-rwxr--r-- 1 root system 229 Oct 12 06:04 db_stop.sh
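For reference, a monitor method only has to exit with 0 while the application is healthy and
with a nonzero value otherwise. A minimal sketch of what the dummy /HA71/db_mon.sh script
might contain, assuming a hypothetical database process named db_server:

#!/bin/ksh
# /HA71/db_mon.sh - exit 0 if the database process is running, 1 otherwise
if ps -ef | grep -v grep | grep -q db_server
then
    exit 0
else
    exit 1
fi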



The remainder of this task continues from the configuration started in “Configuring application
controllers” on page 91. You only have to add a monitor for the dbac application controller that
you already configured.

Follow the path smitty sysmirror  Cluster Applications and Resources 
Resources  Configure User Applications (Scripts and Monitors)  Add Custom
Application Monitor. The Add Custom Application Monitor panel (Figure 5-39) is displayed.
We do not explain the fields here because they are the same as the fields in previous
versions. However, keep in mind that the Monitor Mode value Both means both startup
monitoring and long-running monitoring.

Add Custom Application Monitor

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Monitor Name [dbam]
* Application Controller(s) to Monitor dbac +
* Monitor Mode [Both] +
* Monitor Method [/HA71/db_mon.sh]
Monitor Interval [30] #
Hung Monitor Signal [] #
* Stabilization Interval [120] #
* Restart Count [3] #
Restart Interval [] #
* Action on Application Failure [fallover] +
Notify Method [/HA71/db_stop.sh]
Cleanup Method [/HA71/db_start.sh]
Restart Method []
Figure 5-39 Adding the dbam custom application monitor



Similarly, you can configure an application monitor and an application controller for the apprg
resource group as shown in Figure 5-40.

Change/Show Custom Application Monitor

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Monitor Name appam
Application Controller(s) to Monitor appac +
* Monitor Mode [Long-running monitori> +
* Monitor Method [/HA71/app_mon.sh]
Monitor Interval [30] #
Hung Monitor Signal [9] #
* Stabilization Interval [15] #
Restart Count [3] #
Restart Interval [594] #
* Action on Application Failure [fallover] +
Notify Method []
Cleanup Method [/HA71/app_stop.sh]
Restart Method [/HA71/app_start.sh]
Figure 5-40 Configuring the appam application monitor and appac application controller

For a series of tests performed on this configuration, see 9.8, “Testing a Start After resource
group dependency” on page 297.

5.1.7 Creating a user-defined resource type


Now create a user-defined resource type by using SMIT:
1. To define a user-defined resource type, follow the path smitty sysmirror  Custom
Cluster Configuration  Resources  Configure User Defined Resources and
Types  Add a User Defined Resource Type.

Resource type management: PowerHA SystemMirror automatically manages most


resource types.



2. In the Add a User Defined Resource Type panel (Figure 5-41), define a resource type.
Also select the processing order from the pick list.

Add a User Defined Resource Type

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Resource Type Name [my_resource_type]
* Processing order [] +
Verification Method []
Verification Type [Script] +
Start Method []
Stop Method []
+--------------------------------------------------------------------------+
¦ Processing order ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ FIRST ¦
¦ WPAR ¦
¦ VOLUME_GROUP ¦
¦ FILE_SYSTEM ¦
¦ SERVICEIP ¦
¦ TAPE ¦
¦ APPLICATION ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
F1¦ Esc+8=Image Esc+0=Exit Enter=Do ¦
Es¦ /=Find n=Find Next ¦
Es+--------------------------------------------------------------------------+
Figure 5-41 Adding a user-defined resource type

3. After you create your own resource type, add it to a resource group. The new resource type is displayed in the pick list. This information is stored in the HACMPresourcetype, HACMPudres_def, and HACMPudresource cluster configuration files.



5.1.8 Configuring the dynamic node priority (adaptive failover)
As mentioned in 2.5.3, “Dynamic node priority: Adaptive failover” on page 35, in PowerHA
7.1, you can decide node priority based on the return value of your own script. To configure
the dynamic node priority (DNP), follow these steps:
1. Follow the path smitty sysmirror → Cluster Applications and Resources → Resource Groups → Change/Show Resources and Attributes for a Resource Group (if you already have your resource group).
As you can see in the Change/Show Resources and Attributes for a Resource Group panel (Figure 5-42), the algeria_rg resource group has the default node priority. The participating nodes are algeria, brazil, and usa.
2. To configure DNP, choose the dynamic node priority policy. In this example, we chose cl_lowest_nonzero_udscript_rc as the dynamic node priority policy. With this policy, the node whose DNP script returns the lowest nonzero value gets the highest priority among the nodes. Also define the DNP script path and timeout value.

DNP script for the nodes: Ensure that all nodes have the DNP script and that the
script has executable mode. Otherwise, you receive an error message while running the
synchronization or verification process.

For a description of this test scenario, see 9.9, “Testing dynamic node priority” on
page 302.
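The content of the DNP script is entirely user-defined. The following is a minimal sketch of what such a script might look like, assuming (purely for illustration) that the 15-minute load average is used as the ranking metric. With the cl_lowest_nonzero_udscript_rc policy, the script must exit with a nonzero value, and the node that returns the lowest value receives the highest priority:

#!/usr/bin/ksh
# DNP.sh - hypothetical dynamic node priority script
# Exit with a small nonzero integer derived from the 15-minute load average.
load=$(uptime | awk -F'average:' '{print $2}' | awk -F, '{print int($3) + 1}')
exit $load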

Change/Show All Resources and Attributes for a Resource Group

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[TOP] [Entry Fields]


Resource Group Name algeria_rg
Participating Nodes (Default Node Priority) algeria brazil usa
* Dynamic Node Priority Policy [cl_lowest_nonzero_uds> +
DNP Script path <HTTPServer/bin/DNP.sh] /
DNP Script timeout value [20] #

Startup Policy Online On Home Node O>


Fallover Policy Fallover Using Dynami>
Fallback Policy Fallback To Higher Pr>
Fallback Timer Policy (empty is immediate) [] +

[MORE...11]

F1=Help F2=Refresh F3=Cancel F4=List


Esc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=Image
Esc+9=Shell Esc+0=Exit Enter=Do
Figure 5-42 Configuring DNP in a SMIT session



5.1.9 Removing a cluster
You can remove your cluster by using the path smitty sysmirror → Cluster Nodes and Networks → Manage the Cluster → Remove the Cluster Definition.

Removing a cluster consists of deleting the PowerHA definition and deleting the CAA cluster
from AIX. Removing the CAA cluster is the last step of the Remove operation as shown in
Figure 5-43.

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

Attempting to delete node "Perth" from the cluster...


Attempting to delete the local node from the cluster...
Attempting to delete the cluster from AIX ...

F1=Help F2=Refresh F3=Cancel Esc+6=Command


Esc+8=Image Esc+9=Shell Esc+0=Exit /=Find
n=Find Next
Figure 5-43 Removing the cluster

Normally, deleting the cluster with this method removes both the PowerHA SystemMirror and
the CAA cluster definitions from the system. If a problem is encountered while PowerHA is
trying to remove the CAA cluster, you might need to delete the CAA cluster manually. For
more information, see Chapter 10, “Troubleshooting PowerHA 7.1” on page 305.
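If you are not sure whether the CAA cluster definition is really gone, you can also query CAA directly (a sketch; the exact wording of the output depends on the AIX level):

# lscluster -m     # should report that the node is not configured in a CAA cluster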

After you remove the cluster, ensure that the caavg_private volume group is no longer
displayed as shown in Figure 5-44.

--- before ---


# lspv
caa_private0 000fe40120e16405 caavg_private active
hdisk6 000fe4114cf8d258 dbvg
hdisk7 000fe4114cf8d2ec applvg
hdisk0 000fe411201305c3 rootvg active
--- after ---
# lspv
hdisk5 000fe40120e16405 None
hdisk6 000fe4114cf8d258 dbvg
hdisk7 000fe4114cf8d2ec applvg
hdisk0 000fe411201305c3 rootvg active
Figure 5-44 The lspv command output before and after removing a cluster



5.2 Cluster configuration using the clmgr tool
PowerHA 7.1 introduces the clmgr command-line tool. The tool is partially new: it is based on the clvt tool and adds the following improvements:
• Consistent usage across the supported functions
• Improved usability
• Improved serviceability
• Use of a fully globalized message catalog
• Multiple levels of debugging
• Automatic help

To see the possible values for the attributes, use the man clvt command.

5.2.1 The clmgr action commands


The following actions are currently supported in the clmgr command:
• add
• delete
• manage
• modify
• move
• offline
• online
• query
• recover
• sync
• view

For a list of actions, you can use the clmgr command with no arguments. See "The clmgr command" on page 106 and Example 5-10 on page 106.

Most of the actions in the list provide aliases. Table 5-1 shows the current actions and their
abbreviations and aliases.

Table 5-1 Command aliases

Actual     Synonyms or aliases
add        a, create, make, mk
query      q, ls, get
modify     mod, ch, set
delete     de, rem, rm, er
online     on, start
offline    off, stop
move       mov, mv
recover    rec
sync       sy
verify     ve
view       vi, cat
manage     mg

5.2.2 The clmgr object classes


The following object classes are currently supported:
• application_controller
• application_monitor
• cluster
• dependency
• fallback_timer
• file_collection
• file_system (incomplete coverage)
• interface
• logical_volume (incomplete coverage)
• method (incomplete coverage)
• network
• node
• persistent_ip
• physical_volume (incomplete coverage)
• report
• resource_group
• service_ip
• snapshot
• site
• tape
• volume_group (incomplete coverage)

For a list, you can run the clmgr command with an action but no object class. See "The clmgr query command" on page 107 and Example 5-11 on page 107.

Most of these object classes in the list provide aliases. Table 5-2 on page 105 lists the current
object classes and their abbreviations and aliases.

Table 5-2 Object classes with aliases

Actual                    Minimum string
cluster                   cl
site                      si
node                      no
interface                 in, if
network                   ne, nw
resource_group            rg
service_ip                se
persistent_ip             pe, pi
application_controller    ac, app, appctl



5.2.3 Examples of using the clmgr command
This section provides information about some of the clmgr commands. An advantage of the
clmgr command compared to the clvt command is that it is not case-sensitive. For more
details about the clmgr command, see Appendix D, “The clmgr man page” on page 501.

For a list of the actions that are currently supported, see 5.2.1, “The clmgr action commands”
on page 104.

For a list, you can use the clmgr command with no arguments. See "The clmgr command" on page 106 and Example 5-10 on page 106.

For a list of object classes that are currently supported, see 5.2.2, “The clmgr object classes”
on page 105.

For a list, use the clmgr command with an action but no object class. See "The clmgr query command" on page 107 and Example 5-11 on page 107.

For most of these actions and object classes, abbreviations and aliases are available. These
commands are not case-sensitive. You can find more details about the actions and their
aliases in “The clmgr action commands” on page 104. For more information about object
classes, see “The clmgr object classes” on page 105.

Error messages: At the time of writing, the clmgr error messages referred to clvt. This
issue will be fixed in a future release so that it references clmgr.

The clmgr command


Running the clmgr command with no arguments or with the -h option shows the operations
that you can perform. Example 5-10 shows the output that you see just by using the clmgr
command. You see similar output if you use the -h option. The difference between the clmgr
and clmgr -h commands is that, in the output of the clmgr -h command, the line with the
error message is missing. For more details about the -h option, see “Using help in clmgr” on
page 111.

Example 5-10 Output of the clmgr command with no arguments


# clmgr

ERROR: an invalid operation was requested:

clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>] \
[-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}] <ACTION> <CLASS> [<NAME>] \
[-h | <ATTR#1>=<VALUE#1> <ATTR#2>=<VALUE#2> <ATTR#n>=<VALUE#n> ...]
clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>] \
[-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}] -M "
<ACTION> <CLASS> [<NAME>] [<ATTR#1>=<VALUE#1> <ATTR#n>=<VALUE#n> ...]
.
.
."

ACTION={add|modify|delete|query|online|offline|...}
CLASS={cluster|site|node|network|resource_group|...}

clmgr {-h|-?} [-v]


clmgr [-v] help



# Available actions for clvt:
add
delete
help
manage
modify
move
offline
online
query
recover
sync
verify
view

The clmgr query command


Running the clmgr command with only the query argument generates a list of the supported
object classes as shown in Example 5-11. You see similar output if you use the -h option. The
difference between the clmgr query and clmgr query -h commands is that, in the output of
the clmgr query -h command, the lines with the object class names are indented. For more
details about the -h option, see “Using help in clmgr” on page 111.

Example 5-11 Output of the clmgr query command


# clmgr query

# Available classes for clvt action "query":


application_controller
application_monitor
cluster
dependency
fallback_timer
file_collection
file_system
interface
log
logical_volume
method
network
node
persistent_ip
physical_volume
resource_group
service_ip
site
smart_assist
snapshot
tape
volume_group



The clmgr query cluster command
You use the clmgr query cluster command to obtain detailed information about your cluster.
Example 5-12 shows the output from the cluster used in the test environment.

Example 5-12 Output of the clmgr query cluster command


# clmgr query cluster
CLUSTER_NAME="hacmp29_cluster"
CLUSTER_ID="1126895238"
STATE="STABLE"
VERSION="7.1.0.1"
VERSION_NUMBER="12"
EDITION="STANDARD"
CLUSTER_IP=""
REPOSITORY="caa_private0"
SHARED_DISKS=""
UNSYNCED_CHANGES="false"
SECURITY="Standard"
FC_SYNC_INTERVAL="10"
RG_SETTLING_TIME="0"
RG_DIST_POLICY="node"
MAX_EVENT_TIME="180"
MAX_RG_PROCESSING_TIME="180"
SITE_POLICY_FAILURE_ACTION="fallover"
SITE_POLICY_NOTIFY_METHOD=""
DAILY_VERIFICATION="Enabled"
VERIFICATION_NODE="Default"
VERIFICATION_HOUR="0"
VERIFICATION_DEBUGGING="Enabled"
LEVEL=""
ALGORITHM=""
GRACE_PERIOD=""
REFRESH=""
MECHANISM=""
CERTIFICATE=""
PRIVATE_KEY=""

As mentioned previously, most clmgr actions and object classes provide aliases. Another
helpful feature of the clmgr command is the ability to understand abbreviated commands. For
example, the previous command can be shortened as follows:
# clmgr q cl

For more details about the capability of the clmgr command, see 5.2.1, “The clmgr action
commands” on page 104, and 5.2.2, “The clmgr object classes” on page 105. See also the
man pages listed in Appendix D, “The clmgr man page” on page 501.



The enhanced search capability
An additional feature of the clmgr command is that it provides an easy search capability with
the query action. Example 5-13 shows how to list all the defined resource groups.

Example 5-13 List of all defined resource groups


# clmgr query rg
rg1
rg2
rg3
rg4
rg5
rg6
#

You can also use more complex search expressions. Example 5-14 shows how you can use a simple regular expression. In addition, you can search on more than one field, in which case only the objects that match all of the provided filters are displayed.

Example 5-14 Simple regular expression command


# clmgr query rg name=rg[123]
rg1
rg2
rg3
#
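As a sketch of the multiple-field search mentioned above, you can combine a name pattern with an attribute filter. The attribute names are the ones shown in the query output (for example, STARTUP in Example 5-30), and only the resource groups that match all of the filters are listed:

# clmgr query rg name=rg[123] startup=OHN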

The -a option
Some query commands produce rather long output. You can use the -a (attributes) option to obtain shorter output, such as the information about a single value that is shown in Example 5-15. You can also use this option to get information about several values, as shown in Example 5-16.

Example 5-15 List state of the cluster node


munich:/ # clmgr -a state query cluster
STATE="STABLE"
munich:/ #

Example 5-16 shows how to get information about the state and the location of a resource
group. The full output of the query command for the nfsrg resource group is shown in
Example 5-31 on page 123.

Example 5-16 List state and location of a resource group


munich:/ # clmgr -a STATE,Current query rg nfsrg
STATE="ONLINE"
CURRENT_NODE="berlin"
munich:/ #



You can also use wildcards for getting information about some values as shown in
Example 5-17.

Example 5-17 The -a option and wildcards


munich:/ # clmgr -a "*NODE*" query rg nfsrg
CURRENT_NODE="berlin"
NODES="berlin munich"
PRIMARYNODES=""
PRIMARYNODES_STATE=""
SECONDARYNODES=""
SECONDARYNODES_STATE=""
NODE_PRIORITY_POLICY="default"
munich:/ #

The -v option
The -v (verbose) option is helpful when used with the query action, as shown in Example 5-18. This option is used almost exclusively by IBM Systems Director to scan the cluster for information.

Example 5-18 The -v option for query all resource groups


munich:/ # clmgr -a STATE,current -v query rg
STATE="ONLINE"
CURRENT_NODE="munich"

STATE="ONLINE"
CURRENT_NODE="berlin"
munich:/ #

If you do not use the -v option with the query action as shown in Example 5-18, you see an
error message similar to the one in Example 5-19.

Example 5-19 Error message when not using the -v option for query all resource groups
munich:/ # clmgr -a STATE,current query rg

ERROR: a name/label must be provided.

munich:/ #

Returning only one value


You might want only one value returned from a clmgr command. This requirement typically arises when you use the clmgr command in a script and want only the VALUE rather than the ATTR="VALUE" format. Example 5-20 shows how to ensure that only one value is returned.

The command has the following syntax:


clmgr -cSa <ATTR> query <CLASS> <OBJECT>

Example 5-20 The command to return a single value from the clmgr command
# clmgr -cSa state query rg rg1
ONLINE
#
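This single-value form is convenient in shell scripts. The following minimal sketch (using the rg1 resource group from Example 5-20) waits until the resource group reports ONLINE:

#!/usr/bin/ksh
# Wait up to about five minutes for resource group rg1 to come online
count=0
while [[ "$(clmgr -cSa state query rg rg1)" != "ONLINE" ]]
do
    (( count += 1 ))
    if [[ $count -gt 30 ]]; then
        print "rg1 did not come online in time"
        exit 1
    fi
    sleep 10
done
print "rg1 is ONLINE"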



5.2.4 Using help in clmgr
You can use the -h option in combination with actions and object classes. For example, if you
want to know how to add a resource group to an existing cluster, you can use the clmgr add
resource_group -h command. Example 5-21 shows the output of using this command. For an
example of using the clmgr add resource_group command, see Example 5-28 on page 121.

Example 5-21 Help for adding resource group using the clmgr command
# clmgr add resource_group -h
# Available options for "clvt add resource_group":
<RESOURCE_GROUP_NAME>
NODES
PRIMARYNODES
SECONDARYNODES
FALLOVER
FALLBACK
STARTUP
FALLBACK_AT
SERVICE_LABEL
APPLICATIONS
VOLUME_GROUP
FORCED_VARYON
VG_AUTO_IMPORT
FILESYSTEM
FSCHECK_TOOL
RECOVERY_METHOD
FS_BEFORE_IPADDR
EXPORT_FILESYSTEM
EXPORT_FILESYSTEM_V4
MOUNT_FILESYSTEM
STABLE_STORAGE_PATH
WPAR_NAME
NFS_NETWORK
SHARED_TAPE_RESOURCES
DISK
AIX_FAST_CONNECT_SERVICES
COMMUNICATION_LINKS
WLM_PRIMARY
WLM_SECONDARY
MISC_DATA
CONCURRENT_VOLUME_GROUP
NODE_PRIORITY_POLICY
NODE_PRIORITY_POLICY_SCRIPT
NODE_PRIORITY_POLICY_TIMEOUT
SITE_POLICY

Items shown between angle brackets (<>) are required. However, this does not mean that all the other items are optional; some items are not marked as required because of other dependencies. In Example 5-22 on page 112, only CLUSTER_NAME is listed as required, but because of the new CAA dependency, the REPOSITORY (disk) is also required. For more details about how to create a cluster using the clmgr command, see "Configuring a new cluster using the clmgr command" on page 113.



Example 5-22 Help for creating a cluster
# clmgr add cluster -h
# Available options for "clvt add cluster":
<CLUSTER_NAME>
FC_SYNC_INTERVAL
NODES
REPOSITORY
SHARED_DISKS
CLUSTER_IP
RG_SETTLING_TIME
RG_DIST_POLICY
MAX_EVENT_TIME
MAX_RG_PROCESSING_TIME
SITE_POLICY_FAILURE_ACTION
SITE_POLICY_NOTIFY_METHOD
DAILY_VERIFICATION
VERIFICATION_NODE
VERIFICATION_HOUR
VERIFICATION_DEBUGGING

5.2.5 Configuring a PowerHA cluster using the clmgr command


In this section, you configure the two-node mutual takeover cluster with a focus on the
PowerHA configuration only. The system names are munich and berlin. This task does not
include the preliminary steps, which include setting up the IP interfaces and the shared disks.
For details and an example of the output, see “Starting the cluster using the clmgr command”
on page 127. All the steps in the referenced section were executed on the munich system.

To configure a PowerHA cluster by using the clmgr command, follow these steps:
1. Configure the cluster:
# clmgr add cluster de_cluster NODES=munich,berlin REPOSITORY=hdisk4
For details, see “Configuring a new cluster using the clmgr command” on page 113.
2. Configure the service IP addresses:
# clmgr add service_ip alleman NETWORK=net_ether_01 NETMASK=255.255.255.0
# clmgr add service_ip german NETWORK=net_ether_01 NETMASK=255.255.255.0
For details, see “Defining the service address using the clmgr command” on page 118.
3. Configure the application server:
# clmgr add application_controller http_app \
> STARTSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k start" \
> STOPSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k stop"
For details, see “Defining the application server using the clmgr command” on page 120.
4. Configure a resource group:
# clmgr add resource_group httprg VOLUME_GROUP=httpvg NODES=munich,berlin \
> SERVICE_LABEL=alleman APPLICATIONS=http_app

# clmgr add resource_group nfsrg VOLUME_GROUP=nfsvg NODES=berlin,munich \


> SERVICE_LABEL=german FALLBACK=NFB RECOVERY_METHOD=parallel \
> FS_BEFORE_IPADDR=true EXPORT_FILESYSTEM="/nfsdir" \
> MOUNT_FILESYSTEM="/sap;/nfsdir"



For details, see “Defining the resource group using the clmgr command” on page 120.
5. Sync the cluster:
clmgr sync cluster
For details, see “Synchronizing the cluster definitions by using the clmgr command” on
page 124.
6. Start the cluster:
clmgr online cluster start_cluster BROADCAST=false CLINFO=true
For details, see “Starting the cluster using the clmgr command” on page 127.

Command and syntax of clmgr: SMIT provides a robust and easy-to-use interface. When you use the clmgr command (the CLI) to configure or manage the PowerHA cluster instead, you must take care to use the correct command and syntax.

Configuring a new cluster using the clmgr command


Creating a cluster using the clmgr command is similar to using the typical configuration
through SMIT (described in 5.1.3, “Typical configuration of a cluster topology” on page 69). If
you want a method that is similar to the custom configuration in SMIT, you must use a
combination of the classical PowerHA commands and the clmgr command. The steps in the
following sections use the clmgr command only.

Preliminary setup

Prerequisite: In this section, you must know how to set up the prerequisites for a PowerHA
cluster.

The IP interfaces are already defined and the shared volume groups and file systems have
been created. The host names of the two systems are munich and berlin. Figure 5-45 shows
the disks and shared volume groups that are defined so far. hdisk4 is used as the CAA
repository disk.

munich:/ # lspv
hdisk1 00c0f6a012446137 httpvg
hdisk2 00c0f6a01245190c httpvg
hdisk3 00c0f6a012673312 nfsvg
hdisk4 00c0f6a01c784107 None
hdisk0 00c0f6a07c5df729 rootvg active
munich:/ #
Figure 5-45 List of available disks



Figure 5-46 shows the network interfaces that are defined on the munich system.

munich:/ # netstat -i
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 a2.4e.58.a0.41.3 23992 0 24516 0 0
en0 1500 192.168.100 munich 23992 0 24516 0 0
en1 1500 link#3 a2.4e.58.a0.41.4 2 0 7 0 0
en1 1500 100.168.200 munichb1 2 0 7 0 0
en2 1500 link#4 a2.4e.58.a0.41.5 4324 0 7 0 0
en2 1500 100.168.220 munichb2 4324 0 7 0 0
lo0 16896 link#1 16039 0 16039 0 0
lo0 16896 127 localhost.locald 16039 0 16039 0 0
lo0 16896 localhost6.localdomain6 16039 0 16039 0 0
munich:/ #
Figure 5-46 Defined network interfaces

Creating the cluster


To begin, define the cluster along with the repository disk. If you do not remember all the
options for creating a cluster with the clmgr command, use the clmgr add cluster -h
command. Example 5-22 on page 112 shows the output of this command.

Before you use the clmgr add cluster command, you must know which disk will be used for
the CAA repository disk. Example 5-23 shows the command and its output.

Table 5-3 provides more details about the command and arguments that are used.

Table 5-3 Creating a cluster using the clmgr command

Action, object class, or argument   Value used       Comment
add                                                  Basic preferred action
cluster                                              Basic object class used
CLUSTER_NAME                        de_cluster       Optional argument name, but required value
NODES                               munich, berlin   Preferred nodes to use in the cluster
REPOSITORY                          hdisk4           The disk for the CAA repository

Example 5-23 Creating a cluster using the clmgr command


munich:/ # clmgr add cluster de_cluster NODES=munich,berlin REPOSITORY=hdisk4
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: None
Cluster IP Address:
There are 2 node(s) and 2 network(s) defined
NODE berlin:
Network net_ether_01
berlinb2 100.168.220.141
berlinb1 100.168.200.141
Network net_ether_010



berlin 192.168.101.141
NODE munich:
Network net_ether_01
munichb2 100.168.220.142
munichb1 100.168.200.142
Network net_ether_010
munich 192.168.101.142

No resource groups defined


clharvest_vg: Initializing....
Gathering cluster information, which may take a few minutes...
clharvest_vg: Processing...
Storing the following information in file
/usr/es/sbin/cluster/etc/config/clvg_config

berlin:

Hdisk: hdisk1
PVID: 00c0f6a012446137
VGname: httpvg
VGmajor: 100
Conc-capable: Yes
VGactive: No
Quorum-required:Yes
Hdisk: hdisk2
PVID: 00c0f6a01245190c
VGname: httpvg
VGmajor: 100
Conc-capable: Yes
VGactive: No
Quorum-required:Yes

munich:

Hdisk: hdisk1
PVID: 00c0f6a012446137
VGname: httpvg
VGmajor: 100
Conc-capable: Yes
VGactive: No
Quorum-required:Yes

berlin:

Hdisk: hdisk3
PVID: 00c0f6a012673312
VGname: nfsvg
VGmajor: 200
Conc-capable: Yes
VGactive: No
Quorum-required:Yes

munich:

Hdisk: hdisk2



PVID: 00c0f6a01245190c
VGname: httpvg
VGmajor: 100
Conc-capable: Yes
VGactive: No
Quorum-required:Yes

berlin:

Hdisk: hdisk4
PVID: 00c0f6a01c784107
VGname: None
VGmajor: 0
Conc-capable: No
VGactive: No
Quorum-required:No

munich:

Hdisk: hdisk3
PVID: 00c0f6a012673312
VGname: nfsvg
VGmajor: 200
Conc-capable: Yes
VGactive: No
Quorum-required:Yes

berlin:

Hdisk: hdisk0
PVID: 00c0f6a048cf8bfd
VGname: rootvg
VGmajor: 10
Conc-capable: No
VGactive: Yes
Quorum-required:Yes
FREEMAJORS: 35..99,101..199,201...

munich:

Hdisk: hdisk4
PVID: 00c0f6a01c784107
VGname: None
VGmajor: 0
Conc-capable: No
VGactive: No
Quorum-required:No
Hdisk: hdisk0
PVID: 00c0f6a07c5df729
VGname: rootvg
VGmajor: 10
Conc-capable: No
VGactive: Yes
Quorum-required:Yes
FREEMAJORS: 35..99,101..199,201...



Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk4
Cluster IP Address:
There are 2 node(s) and 2 network(s) defined
NODE berlin:
Network net_ether_01
berlinb2 100.168.220.141
berlinb1 100.168.200.141
Network net_ether_010
berlin 192.168.101.141
NODE munich:
Network net_ether_01
munichb1 100.168.200.142
munichb2 100.168.220.142
Network net_ether_010
munich 192.168.101.142

No resource groups defined

Warning: There is no cluster found.


cllsclstr: No cluster defined
cllsclstr: Error reading configuration
Communication path berlin discovered a new node. Hostname is berlin. Adding it to
the configuration with

Nodename berlin.
Communication path munich discovered a new node. Hostname is munich. Adding it to
the configuration with

Nodename munich.
Discovering IP Network Connectivity
Discovered [10] interfaces
IP Network Discovery completed normally

Current cluster configuration:

Discovering Volume Group Configuration

Current cluster configuration:

munich:/ #



To see the configuration up to this point, you can use the cltopinfo command. Keep in mind that this information is local to the system on which you are working. Example 5-24 shows the current configuration.

Example 5-24 Output of the cltopinfo command after creating cluster definitions
munich:/ # cltopinfo
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk4
Cluster IP Address:
There are 2 node(s) and 2 network(s) defined
NODE berlin:
Network net_ether_01
berlinb2 100.168.220.141
berlinb1 100.168.200.141
Network net_ether_010
berlin 192.168.101.141
NODE munich:
Network net_ether_01
munichb1 100.168.200.142
munichb2 100.168.220.142
Network net_ether_010
munich 192.168.101.142

No resource groups defined


munich:/ #

Defining the service address using the clmgr command


Next you define the service addresses. Example 5-25 on page 119 shows the command and
its output.

The clmgr add cluster command: The clmgr add cluster command automatically runs discovery (harvesting) of the IP interfaces and volume groups. As a result, the IP network interfaces are added to the cluster configuration automatically.

Table 5-4 provides more details about the command and arguments that are used.

Table 5-4 Defining the service address using the clmgr command

Action, object class, or argument   Value used        Comment
add                                                   Basic preferred action
service_ip                                            Basic object class used
SERVICE_IP_NAME                     alleman, german   Optional argument name, but required value
NETWORK                             net_ether_01      The network name from the cltopinfo command used previously
NETMASK                             255.255.255.0     Optional; when you specify a value, use the same one that you used in setting up the interface.

Example 5-25 Defining the service address


munich:/ # clmgr add service_ip alleman NETWORK=net_ether_01 NETMASK=255.255.255.0
munich:/ # clmgr add service_ip german NETWORK=net_ether_01 NETMASK=255.255.255.0
munich:/ #

To check the configuration up to this point, use the cltopinfo command again. Example 5-26
shows the current configuration.

Example 5-26 The cltopinfo output after creating cluster definitions


munich:/ # cltopinfo
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk4
Cluster IP Address:
There are 2 node(s) and 2 network(s) defined
NODE berlin:
Network net_ether_01
german 10.168.101.141
alleman 10.168.101.142
berlinb1 100.168.200.141
berlinb2 100.168.220.141
Network net_ether_010
berlin 192.168.101.141
NODE munich:
Network net_ether_01
german 10.168.101.141
alleman 10.168.101.142
munichb1 100.168.200.142
munichb2 100.168.220.142
Network net_ether_010
munich 192.168.101.142

No resource groups defined


munich:/ #



Defining the application server using the clmgr command
Next you define the application server. Example 5-27 shows the command and its output. The application server is named http_app.

Table 5-5 provides more details about the command and arguments that are used.

Table 5-5 Defining the application server using the clmgr command

Action, object class, or argument   Value used                                     Comment
add                                                                                Basic preferred action
application_controller                                                             Basic object class used
APPLICATION_SERVER_NAME             http_app                                       Optional argument name, but required value
STARTSCRIPT                         "/usr/IBM/HTTPServer/bin/apachectl -k start"   The start script used for the application
STOPSCRIPT                          "/usr/IBM/HTTPServer/bin/apachectl -k stop"    The stop script used for the application

Example 5-27 Defining the application server

munich:/ # clmgr add application_controller http_app \
> STARTSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k start" \
> STOPSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k stop"
munich:/ #

Defining the resource group using the clmgr command


Next you define the resource groups. Example 5-28 on page 121 shows the commands and
their output.

Compared to the SMIT panels, with the clmgr command you create a resource group and its resources in one step. Therefore, you must ensure that you have already defined all the service IP addresses and your application servers.

Two resource groups have been created. The first one (httprg) specifies only the items needed for this resource group, so that the system uses the default values for the remaining arguments. Table 5-6 provides more details about the command and arguments that are used.

Table 5-6 Defining the resource groups using the clmgr (httprg) command

Action, object class, or argument   Value used      Comment
add                                                 Basic preferred action.
resource_group                                      Basic object class used.
RESOURCE_GROUP_NAME                 httprg          Optional argument name, but required value.
VOLUME_GROUP                        httpvg          The volume group used for this resource group.
NODES                               munich,berlin   The sequence of the nodes is important. The first node is the primary node.
SERVICE_LABEL                       alleman         The service address used for this resource group.
APPLICATIONS                        http_app        The application server label created in a previous step.

For the second resource group in the test environment, we specified more details because we
did not want to use the default values (nfsrg). Table 5-7 provides more details about the
command and arguments that we used.

Table 5-7 Defining the resource groups using the clmgr (nfsrg) command

Action, object class, or argument   Value used       Comment
add                                                  Basic preferred action.
resource_group                                       Basic object class used.
RESOURCE_GROUP_NAME                 nfsrg            Optional argument name, but required value.
VOLUME_GROUP                        nfsvg            The volume group used for this resource group.
NODES                               berlin,munich    The sequence of the nodes is important. The first node is the primary node.
SERVICE_LABEL                       german           The service address used for this resource group.
FALLBACK                            NFB              Never Fall Back (NFB) is preferred for this resource group. (The default is FBHPN.)
RECOVERY_METHOD                     parallel         Parallel is preferred as the recovery method for this resource group. (The default is sequential.)
FS_BEFORE_IPADDR                    true             Because we want to define an NFS cross mount, we must use the value true here. (The default is false.)
EXPORT_FILESYSTEM                   /nfsdir          The file system for NFS to export.
MOUNT_FILESYSTEM                    "/sap;/nfsdir"   Requires the same syntax that we used in SMIT to define the NFS cross mount.

Example 5-28 shows the commands that are used to define the resource groups listed in
Table 5-6 on page 120 and Table 5-7.

Example 5-28 Defining the resource groups


munich:/ # clmgr add resource_group httprg VOLUME_GROUP=httpvg \
> NODES=munich,berlin SERVICE_LABEL=alleman APPLICATIONS=http_app

Auto Discover/Import of Volume Groups was set to true.


Gathering cluster information, which may take a few minutes.
munich:/ #
munich:/ # clmgr add resource_group nfsrg VOLUME_GROUP=nfsvg \
> NODES=berlin,munich SERVICE_LABEL=german FALLBACK=NFB \
> RECOVERY_METHOD=parallel FS_BEFORE_IPADDR=true EXPORT_FILESYSTEM="/nfsdir" \
> MOUNT_FILESYSTEM="/sap;/nfsdir"



Auto Discover/Import of Volume Groups was set to true.
Gathering cluster information, which may take a few minutes.
munich:/ #

To see the configuration up to this point, use the clmgr query command. Example 5-29
shows how to check which resource groups you defined.

Example 5-29 Listing the defined resource groups using the clmgr command
munich:/ # clmgr query rg
httprg
nfsrg
munich:/ #

Next, you can see the content that you created for the resource groups. Example 5-30 shows the content of the httprg resource group. As discussed previously, the default values for this resource group were used as much as possible.

Example 5-30 Contents listing of httprg


munich:/ # clmgr query rg httprg
NAME="httprg"
STATE="UNKNOWN"
CURRENT_NODE=""
NODES="munich berlin"
PRIMARYNODES=""
PRIMARYNODES_STATE="UNKNOWN"
SECONDARYNODES=""
SECONDARYNODES_STATE="UNKNOWN"
TYPE=""
APPLICATIONS="http_app"
STARTUP="OHN"
FALLOVER="FNPN"
FALLBACK="FBHPN"
NODE_PRIORITY_POLICY="default"
SITE_POLICY="ignore"
DISK=""
VOLUME_GROUP="httpvg"
CONCURRENT_VOLUME_GROUP=""
FORCED_VARYON="false"
FILESYSTEM=""
FSCHECK_TOOL="fsck"
RECOVERY_METHOD="sequential"
EXPORT_FILESYSTEM=""
SHARED_TAPE_RESOURCES=""
AIX_CONNECTIONS_SERVICES=""
AIX_FAST_CONNECT_SERVICES=""
COMMUNICATION_LINKS=""
MOUNT_FILESYSTEM=""
SERVICE_LABEL="alleman"
MISC_DATA=""
SSA_DISK_FENCING="false"
VG_AUTO_IMPORT="false"
INACTIVE_TAKEOVER="false"
CASCADE_WO_FALLBACK="false"



FS_BEFORE_IPADDR="false"
NFS_NETWORK=""
MOUNT_ALL_FS="true"
WLM_PRIMARY=""
WLM_SECONDARY=""
FALLBACK_AT=""
RELATIONSHIP=""
SRELATIONSHIP="ignore"
GMD_REP_RESOURCE=""
PPRC_REP_RESOURCE=""
ERCMF_REP_RESOURCE=""
SRDF_REP_RESOURCE=""
TRUCOPY_REP_RESOURCE=""
SVCPPRC_REP_RESOURCE=""
GMVG_REP_RESOURCE=""
EXPORT_FILESYSTEM_V4=""
STABLE_STORAGE_PATH=""
WPAR_NAME=""
VARYON_WITH_MISSING_UPDATES="true"
DATA_DIVERGENCE_RECOVERY="ignore"
munich:/ #

Now you can see the content that was created for the resource groups. Example 5-31 shows
the content of the nfsrg resource group.

Example 5-31 List the content of nfsrg resource group


munich:/ # clmgr query rg nfsrg
NAME="nfsrg"
STATE="UNKNOWN"
CURRENT_NODE=""
NODES="berlin munich"
PRIMARYNODES=""
PRIMARYNODES_STATE="UNKNOWN"
SECONDARYNODES=""
SECONDARYNODES_STATE="UNKNOWN"
TYPE=""
APPLICATIONS=""
STARTUP="OHN"
FALLOVER="FNPN"
FALLBACK="NFB"
NODE_PRIORITY_POLICY="default"
SITE_POLICY="ignore"
DISK=""
VOLUME_GROUP="nfsvg"
CONCURRENT_VOLUME_GROUP=""
FORCED_VARYON="false"
FILESYSTEM=""
FSCHECK_TOOL="fsck"
RECOVERY_METHOD="parallel"
EXPORT_FILESYSTEM="/nfsdir"
SHARED_TAPE_RESOURCES=""
AIX_CONNECTIONS_SERVICES=""
AIX_FAST_CONNECT_SERVICES=""
COMMUNICATION_LINKS=""
MOUNT_FILESYSTEM="/sap;/nfsdir"



SERVICE_LABEL="german"
MISC_DATA=""
SSA_DISK_FENCING="false"
VG_AUTO_IMPORT="false"
INACTIVE_TAKEOVER="false"
CASCADE_WO_FALLBACK="false"
FS_BEFORE_IPADDR="true"
NFS_NETWORK=""
MOUNT_ALL_FS="true"
WLM_PRIMARY=""
WLM_SECONDARY=""
FALLBACK_AT=""
RELATIONSHIP=""
SRELATIONSHIP="ignore"
GMD_REP_RESOURCE=""
PPRC_REP_RESOURCE=""
ERCMF_REP_RESOURCE=""
SRDF_REP_RESOURCE=""
TRUCOPY_REP_RESOURCE=""
SVCPPRC_REP_RESOURCE=""
GMVG_REP_RESOURCE=""
EXPORT_FILESYSTEM_V4=""
STABLE_STORAGE_PATH=""
WPAR_NAME=""
VARYON_WITH_MISSING_UPDATES="true"
DATA_DIVERGENCE_RECOVERY="ignore"
munich:/ #

Synchronizing the cluster definitions by using the clmgr command


After you create all topology and resource information, synchronize the cluster.

Verifying and propagating the changes: After using the clmgr command to modify the
cluster configuration, enter the clmgr verify cluster and clmgr sync cluster commands
to verify and propagate the changes to all nodes.

Example 5-32 shows usage of the clmgr sync cluster command to synchronize the cluster
and the command output.

Example 5-32 Synchronizing the cluster using the clmgr sync cluster command
munich:/ # clmgr sync cluster

Verification to be performed on the following:


Cluster Topology
Cluster Resources

Retrieving data from available cluster nodes. This could take a few minutes.

Start data collection on node berlin


Start data collection on node munich
Waiting on node berlin data collection, 15 seconds elapsed
Waiting on node munich data collection, 15 seconds elapsed
Collector on node berlin completed



Collector on node munich completed
Data collection complete

Verifying Cluster Topology...

Completed 10 percent of the verification checks

berlin net_ether_010
munich net_ether_010

Completed 20 percent of the verification checks


Completed 30 percent of the verification checks

Verifying Cluster Resources...

Completed 40 percent of the verification checks

http_app httprg
Completed 50 percent of the verification checks
Completed 60 percent of the verification checks
Completed 70 percent of the verification checks
Completed 80 percent of the verification checks
Completed 90 percent of the verification checks
Completed 100 percent of the verification checks

Remember to redo automatic error notification if configuration has changed.

Verification has completed normally.

Committing any changes, as required, to all available nodes...


Adding any necessary PowerHA SystemMirror for AIX entries to /etc/inittab and
/etc/rc.net for IP Address

Takeover on node munich.


Adding any necessary PowerHA SystemMirror for AIX entries to /etc/inittab and
/etc/rc.net for IP Address

Takeover on node berlin.

Verification has completed normally.

WARNING: Multiple communication interfaces are recommended for networks that


use IP aliasing in order to prevent the communication interface from
becoming a single point of failure. There are fewer than the recommended
number of communication interfaces defined on the following node(s) for
the given network(s):

Node: Network:
---------------------------------- ----------------------------------
WARNING: Not all cluster nodes have the same set of HACMP filesets installed.
The following is a list of fileset(s) missing, and the node where the
fileset is missing:

Fileset: Node:
-------------------------------- --------------------------------



WARNING: There are IP labels known to HACMP and not listed in file
/usr/es/sbin/cluster/etc/clhosts.client on

node: berlin. Clverify can automatically populate this file to be used on a client
node, if executed in

auto-corrective mode.
WARNING: There are IP labels known to HACMP and not listed in file
/usr/es/sbin/cluster/etc/clhosts.client on

node: munich. Clverify can automatically populate this file to be used on a client
node, if executed in

auto-corrective mode.
WARNING: Network option "nonlocsrcroute" is set to 0 and will be set to 1 on
during HACMP startup on the

following nodes:

berlin
munich

WARNING: Network option "ipsrcrouterecv" is set to 0 and will be set to 1 on


during HACMP startup on the

following nodes:

berlin
munich

WARNING: Application monitors are required for detecting application failures


in order for HACMP to recover from them. Application monitors are started
by HACMP when the resource group in which they participate is activated.
The following application(s), shown with their associated resource group,
do not have an application monitor configured:

Application Server Resource Group


-------------------------------- ---------------------------------
WARNING: Node munich has cluster.es.nfs.rte installed however grace periods are
not fully enabled on this node.

Grace periods must be enabled before NFSv4 stable storage can be used.

HACMP will attempt to fix this opportunistically when acquiring NFS resources on
this node however the change

won't take effect until the next time that nfsd is started.

If this warning persists, the administrator should perform the following steps to
enable grace periods on



munich at the next planned downtime:
1. stopsrc -s nfsd
2. smitty nfsgrcperiod
3. startsrc -s nfsd

munich:/ #

When the synchronization finishes successfully, the CAA repository disk is defined. Figure 5-47 shows the disks before the cluster synchronization, which are the same as those shown in Figure 5-45 on page 113.

munich:/ # lspv
hdisk1 00c0f6a012446137 httpvg
hdisk2 00c0f6a01245190c httpvg
hdisk3 00c0f6a012673312 nfsvg
hdisk4 00c0f6a01c784107 None
hdisk0 00c0f6a07c5df729 rootvg active
munich:/ #
Figure 5-47 List of available disks before sync

Figure 5-48 shows the output of the lspv command after the synchronization. In our example,
hdisk4 is now converted into a CAA repository disk and is listed as caa_private0.

munich:/ # lspv
hdisk1 00c0f6a012446137 httpvg
hdisk2 00c0f6a01245190c httpvg
hdisk3 00c0f6a012673312 nfsvg
caa_private0 00c0f6a01c784107 caavg_private active
hdisk0 00c0f6a07c5df729 rootvg active
munich:/ #
Figure 5-48 List of available disks after using the cluster sync command

Starting the cluster using the clmgr command


To determine whether the cluster is configured correctly, test the cluster. To begin, start the
cluster nodes.

Example 5-33 shows the command that we used and some of its output. To start the clinfo subsystem, we used the CLINFO=true argument. Because we did not want a broadcast message, we also specified the BROADCAST=false argument.

Example 5-33 Starting the cluster by using the clmgr command


munich:/ # clmgr online cluster start_cluster BROADCAST=false CLINFO=true

Warning: "WHEN" must be specified. Since it was not,


a default of "now" will be used.

Warning: "MANAGE" must be specified. Since it was not,


a default of "auto" will be used.



/usr/es/sbin/cluster/diag/cl_ver_alias_topology[42] [[ high = high ]]

--- skipped lines ---

/usr/es/sbin/cluster/diag/cl_ver_alias_topology[335] return 0

WARNING: Multiple communication interfaces are recommended for networks that


use IP aliasing in order to prevent the communication interface from
becoming a single point of failure. There are fewer than the recommended
number of communication interfaces defined on the following node(s) for
the given network(s):

Node: Network:
---------------------------------- ----------------------------------
berlin net_ether_010
munich net_ether_010

WARNING: Network option "nonlocsrcroute" is set to 0 and will be set to 1 on during HACMP
startup on the following nodes:

munich

WARNING: Network option "ipsrcrouterecv" is set to 0 and will be set to 1 on during HACMP
startup on the following nodes:

munich

WARNING: Application monitors are required for detecting application failures


in order for HACMP to recover from them. Application monitors are started
by HACMP when the resource group in which they participate is activated.
The following application(s), shown with their associated resource group,
do not have an application monitor configured:

Application Server Resource Group


-------------------------------- ---------------------------------
http_app httprg
/usr/es/sbin/cluster/diag/clwpardata[23] [[ high == high ]]

--- skipped lines ---

/usr/es/sbin/cluster/diag/clwpardata[325] exit 0
WARNING: Node munich has cluster.es.nfs.rte installed however grace periods are not fully
enabled on this node. Grace periods must be enabled before NFSv4 stable storage can be used.

HACMP will attempt to fix this opportunistically when acquiring NFS resources on this node
however the change won't take effect until the next time that nfsd is started.

If this warning persists, the administrator should perform the following steps to enable grace
periods on munich at the next planned downtime:
1. stopsrc -s nfsd
2. smitty nfsgrcperiod
3. startsrc -s nfsd

berlin: start_cluster: Starting PowerHA SystemMirror



berlin: 2359456 - 0:09 syslogd
berlin: Setting routerevalidate to 1
berlin: 0513-059 The clevmgrdES Subsystem has been started. Subsystem PID is 10682520.
berlin: 0513-059 The clinfoES Subsystem has been started. Subsystem PID is 10027062.
munich: start_cluster: Starting PowerHA SystemMirror
munich: 3408044 - 0:07 syslogd
munich: Setting routerevalidate to 1
munich: 0513-059 The clevmgrdES Subsystem has been started. Subsystem PID is 5505122.
munich: 0513-059 The clinfoES Subsystem has been started. Subsystem PID is 6029442.

The cluster is now online.

munich:/ #

Starting all nodes in a cluster: The clmgr online cluster start_cluster command
starts all nodes in a cluster by default.
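If you prefer to start only one node, the online action can also be applied to the node object class, as in the following sketch (check clmgr online node -h for the full list of options):

# clmgr online node berlin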

Figure 5-49 shows that all nodes are now up and running.

clstat - HACMP Cluster Status Monitor


-------------------------------------

Cluster: de_cluster (1126819374)


Wed Oct 13 17:27:30 EDT 2010
State: UP Nodes: 2
SubState: STABLE

Node: berlin State: UP


Interface: berlinb1 (0) Address: 100.168.200.141
State: UP
Interface: berlinb2 (0) Address: 100.168.220.141
State: UP
Interface: berlin (1) Address: 192.168.101.141
State: UP
Interface: german (0) Address: 10.168.101.141
State: UP
Resource Group: nfsrg State: On line

Node: munich State: UP


Interface: munichb1 (0) Address: 100.168.200.142
State: UP
Interface: munichb2 (0) Address: 100.168.220.142
State: UP
Interface: munich (1) Address: 192.168.101.142
State: UP
Interface: alleman (0) Address: 10.168.101.142
State: UP
Resource Group: httprg State: On line

************************ f/forward, b/back, r/refresh, q/quit *****************


Figure 5-49 Output of the clstat -a command showing that all nodes are running



5.2.6 Alternative output formats for the clmgr command
All of the previous examples use the ATTR="VALUE" format. However, two other formats are
supported. One format is colon-delimited (by using -c). The other format is simple XML (by
using -x).

Colon-delimited format
When using the colon-delimited output format (-c), you can use the -S option to silence or
eliminate the header line.

Example 5-34 shows the colon-delimited output format.

Example 5-34 The colon-delimited output format


# clmgr query ac appctl1
NAME="appctl1"
MONITORS=""
STARTSCRIPT="/bin/hostname"
STOPSCRIPT="/bin/hostname"

# clmgr -c query ac appctl1


# NAME:MONITORS:STARTSCRIPT:STOPSCRIPT
appctl1::/bin/hostname:/bin/hostname

# clmgr -cS query ac appctl1


appctl1::/bin/hostname:/bin/hostname
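The colon-delimited form is easy to post-process with standard tools. As a small usage sketch, the third field of the output above is the start script of the application controller:

# clmgr -cS query ac appctl1 | awk -F: '{print $3}'
/bin/hostname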

Simple XML format


Example 5-35 shows the simple XML-based output format.

Example 5-35 Simple XML-based output format


# clmgr -x query ac appctl1
<APPLICATION_CONTROLLERS>
<APPLICATION_CONTROLLER>
<NAME>appctl1</NAME>
<MONITORS></MONITORS>
<STARTSCRIPT>/bin/hostname</STARTSCRIPT>
<STOPSCRIPT>/bin/hostname</STOPSCRIPT>
</APPLICATION_CONTROLLER>
</APPLICATION_CONTROLLERS>

5.2.7 Log file of the clmgr command


The traditional PowerHA practice of setting VERBOSE_LOGGING to produce debug output is
supported with the clmgr command. You can also set VERBOSE_LOGGING on a per-run
basis with the clmgr -l command. The -l flag has the following options:
low Typically of interest to support personnel; shows simple function entry and exit.
med Typically of interest to support personnel; shows the same information as the low
option, but includes input parameters and return codes.
high The recommended setting for customer use; turns on set -x in scripts (equivalent
to VERBOSE_LOGGING=high) but leaves out internal utility functions.



max Turns on everything that the high option does and omits nothing. Is likely to make
debugging more difficult because of the volume of output that is produced.

Attention: The max value might have a negative impact on performance.
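For example, to raise the debug level for a single synchronization run, a command such as the following can be used (a sketch; the debug output is written to the clutils.log file that is described next):

# clmgr -l high sync cluster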

The main log file for clmgr debugging is the /var/hacmp/log/clutils.log file. This log file
includes all standard error and output from each command.

The return codes used by the clmgr command are standard for all commands:
RC_UNKNOWN=-1 A result is not known. It is useful as an initializer.
RC_SUCCESS=0 No errors were detected; the operation seems to have
been successful.
RC_ERROR=1 A general error has occurred.
RC_NOT_FOUND=2 A specified resource does not exist or could not be
found.
RC_MISSING_INPUT=3 Some required input was missing.
RC_INCORRECT_INPUT=4 Some detected input was incorrect.
RC_MISSING_DEPENDENCY=5 A required dependency does not exist.
RC_SEARCH_FAILED=6 A specified search failed to match any data.
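Because these return codes are consistent across commands, they can be tested directly in scripts. The following minimal sketch (the resource group name testrg is hypothetical) distinguishes a missing object from other failures:

#!/usr/bin/ksh
# Check whether a resource group exists, using the documented clmgr return codes
clmgr query resource_group testrg > /dev/null 2>&1
rc=$?
case $rc in
    0) print "testrg exists" ;;
    2) print "testrg is not defined (RC_NOT_FOUND)" ;;
    *) print "query failed with return code $rc" ;;
esac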

Example 5-36 lists the format of the trace information in the clutils.log file.

Example 5-36 The trace information in the clutils.log file


<SENTINEL>:<RETURN_CODE>:<FILE>:<FUNCTION>[<LINENO>](<ELAPSED_TIME>):
<TRANSACTION_ID>:<PID>:<PPID>: <SCRIPT_LINE>

The following line shows an example of how the clutils.log file might be displayed:
CLMGR:0:resource_common:SerializeAsAssociativeArray()[537](0.704):13327:9765002:90
44114: unset 'array[AIX_LEVEL0]'

Example 5-37 shows some lines from the clutils.log file (not using trace).

Example 5-37 The clutils.log file


CLMGR STARTED (243:7667890:9437234): Wed Oct 6 23:51:22 CDT 2010
CLMGR USER (243:7667890:9437234): ::root:system
CLMGR COMMAND (243:7012392:7667890): clmgr -T 243 modify cluster
hacmp2728_cluster REPOSITORY=hdisk2
CLMGR ACTUAL (243:7012392:7667890): modify_cluster properties hdisk2
CLMGR RETURN (243:7012392:7667890): 0
CLMGR STDERR -- BEGIN (243:7667890:9437234): Wed Oct 6 23:51:26 CDT 2010

Current cluster configuration:

CLMGR STDERR -- END (243:7667890:9437234): Wed Oct 6 23:51:26 CDT 2010


CLMGR ENDED (243:7667890:9437234): Wed Oct 6 23:51:26 CDT 2010
CLMGR ELAPSED (243:7667890:9437234): 3.720



5.2.8 Displaying the log file content by using the clmgr command
You can use the clmgr action view command to view the log content.

Defining the number of lines returned


By using the TAIL argument, you can define the number of clmgr command-related lines that
are returned from the clutils.log file. Example 5-38 shows how you can specify 1000 lines
of clmgr log information.

Example 5-38 Using the TAIL argument when viewing the content of the clmgr log file
# clmgr view log clutils.log TAIL=1000 | wc -l
1000
#

Filtering special items by using the FILTER argument


You can use the FILTER argument to filter special items that you are looking for.
Example 5-39 shows how to list just the last 10 clmgr commands that were run.

Example 5-39 Listing the last 10 clmgr commands


# clmgr view log clutils.log TAIL=10 FILTER="CLMGR COMMAND"
CLMGR COMMAND (12198:13828308:15138846): clmgr -T 12198 add
application_controller appctl1 start=/bin/hostname stop=/bin/hostname
CLMGR COMMAND (2629:15138850:17891482): clmgr -T 2629 query
application_controller appctl1
CLMGR COMMAND (4446:19464210:17891482): clmgr -c -T 4446 query
application_controller appctl1
CLMGR COMMAND (23101:19464214:17891482): clmgr -c -S -T 23101 query
application_controller appctl1
CLMGR COMMAND (24919:17826012:17891482): clmgr -x -T 24919 query
application_controller appctl1
CLMGR COMMAND (464:14352476:15138926): clmgr -T 464 view log clutils.log
CLMGR COMMAND (18211:15728818:15138928): clmgr -T 18211 view log clutils.log
CLMGR COMMAND (10884:13828210:14156024): clmgr -T 10884 view log clutils.log
CLMGR COMMAND (28631:17629296:14156026): clmgr -T 28631 view log clutils.log
CLMGR COMMAND (19061:17825922:14156028): clmgr -T 19061 view log clutils.log
TAIL=1000
#

Example 5-40 shows how to list the last five clmgr query commands that were run.

Example 5-40 Listing the last five clmgr query commands


# clmgr view log clutils.log TAIL= FILTER="CLMGR COMMAND",query
CLMGR COMMAND (9047:17825980:17891482): clmgr -x -T 9047 query resource_group rg1
CLMGR COMMAND (2629:15138850:17891482): clmgr -T 2629 query
application_controller appctl1
CLMGR COMMAND (4446:19464210:17891482): clmgr -c -T 4446 query
application_controller appctl1
CLMGR COMMAND (23101:19464214:17891482): clmgr -c -S -T 23101 query
application_controller appctl1
CLMGR COMMAND (24919:17826012:17891482): clmgr -x -T 24919 query
application_controller appctl1
#



5.3 PowerHA SystemMirror for IBM Systems Director
Using the web browser graphical user interface makes it easy to complete configuration and management tasks with a few mouse clicks. For example, you can easily create a cluster, verify and synchronize a cluster, and add nodes to a cluster.

The Director client agent of PowerHA SystemMirror is installed on the cluster nodes in the same manner as PowerHA SystemMirror itself, by using the installp command. The Director server and the PowerHA server plug-in require a separate installation effort. You must download them from the external website and manually install them on a dedicated system. This system does not have to be a PowerHA system.

To learn about installing the Systems Director and PowerHA components, and their use for
configuration and management tasks, see Chapter 12, “Creating and managing a cluster
using IBM Systems Director” on page 333.



Chapter 6. IBM PowerHA SystemMirror Smart Assist for DB2
PowerHA SystemMirror Smart Assist for DB2 is included in the base Standard Edition
software. It simplifies the process and minimizes the time and effort of making a non-DPF DB2 database highly available. The Smart Assist automatically discovers DB2 instances and databases and
creates start and stop scripts for the instances. The Smart Assist also creates process and
custom PowerHA application monitors that help to keep the DB2 instances highly available.

This chapter explains how to configure a hot standby two-node IBM PowerHA SystemMirror
7.1 cluster using the Smart Assist for DB2. The lab cluster korea is used for the examples with
the participating nodes seoul and busan.

This chapter includes the following topics:
• Prerequisites
• Implementing a PowerHA SystemMirror cluster and Smart Assist for DB2 7.1



6.1 Prerequisites
This section describes the prerequisites for the Smart Assist implementation.

6.1.1 Installing the required file sets


You must install two additional file sets, as shown in Example 6-1, before using Smart Assist
for DB2.
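If these file sets are not installed yet, they can be installed from the PowerHA SystemMirror installation media with the installp command. The following is a sketch that assumes the installation images were copied to /mnt/powerha (the directory name is an assumption):

# installp -agXYd /mnt/powerha cluster.es.assist.common cluster.es.assist.db2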

Example 6-1 Additional file sets required for installing Smart Assist
seoul:/ # clcmd lslpp -l cluster.es.assist.common cluster.es.assist.db2
-------------------------------
NODE seoul
-------------------------------
Fileset Level State Description
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
cluster.es.assist.common 7.1.0.1 COMMITTED PowerHA SystemMirror Smart
Assist Common Files
cluster.es.assist.db2 7.1.0.1 COMMITTED PowerHA SystemMirror Smart
Assist for DB2
-------------------------------
NODE busan
-------------------------------
Fileset Level State Description
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
cluster.es.assist.common 7.1.0.1 COMMITTED PowerHA SystemMirror Smart
Assist Common Files
cluster.es.assist.db2 7.1.0.1 COMMITTED PowerHA SystemMirror Smart
Assist for DB2

6.1.2 Installing DB2 on both nodes


The DB2 versions supported by the PowerHA Smart Assist are versions 8.1, 8.2, 9.1, and 9.5.
For the example in this chapter, DB2 9.5 has been installed on both nodes, seoul and
busan, as shown in Example 6-2.

Example 6-2 DB2 version installed


seoul:/db2/db2pok # db2pd -v

Instance db2pok uses 64 bits and DB2 code release SQL09050


with level identifier 03010107
Informational tokens are DB2 v9.5.0.0, s071001, AIX6495, Fix Pack 0.



6.1.3 Importing the shared volume group and file systems
The storage must be accessible from both nodes with the logical volume structures created
and imported on both sides. If the volume groups are not imported on the secondary node,
Smart Assist for DB2 does it automatically as shown in Example 6-3.

Example 6-3 Volume groups imported in the nodes


seoul:/db2/db2pok # clcmd lspv
-------------------------------
NODE seoul
-------------------------------
hdisk0 00c0f6a088a155eb rootvg active
caa_private0 00c0f6a01077342f caavg_private active
cldisk2 00c0f6a0107734ea pokvg
cldisk1 00c0f6a010773532 pokvg

-------------------------------
NODE busan
-------------------------------
hdisk0 00c0f6a089390270 rootvg active
caa_private0 00c0f6a01077342f caavg_private active
cldisk2 00c0f6a0107734ea pokvg
cldisk1 00c0f6a010773532 pokvg
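If the shared volume group is not yet known on the secondary node and you want to import it manually instead of letting Smart Assist do it, a sketch of the import on busan looks like the following (the disk name is taken from the lspv output above):

busan:/ # importvg -y pokvg cldisk1
busan:/ # varyoffvg pokvg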

6.1.4 Creating the DB2 instance and database on the shared volume group
Before launching the PowerHA Smart Assist for DB2, you must have already created the DB2
instance and DB2 database on the volume groups that are shared by both nodes.

In Example 6-4, the home for the POK database was created in the /db2/POK/db2pok shared
file system of the volume group pokvg. The instance was created in the /db2/db2pok shared
file system, which is the home directory for user db2pok. The instance was created on the
primary node only, because all of its structures reside on the shared volume group.

Example 6-4 Displaying the logical volume groups of pokvg


seoul:/ # lsvg -l pokvg
pokvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
loglv001 jfs2log 1 1 1 open/syncd N/A
poklv001 jfs2 96 96 1 open/syncd /db2/POK/db2pok
poklv002 jfs2 192 192 2 open/syncd /db2/POK/sapdata1
poklv003 jfs2 32 32 1 open/syncd /db2/POK/sapdatat1
poklv004 jfs2 48 48 1 open/syncd /db2/POK/log_dir
poklv005 jfs2 64 64 1 open/syncd /export/sapmnt/POK
poklv006 jfs2 64 64 1 open/syncd /export/usr/sap/trans
poklv008 jfs2 32 32 1 open/syncd /usr/sap/POK
poklv009 jfs2 4 4 1 open/syncd /db2/POK/db2dump
poklv007 jfs2 32 32 1 open/syncd /db2/db2pok

seoul:/ # clcmd grep db2pok /etc/passwd


-------------------------------
NODE seoul
-------------------------------

db2pok:!:203:101::/db2/db2pok:/usr/bin/ksh

-------------------------------
NODE busan
-------------------------------
db2pok:!:203:101::/db2/db2pok:/usr/bin/ksh

seoul:/ # /opt/IBM/db2/V9.5/instance/db2icrt -a SERVER -s ese -u db2fenc1 -p


db2c_db2pok db2pok

seoul:/ # su - db2pok

seoul:/db2/db2pok # ls -ld sqllib


drwxrwsr-t 19 db2pok db2iadm1 4096 Sep 21 13:12 sqllib

seoul:/db2/db2pok # db2start

seoul:/db2/db2pok # db2 "create database pok on /db2/POK/db2pok CATALOG TABLESPACE


managed by database using (file '/db2/POK/sapdata1/catalog.tbs' 100000) EXTENTSIZE
4 PREFETCHSIZE 4 USER TABLESPACE managed by database using (file
'/db2/POK/sapdata1/sapdata.tbs' 500000) EXTENTSIZE 4 PREFETCHSIZE 4 TEMPORARY
TABLESPACE managed by database using (file '/db2/POK/sapdatat1/temp.tbs' 200000)
EXTENTSIZE 4 PREFETCHSIZE 4"

seoul:/db2/db2pok # db2 list db directory


System Database Directory
Number of entries in the directory = 1
Database 1 entry:
Database alias = POK
Database name = POK
Local database directory = /db2/POK/db2pok
Database release level = c.00
Comment =
Directory entry type = Indirect
Catalog database partition number = 0
Alternate server hostname =
Alternate server port number =

seoul:/db2/db2pok # db2 update db cfg for pok using NEWLOGPATH /db2/POK/log_dir

seoul:/db2/db2pok # db2 update db cfg for pok using LOGRETAIN on

seoul:/db2/db2pok # db2 backup db pok to /tmp

seoul:/db2/db2pok # db2stop; db2start

seoul:/db2/db2pok # db2 connect to pok

Database Connection Information

Database server = DB2/AIX64 9.5.0


SQL authorization ID = DB2POK
Local database alias = POK



seoul:/db2/db2pok # db2 connect reset
DB20000I The SQL command completed successfully.

Non-DPF database support: Smart Assist for DB2 supports only non-DPF databases.

6.1.5 Updating the /etc/services file on the secondary node


When the instance is created on the primary node, the /etc/services file is updated with
information for DB2 use. You must also add these lines to the /etc/services file on the
secondary node as in the following example:
db2c_db2pok 50000/tcp
DB2_db2pok 60000/tcp
DB2_db2pok_1 60001/tcp
DB2_db2pok_2 60002/tcp
DB2_db2pok_END 60003/tcp
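
One way to confirm that both nodes now contain the same DB2 port definitions is a cluster-wide grep with the clcmd command, as used elsewhere in this chapter. The string db2pok matches the example instance name:

clcmd grep db2pok /etc/services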

6.1.6 Configuring IBM PowerHA SystemMirror


You must configure the topology of the PowerHA cluster before using Smart Assist for DB2. In
Example 6-5, the cluster korea was configured with two Ethernet interfaces in each node.

Example 6-5 Cluster korea configuration


seoul:/ # cllsif
busan-b2 boot net_ether_01 ether public busan 192.168.201.144 en2 255.255.252.0 22
busan-b1 boot net_ether_01 ether public busan 192.168.101.144 en0 255.255.252.0 22
poksap-db service net_ether_01 ether public busan 10.168.101.143 255.255.252.0 22
seoul-b1 boot net_ether_01 ether public seoul 192.168.101.143 en0 255.255.252.0 22
seoul-b2 boot net_ether_01 ether public seoul 192.168.201.143 en2 255.255.252.0 22
poksap-db service net_ether_01 ether public seoul 10.168.101.143 255.255.252.0 22

6.2 Implementing a PowerHA SystemMirror cluster and Smart Assist for DB2 7.1
This section explains the preliminary steps that are required before you start Smart Assist for
DB2. Then it explains how to start Smart Assist for DB2.

6.2.1 Preliminary steps


Before starting Smart Assist for DB2, complete the following steps:
1. Stop the PowerHA cluster services on both nodes. Verify that they are stopped by issuing
   the lssrc -ls clstrmgrES command on both nodes as shown in Example 6-6 on page 140.
   A ST_INIT state indicates that the cluster services are stopped. The shared volume group
   must be active, with its file systems mounted, on the node where Smart Assist for DB2 is
   going to be run.



Example 6-6 Checking for PowerHA stopped cluster services
seoul:/ # lssrc -ls clstrmgrES
Current state: ST_INIT
sccsid = "$Header: @(#) 61haes_r710_integration/13
43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710
2010-08-19T1
0:34:17-05:00$"

busan:/ # lssrc -ls clstrmgrES


Current state: ST_INIT
sccsid = "$Header: @(#) 61haes_r710_integration/13
43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710
2010-08-19T1
0:34:17-05:00$"

2. Mount the file systems as shown in Example 6-7 so that Smart Assist for DB2 can
discover the available instances and databases.

Example 6-7 Checking for mounted file systems in node seoul


seoul:/db2/db2pok # lsvg -l pokvg
pokvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
loglv001 jfs2log 1 1 1 open/syncd N/A
poklv001 jfs2 96 96 1 open/syncd /db2/POK/db2pok
poklv002 jfs2 192 192 2 open/syncd /db2/POK/sapdata1
poklv003 jfs2 32 32 1 open/syncd /db2/POK/sapdatat1
poklv004 jfs2 48 48 1 open/syncd /db2/POK/log_dir
poklv005 jfs2 64 64 1 open/syncd /export/sapmnt/POK
poklv006 jfs2 64 64 1 open/syncd /export/usr/sap/trans
poklv008 jfs2 32 32 1 open/syncd /usr/sap/POK
poklv009 jfs2 4 4 1 open/syncd /db2/POK/db2dump
poklv007 jfs2 32 32 1 open/syncd /db2/db2pok

The DB2 instance must be active on the node where Smart Assist for DB2 is going to be
executed, as shown in Example 6-8.

Example 6-8 Checking for active DB2 instances


seoul:/ # su - db2pok

seoul:/db2/db2pok # db2ilist
db2pok

seoul:/db2/db2pok # db2start
09/24/2010 11:38:53 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.

seoul:/db2/db2pok # ps -ef | grep db2sysc | grep -v grep


db2pok 15794218 8978496 0 11:38:52 - 0:00 db2sysc 0

seoul:/db2/db2pok # db2pd -
Database Partition 0 -- Active -- Up 0 days 00:00:10



3. After the instance is running, edit the $INSTHOME/sqllib/db2nodes.cfg file as shown in
   Example 6-9 to add the service IP label. This service IP label is going to be used in the
   IBM PowerHA resource group. Do not make this change before the instance is started:
   with PowerHA down, the service IP label is not configured on any network interface, and
   the database instance cannot start if db2nodes.cfg already references it.

Example 6-9 Editing and adding the service IP label to the db2nodes.cfg file
seoul:/ # cat /db2/db2pok/sqllib/db2nodes.cfg
0 poksap-db 0

The .rhosts file (Example 6-10) for the DB2 instance owner must contain all the base,
persistent, and service addresses, and it must have restrictive permissions (see the
sketch after these steps).

Example 6-10 Checking the .rhosts file


seoul:/ # cat /db2/db2pok/.rhosts
seoul db2pok
busan db2pok
seoul-b1 db2pok
busan-b1 db2pok
seoul-b2 db2pok
busan-b2 db2pok
poksap-db db2pok

seoul:/db2/db2pok # ls -ld .rhosts


-rw------- 1 db2pok system 107 Oct 4 15:10 .rhosts

4. Find the path of the DB2 binary files and then export the variable as shown in Example 6-11.
   Export the DSE_INSTALL_DIR environment variable as the root user with the actual path
   of the DB2 binary files. If more than one DB2 version is installed, choose the version that
   you want to use for your highly available instance.

Example 6-11 Finding the DB2 binary files and exporting them
seoul:/db2/db2pok # db2level
DB21085I Instance "db2pok" uses "64" bits and DB2 code release "SQL09050" with
level identifier "03010107".
Informational tokens are "DB2 v9.5.0.0", "s071001", "AIX6495", and Fix Pack
"0".
Product is installed at "/opt/IBM/db2/V9.5".

seoul:/ # export DSE_INSTALL_DIR=/opt/IBM/db2/V9.5
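
The following minimal sketch shows one way to set the ownership and permissions of the instance owner .rhosts file that is referenced in step 3. The path and user names match the example cluster; adjust them for your environment:

chown db2pok /db2/db2pok/.rhosts       # run on both nodes
chmod 600 /db2/db2pok/.rhosts
clcmd ls -ld /db2/db2pok/.rhosts       # confirm the same settings on both nodes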

6.2.2 Starting Smart Assist for DB2


After completing the steps in 6.2.1, “Preliminary steps” on page 139, you are ready to start
Smart Assist for DB2 as explained in the following steps:
1. Launch Smart Assist for DB2 by using the path for seoul: smitty sysmirror → Cluster
   Applications and Resources → Make Applications Highly Available (Use Smart
   Assists) → Add an Application to the PowerHA SystemMirror Configuration.
2. In the Add an Application to the PowerHA SystemMirror Configuration panel, select Select
a Smart Assist From the List of Available Smart Assists.



3. In the Select a Smart Assist From the List of Available Smart Assists panel (Figure 6-1),
select DB2 UDB non-DPF Smart Assist.

Select a Smart Assist From the List of Available Smart Assists

Move cursor to desired item and press Enter.

DB2 UDB non-DPF Smart Assist # busan seoul


DHCP Smart Assist # busan seoul
DNS Smart Assist # busan seoul
Lotus Domino Smart Assist # busan seoul
FileNet P8 Smart Assist # busan seoul
IBM HTTP Server Smart Assist # busan seoul
SAP MaxDB Smart Assist # busan seoul
Oracle Database Smart Assist # busan seoul
Oracle Application Server Smart Assist # busan seoul
Print Subsystem Smart Assist # busan seoul
SAP Smart Assist # busan seoul
Tivoli Directory Server Smart Assist # busan seoul
TSM admin smart assist # busan seoul
TSM client smart assist # busan seoul
TSM server smart assist # busan seoul
WebSphere Smart Assist # busan seoul

F1=Help F2=Refresh F3=Cancel


Esc+8=Image Esc+0=Exit Enter=Do
/=Find n=Find Next
Figure 6-1 Selecting DB2 UDB non-DPF Smart Assist

4. In the Add an Application to the PowerHA SystemMirror Configuration panel, select Select
Configuration Mode.
5. In the Select Configuration Mode panel (Figure 6-2), select Automatic Discovery and
Configuration.

Select Configuration Mode

Move cursor to desired item and press Enter.

Automatic Discovery And Configuration


Manual Configuration

F1=Help F2=Refresh F3=Cancel


Esc+8=Image Esc+0=Exit Enter=Do
/=Find n=Find Next
Figure 6-2 Selecting the configuration mode



6. In the Add an Application to the PowerHA SystemMirror Configuration panel, select Select
the Specific Configuration You Wish to Create.
7. In the Select the Specific Configuration You Wish to Create panel (Figure 6-3), select DB2
Single Instance.

Select The Specific Configuration You Wish to Create

Move cursor to desired item and press Enter.

DB2 Single Instance # busan seoul

F1=Help F2=Refresh F3=Cancel


Esc+8=Image Esc+0=Exit Enter=Do
/=Find n=Find Next
Figure 6-3 Selecting the configuration to create

8. Select the DB2 instance name. In this case, only one instance, db2pok, is available as
shown in Figure 6-4.

Select a DB2 Instance

Move cursor to desired item and press Enter.

db2pok

F1=Help F2=Refresh F3=Cancel


Esc+8=Image Esc+0=Exit Enter=Do
/=Find n=Find Next
Figure 6-4 Selecting the DB2 instance name

9. Using the available pick lists (F4), edit the Takeover Node, DB2 Instance Database to
Monitor, and Service IP Label fields as shown in Figure 6-5. Press Enter.

Add a DB2 Highly Available Instance Resource Group

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

* Application Name [DB2_Instance_db2pok]

* DB2 Instance Owning Node [seoul] +


* Takeover Node(s) [busan] +
* DB2 Instance Name db2pok +
* DB2 Instance Database to Monitor POK +
* Service IP Label [poksap-db] +
Figure 6-5 Adding the DB2 high available instance resource group

Tip: You can edit the Application Name field and change it to have a more meaningful
name.



A new PowerHA resource group, called db2pok_ResourceGroup, is created. The volume
group pokvg and the service IP label poksap-db are automatically added to the resource
group as shown in Example 6-12.

Example 6-12 The configured resource group for the DB2 instance
seoul:/ # /usr/es/sbin/cluster/utilities/cllsres
APPLICATIONS="db2pok_ApplicationServer"
FILESYSTEM=""
FORCED_VARYON="false"
FSCHECK_TOOL="logredo"
FS_BEFORE_IPADDR="false"
RECOVERY_METHOD="parallel"
SERVICE_LABEL="poksap-db"
SSA_DISK_FENCING="false"
VG_AUTO_IMPORT="false"
VOLUME_GROUP="pokvg"
USERDEFINED_RESOURCES=""

seoul:/ # /usr/es/sbin/cluster/utilities/cllsgrp
db2pok_ResourceGroup

10.Administrator task: Verify the start and stop scripts that were created for the resource
group.
a. To verify the scripts, use the odmget or cllsserv commands or the SMIT tool as shown
in Example 6-13.

Example 6-13 Verifying the start and stop scripts


busan:/ # odmget HACMPserver
HACMPserver:
name = "db2pok_ApplicationServer"
start = "/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok"
stop = "/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok"
min_cpu = 0
desired_cpu = 0
min_mem = 0
desired_mem = 0
use_cod = 0
min_procs = 0
min_procs_frac = 0
desired_procs = 0
desired_procs_frac = 0

seoul:/ # /usr/es/sbin/cluster/utilities/cllsserv
db2pok_ApplicationServer /usr/es/sbin/cluster/sa/db2/sbin/cl_db2start
db2pok /usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok

b. Follow the path on seoul: smitty sysmirror → Cluster Applications and Resources →
   Resources → Configure User Applications (Scripts and Monitors) → Application
   Controller Scripts → Change/Show Application Controller Scripts.



c. Select the application controller (Figure 6-6) and press Enter.

Select Application Controller

Move cursor to desired item and press Enter.

db2pok_ApplicationServer

F1=Help F2=Refresh F3=Cancel


Esc+8=Image Esc+0=Exit Enter=Do
/=Find n=Find Next
Figure 6-6 Selecting the DB2 application controller

The characteristics of the application controller are displayed as shown in Figure 6-7.

Change/Show Application Controller Scripts

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

Application Controller Name db2pok_ApplicationServer


New Name [db2pok_ApplicationServer]
Start Script [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok]
Stop Script [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok]
Application Monitor Name(s) db2pok_SQLMonitor
db2pok_ProcessMonitor
Figure 6-7 Change/Show Application Controller Scripts panel

11.Administrator task: Verify which custom and process application monitors were created by
Smart Assist for DB2. In our example, the application monitors are db2pok_SQLMonitor
and db2pok_ProcessMonitor.
a. Run the following path for seoul: smitty sysmirror → Cluster Applications and
   Resources → Resources → Configure User Applications (Scripts and Monitors) →
   Application Monitors → Configure Custom Application Monitors →
   Change/Show Custom Application Monitor.
b. In the Application Monitor to Change panel (Figure 6-8), select db2pok_SQLMonitor
and press Enter.

Application Monitor to Change

Move cursor to desired item and press Enter.

db2pok_SQLMonitor

F1=Help F2=Refresh F3=Cancel


Esc+8=Image Esc+0=Exit Enter=Do
/=Find n=Find Next
Figure 6-8 Selecting the application monitor to change



c. In the Change/Show Custom Application Monitor panel (Figure 6-9), you see the
attributes of the application monitor.

Change/Show Custom Application Monitor

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

* Monitor Name db2pok_SQLMonitor


Application Controller(s) to Monitor db2pok_ApplicationServer +
* Monitor Mode [Long-running
monitoring] +
* Monitor Method [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2cmon -i db2pok
-A po>
Monitor Interval [120] #
Hung Monitor Signal [9] #
* Stabilization Interval [240] #
Restart Count [3] #
Restart Interval [1440] #
* Action on Application Failure [fallover] +
Notify Method []
Cleanup Method [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok]
Restart Method [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok]
Figure 6-9 Change/Show Custom Application Monitor panel

d. Run the following path for seoul: smitty sysmirror → Cluster Applications and
   Resources → Resources → Configure User Applications (Scripts and Monitors) →
   Application Monitors → Configure Process Application Monitors →
   Change/Show Process Application Monitor.
e. In the Application Monitor to Change panel (Figure 6-10), select
db2pok_ProcessMonitor and press Enter.

Application Monitor to Change

Move cursor to desired item and press Enter.

db2pok_ProcessMonitor

F1=Help F2=Refresh F3=Cancel


Esc+8=Image Esc+0=Exit Enter=Do
/=Find n=Find Next
Figure 6-10 Selecting the application monitor to change



In the Change/Show Process Application Monitor panel, you see the attributes of the
application monitor (Figure 6-11).

Change/Show Process Application Monitor


Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* Monitor Name db2pok_ProcessMonitor
Application Controller(s) to Monitor db2pok_ApplicationServer +
* Monitor Mode [Long-running monitoring] +
* Processes to Monitor [db2sysc]
* Process Owner [db2pok]
Instance Count [1] #
* Stabilization Interval [240] #
* Restart Count [3] #
Restart Interval [1440] #
* Action on Application Failure [fallover] +
Notify Method []
Cleanup Method [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok]
Restart Method [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok]

Figure 6-11 Change/Show Process Application Monitor panel

6.2.3 Completing the configuration


After the Smart Assist for DB2 is started, complete the configuration:
1. Stop the DB2 instance on the primary node as shown in Example 6-14. Keep in mind that
it was active only for the sake of the Smart Assist for DB2 discovery process.

Example 6-14 Stopping the DB2 instance


seoul:/ # su - db2pok

seoul:/db2/db2pok # db2stop
09/24/2010 12:02:56 0 0 SQL1064N DB2STOP processing was successful.
SQL1064N DB2STOP processing was successful.

2. Unmount the shared file systems as shown in Example 6-15.

Example 6-15 Unmounting the shared file systems


seoul:/db2/db2pok # lsvg -l pokvg
pokvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
loglv001 jfs2log 1 1 1 closed/syncd N/A
poklv001 jfs2 96 96 1 closed/syncd /db2/POK/db2pok
poklv002 jfs2 192 192 2 closed/syncd /db2/POK/sapdata1
poklv003 jfs2 32 32 1 closed/syncd /db2/POK/sapdatat1
poklv004 jfs2 48 48 1 closed/syncd /db2/POK/log_dir
poklv005 jfs2 64 64 1 closed/syncd /export/sapmnt/POK
poklv006 jfs2 64 64 1 closed/syncd /export/usr/sap/trans
poklv008 jfs2 32 32 1 closed/syncd /usr/sap/POK
poklv009 jfs2 4 4 1 closed/syncd /db2/POK/db2dump
poklv007 jfs2 32 32 1 closed/syncd /db2/db2pok



3. Deactivate the shared volume group as shown in Example 6-16.

Example 6-16 Deactivating the shared volume group of pokvg


seoul:/ # varyoffvg pokvg

seoul:/ # lsvg -o
caavg_private
rootvg

4. Synchronize the PowerHA cluster by using SMIT:


a. Follow the path smitty sysmirror → Custom Cluster Configuration → Verify and
   Synchronize Cluster Configuration (Advanced).
b. In the PowerHA SystemMirror Verification and Synchronization panel (Figure 6-12),
press Enter to accept the default option.

PowerHA SystemMirror Verification and Synchronization

Type or select values in entry fields.


Press Enter AFTER making all desired changes.
[Entry Fields]
* Verify, Synchronize or Both [Both] +
* Include custom verification library checks [Yes] +
* Automatically correct errors found during [Yes] +
verification?
* Force synchronization if verification fails? [No] +
* Verify changes only? [No] +
* Logging [Standard] +
Figure 6-12 Accepting the default actions on the Verification and Synchronization panel

5. Start the cluster on both nodes, seoul and busan, by running smitty clstart.
6. In the Start Cluster Services panel (Figure 6-13 on page 149), complete these steps:
a. For Start now, on system restart or both, select now.
b. For Start Cluster Services on these nodes, enter [seoul busan].
c. For Manage Resource Groups, select Automatically.
d. For BROADCAST message at startup, select false.
e. For Startup Cluster Information Daemon, select true.
f. For Ignore verification errors, select false.
g. For Automatically correct errors found during cluster start?, select yes.
h. Press Enter.



Start Cluster Services

Type or select values in entry fields.


Press Enter AFTER making all desired changes.
[Entry Fields]
* Start now, on system restart or both now +
Start Cluster Services on these nodes [seoul busan] +
* Manage Resource Groups Automatically +
BROADCAST message at startup? false +
Startup Cluster Information Daemon? true +
Ignore verification errors? false +
Automatically correct errors found during yes +
cluster start?
Figure 6-13 Specifying the options for starting cluster services

Tip: The log file for the Smart Assist is in the /var/hacmp/log/sa.log file. You can use the
clmgr utility to easily view the log, as in the following example:
clmgr view log sa.log

When the PowerHA cluster starts, the DB2 instance is automatically started. The application
monitors start after the defined stabilization interval as shown in Example 6-17.

Example 6-17 Checking the status of the high available cluster and the DB2 instance
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE seoul
OFFLINE busan

seoul:/ # ps -ef | grep /usr/es/sbin/cluster/clappmond | grep -v grep


root 7340184 15728806 0 12:17:53 - 0:00
/usr/es/sbin/cluster/clappmond db2pok1_SQLMonitor
root 11665630 4980958 0 12:17:53 - 0:00
/usr/es/sbin/cluster/clappmond db2pok_ProcessMonitor

seoul:/ # su - db2pok

seoul:/db2/db2pok # db2pd -
Database Partition 0 -- Active -- Up 0 days 00:19:38

Your DB2 instance and database are now configured for high availability in a hot-standby
PowerHA SystemMirror configuration.


Chapter 7. Migrating to PowerHA 7.1


This chapter includes the following topics for migrating to PowerHA 7.1:
•  Considerations before migrating
•  Understanding the PowerHA 7.1 migration process
•  Snapshot migration
•  Rolling migration
•  Offline migration



7.1 Considerations before migrating
Before migrating your cluster, you must be aware of the following considerations:
•  The required software
   – AIX
   – Virtual I/O Server (VIOS)
•  Multicast address
•  Repository disk
•  FC heartbeat support
•  All non-IP networks support removed
   – RS232
   – TMSCSI
   – TMSSA
   – Disk heartbeat (DISKHB)
   – Multinode disk heartbeat (MNDHB)
•  IP networks support removed
   – Asynchronous transfer mode (ATM)
   – Fiber Distributed Data Interface (FDDI)
   – Token ring
•  IP Address Takeover (IPAT) via replacement support removed
•  Heartbeat over alias support removed
•  Site support not available in this version
•  IPv6 support not available in this version

You can migrate from High-Availability Cluster Multi-Processing (HACMP) or PowerHA


versions 5.4.1, 5.5, and 6.1 only. If you are running a version earlier than HACMP 5.4.1, you
must upgrade to a newer version first.
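
One convenient way to spot any of the removed network or IPAT types in an existing cluster is to list the current topology before you migrate. This is only a quick check; the clmigcheck program described later performs the authoritative validation:

/usr/es/sbin/cluster/utilities/cltopinfo     # review the networks for diskhb, rs232, token ring, and so on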

TL6: AIX must be at a minimum version of AIX 6.1 TL6 (6.1.6.0) on all nodes before
migration. Use of AIX 6.1 TL6 SP2 or later is preferred.
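
A quick way to confirm the AIX level on each node is the oslevel command. The value shown in the comment is only an illustration of the output format:

oslevel -s     # output of the form 6100-06-02-<build> indicates AIX 6.1 TL6 SP2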

For more information about migration considerations, see 3.4, “Migration planning” on
page 46.

Only the following migration methods are supported:
•  Snapshot migration (as explained in 7.3, “Snapshot migration” on page 161)
•  Rolling migration (as explained in 7.4, “Rolling migration” on page 177)
•  Offline migration (as explained in 7.5, “Offline migration” on page 191)

Important: A nondisruptive upgrade is not available in PowerHA 7.1, because this version
is the first one to use Cluster Aware AIX (CAA).



7.2 Understanding the PowerHA 7.1 migration process
Before you begin a migration, you must understand the migration process and all migration
scenarios. The process is different from the previous versions of PowerHA (HACMP).

With the introduction of PowerHA 7.1, you now use the features of CAA introduced in AIX 6.1
TL6 and AIX 7.1. For more information about the new features of this release, see 2.2, “New
features” on page 24.

The migration process now has two main cluster components: CAA and PowerHA. This
process involves updating your existing PowerHA product and configuring the CAA cluster
component.

7.2.1 Stages of migration


Migrating to PowerHA 7.1 involves the following stages:
•  Stage 1: Upgrading to AIX 6.1 TL6 or AIX 7.1
Before you can migrate, you must have a working cluster-aware version of AIX. You can
perform this task as part of a two-stage rolling migration or upgrade AIX first before you
start the PowerHA migration. This version is required before you can start premigration
checking (stage 2).
•  Stage 2: Performing the premigration check (clmigcheck)
During this stage, you use the clmigcheck command to prepare each node before you
upgrade it to PowerHA 7.1:
a. Stage 2a: Run the clmigcheck command on the first node to choose Object Data
Manager (ODM) or snapshot. Run it again to choose the repository disk (and optionally
the IP multicast address).
b. Stage 2b: Run the clmigcheck command on each node (including the first node) to see
the “OK to install the new version” message and then upgrade the node to
PowerHA 7.1.

The clmigcheck command: The clmigcheck command automatically creates the CAA
cluster when it is run on the last node.

For a detailed explanation about the clmigcheck process, see 7.2.2, “Premigration
checking: The clmigcheck program” on page 157.



•  Stage 3: Upgrading to PowerHA 7.1
After stage 2 is completed, you upgrade to PowerHA 7.1 on the node. Figure 7-1 shows
the state of the cluster in the test environment after updating to PowerHA 7.1 on one node.
Topology services are still active so that the newly migrated PowerHA 7.1 node can
communicate with the previous version, PowerHA 6.1. The CAA configuration has been
completed, but the CAA cluster is not yet created.

Figure 7-1 Mixed version cluster after migrating node 1

•  Stage 4: Creating the CAA cluster (last node)


When you are on the last node of the cluster, you create the CAA cluster after running the
clmigcheck command a final time. CAA is required for PowerHA 7.1 to work, making this
task a critical step. Figure 7-2 shows the state of the environment after running the
clmigcheck command on the last node of the cluster, but before completing the migration.

Figure 7-2 Mixed version cluster after migrating node 2

At this stage, the clmigcheck process has run on the last node of the cluster. The CAA
cluster is now created and CAA has established communication with the other node.



However, PowerHA is still using the Topology Services (topsvcs) function because the
migration switchover to CAA is not yet completed.
•  Stage 5: Starting the migration protocol
As soon as you create the CAA cluster and install PowerHA 7.1, you must start the cluster.
The node_up event checks whether all nodes are running PowerHA 7.1 and starts the
migration protocol. The migration protocol has two phases:
– Phase 1
PowerHA calls ha_gs_migrate_to_caa_prep(0) to start the migration from Group
Services to CAA and ensures that each node can proceed with the migration.
– Phase 2
During the second phase, the DCD and ACD ODM entries in HACMPnode and
HACMPcluster are updated to the latest version. PowerHA calls
ha_gs_migrate_to_caa_commit() to complete the migration and issues the following command:
/usr/es/sbin/cluster/utilities/clmigcleanup
The clmigcleanup process removes existing non-IP entries from the HACMPnetwork,
HACMPadapter, and HACMPnim ODM entries, such as any diskhb entries. Figure 7-3
shows sections from the clstrmgr.debug log file showing the migration protocol stages.

Migration phase one - extract from clstrmgr.debug


Mon Sep 27 20:22:51 nPhaseCb: First phase of the migration protocol, call
ha_gs_caa_migration_prep()
Mon Sep 27 20:22:51 domainControlCb: Called, state=ST_STABLE
Mon Sep 27 20:22:51 domainControlCb: Notification type: HA_GS_DOMAIN_NOTIFICATION
Mon Sep 27 20:22:51 domainControlCb: HA_GS_MIGRATE_TO_CAA
Mon Sep 27 20:22:51 domainControlCb: Sub-Type: HA_GS_DOMAIN_CAA_MIGRATION_COORD
Mon Sep 27 20:22:51 domainControlCb: reason: HA_GS_VOTE_FOR_MIGRATION
Mon Sep 27 20:22:51 domainControlCb: Called, state=ST_STABLE
Mon Sep 27 20:22:51 domainControlCb: Notification type: HA_GS_DOMAIN_NOTIFICATION
Mon Sep 27 20:22:51 domainControlCb: HA_GS_MIGRATE_TO_CAA
Mon Sep 27 20:22:51 domainControlCb: Sub-Type: HA_GS_DOMAIN_CAA_MIGRATION_APPRVD
Mon Sep 27 20:22:51 domainControlCb: reason: HA_GS_MIGRATE_TO_CAA_PREP_DONE
Mon Sep 27 20:22:51 domainControlCb: Set RsctMigPrepComplete flag
Mon Sep 27 20:22:51 domainControlCb: Voting to CONTINUE with RsctMigrationPrepMsg.

Migration phase two - updating cluster version


Mon Sep 27 20:22:51 DoNodeOdm: Called for DCD HACMPnode class
Mon Sep 27 20:22:51 GetObjects: Called with criteria:
name=chile
Mon Sep 27 20:22:51 DoNodeOdm: Updating DCD HACMPnode stanza with node_id = 1 and
version = 12 for object NAME_SERVER of node chile
Mon Sep 27 20:22:51 DoNodeOdm: Updating DCD HACMPnode stanza with node_id = 1 and
version = 12 for object DEBUG_LEVEL of node chile

Finishing migration - calling clmigcleanup


Mon Sep 27 20:23:51 finishMigrationGrace: resetting MigrationGracePeriod
Mon Sep 27 20:23:51 finishMigrationGrace: Calling ha_gs_migrate_to_caa_commit()
Mon Sep 27 20:23:51 finifhMigration Grace: execute clmigcleanup command

Figure 7-3 Extract from the clstrmgr.debug file showing the migration protocol



•  Stage 6: Switching over from Group Services (grpsvcs) to CAA
When migration is complete, switch over the grpsvcs communication function from
topsvcs to the new communication with CAA. The topsvcs function is now inactive, but the
service is still part of Reliable Scalable Cluster Technology (RSCT) and is not removed.

CAA communication: The grpsvcs SRC subsystem is active until you restart the
system. This subsystem is now communicating with CAA and not topsvcs as shown in
Figure 7-4.

Figure 7-4 Switching over Group Services to use CAA

Figure 7-5 shows the services that are running after migration, including cthags.

chile:/ # lssrc -a | grep cluster


clstrmgrES cluster 4391122 active
clevmgrdES cluster 11862228 active

chile:/ # lssrc -a | grep cthags


cthags cthags 7405620 active

chile:/ # lssrc -a | grep caa


cld caa 4063436 active
clcomd caa 3670224 active
solid caa 7864338 active
clconfd caa 5505178 active
solidhac caa 7471164 active
Figure 7-5 Services running after migration



Table 7-1 shows the changes to the SRC subsystem before and after migration.

Table 7-1   Changes in the SRC subsystems

   Subsystem             Older PowerHA    PowerHA 7.1 or later
   Topology Services     topsvcs          N/A
   Group Services        grpsvcs          cthags

The clcomdES and clcomd subsystems


When running in a mixed-version cluster, be aware of the changes in the clcomd
subsystem. During a rolling migration or in a mixed-version cluster, two separate instances
of the communication daemon can be running: clcomd and clcomdES.

clcomd instances: You can have two instances of the clcomd daemon in the cluster, but
never on a given node. After PowerHA 7.1 is installed on a node, the clcomd daemon is
run, and the clcomdES daemon does not exist. AIX 6.1.6.0 and later with a back-level
PowerHA version (before version 7.1) only runs the clcomdES daemon even though the
clcomd daemon exists.

The clcomd daemon uses port 16191, and the clcomdES daemon uses port 6191. When
migration is complete, the clcomdES daemon is removed.

The clcomdES daemon: The clcomdES daemon is removed when the older PowerHA
software version is removed (snapshot migration) or overwritten by the new PowerHA 7.1
version (rolling or offline migration).
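
To see which communication daemons are active on a node during the mixed-version phase, and which ports are in use, a quick check such as the following can help. The port numbers are the ones given above:

lssrc -a | grep -i clcomd              # shows clcomd, clcomdES, or both
netstat -an | grep -E '16191|6191'     # clcomd uses port 16191, clcomdES uses port 6191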

7.2.2 Premigration checking: The clmigcheck program


Before starting migration, you must run the clmigcheck program to prepare the cluster for
migration. The clmigcheck program has two functions. First, it validates the current cluster
configuration (by using ODM or snapshot) for migration. If the configuration is not valid, the
clmigcheck program notifies you of any unsupported elements, such as disk heartbeating or
IPAT via replacement. It also indicates any actions that might be required before you can
migrate. Second, this program prepares for the new cluster by obtaining the disk to be used
for the repository disk and multicast address.

Command profile: The clmigcheck command is not a PowerHA command, but the
command is part of bos.cluster and is in the /usr/sbin directory.
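
You can confirm which file set delivers the clmigcheck command with the lslpp command, which reports the owning file set of a file:

lslpp -w /usr/sbin/clmigcheck     # the owning file set is expected to be bos.cluster.rte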



High-level overview of the clmigcheck process
Figure 7-6 shows a high-level view of how the clmigcheck program works. The clmigcheck
program must go through several stages to complete the cluster migration.

Figure 7-6 High-level process of the clmigcheck command

The clmigcheck program goes through the following stages:


1. Performing the initial run
When the clmigcheck program runs, it checks whether it has been run before by looking
for a /var/clmigcheck/clmigcheck.txt file. If this file does not exist, the clmigcheck
program runs and opens the menu shown in Figure 7-8 on page 159.
2. Verifying that the cluster configuration is suitable for migration
From the clmigcheck menu, you can select options 1 or 2 to check your existing ODM or
snapshot configuration to see if your environment is ready for migration.
3. Creating the required CAA configuration
After performing option 1 or 2, choose option 3. Option 3 creates the
/var/clmigcheck/clmigcheck.txt file with the information that you enter; this file is copied
to all nodes in the cluster.
4. Performing the second run on the first node, or first run on any other node that is not the
first or the last node in the cluster to be migrated
If the clmigcheck program is run again and the clmigcheck.txt file already exists, a
message is returned indicating that you can proceed with the upgrade of PowerHA.



5. Verifying whether the last node in the cluster is upgraded
When the clmigcheck program runs, apart from checking for the presence of the
clmigcheck.txt file, it verifies if it is the last node in the cluster to be upgraded. The lslpp
command is run against each node in the cluster to establish whether PowerHA has been
upgraded. If all other nodes are upgraded, this command confirms that this node is the last
node of the cluster and can now create the CAA cluster.

The clmigcheck program uses the mkcluster command and passes the cluster parameters
from the existing PowerHA cluster, along with the repository disk and multicast address (if
applicable). Figure 7-7 shows an example of the mkcluster command being called.

/usr/sbin/mkcluster -n newyork -r hdisk1 -m chile{cle_globid=4},scotland{cle_globid=5},serbia{cle_globid=6}
Figure 7-7 The clmigcheck command calling the mkcluster command

Running the clmigcheck command


Figure 7-8 shows the main clmigcheck panel. You choose option 1 or 2 depending on which
type of migration you want to perform. Option 1 is for a rolling or offline migration. Option 2 is
for a snapshot migration. When you choose either option, a check of the cluster configuration
is performed to verify if the cluster can be migrated. If any problems are detected, a warning
or error message is displayed.

------------[ PowerHA SystemMirror Migration Check ]-------------

Please select one of the following options:

1 = Check ODM configuration.

2 = Check snapshot configuration.

3 = Enter repository disk and multicast IP addresses.

Select one of the above,"x"to exit or "h" for help:


Figure 7-8 The clmigcheck menu

A warning message is displayed for certain unsupported elements, such as disk heartbeat as
shown in Figure 7-9.

------------[ PowerHA SystemMirror Migration Check ]-------------

CONFIG-WARNING: The configuration contains unsupported hardware: Disk


Heartbeat network. The PowerHA network name is net_diskhb_01. This will be
removed from the configuration during the migration
to PowerHA SystemMirror 7.1.

Hit <Enter> to continue


Figure 7-9 The disk heartbeat warning message when running the clmigcheck command



Non-IP networks can be dynamically removed during the migration process by using the
clmigcleanup command. However, other configurations, such as IPAT via replacement,
require manual steps to remove or change them to a supported configuration. After the
changes are made, run clmigcheck again to ensure that the error is resolved.

The second function of the clmigcheck program is to prepare the CAA cluster environment.
This function is performed when you select option 3 (Enter repository disk and multicast IP
addresses) from the menu.

When you select this option, the clmigcheck program stores the information entered in the
/var/clmigcheck/clmigcheck.txt file. This file is also copied to the /var/clmigcheck
directory on all nodes in the cluster. This file contains the physical volume identifier (PVID) of
the repository disk and the chosen multicast address. If PowerHA is allowed to choose a
multicast address automatically, the NULL setting is specified in the file. Figure 7-10 shows
an example of the clmigcheck.txt file.

CLUSTER_TYPE:STANDARD
CLUSTER_REPOSITORY_DISK:000fe40120e16405
CLUSTER_MULTICAST:NULL
Figure 7-10 Contents of the clmigcheck.txt file

When the clmigcheck command runs, it checks whether the clmigcheck.txt file exists. If the
file exists and the node is not the last node in the cluster to be migrated, the panel shown in
Figure 7-11 is displayed. It contains a message indicating that you can now upgrade to the
later level of PowerHA.

------------[ PowerHA SystemMirror Migration Check ]-------------

clmigcheck: This is not the first node or last node clmigcheck was run on.
No further checking is required on this node. You can install the new
version of PowerHA SystemMirror.

Hit <Enter> to continue

-----------------------------------------------------------------------
Figure 7-11 The clmigcheck panel after it has been run once and before the PowerHA upgrade

The clmigcheck program checks the installed version of PowerHA to see if it has been
upgraded. This step is important to determine which node is the last node to be upgraded in
the cluster. If it is the last node in the cluster, then additional configuration operations must be
completed along with creating and activating the CAA cluster.

Important: You must run the clmigcheck program before you upgrade PowerHA. Then
upgrade PowerHA one node at a time, and run the clmigcheck program on the next node
only after you complete the migration on the previous node. If you do not run the
clmigcheck program specifically on the last node, the cluster is still in migration mode
without creating the CAA cluster. For information about how to resolve this situation, see
10.4.7, “The ‘Cluster services are not active’ message” on page 323.



After you upgrade PowerHA, if you run the clmigcheck program again, you see an error
message similar to the one shown in Figure 7-12. The message indicates that all migration
steps for this node of the cluster have been completed.

ERROR: This program is intended for PowerHA configurations prior to version 7.1
The version currently installed appears to be: 7.1.0
Figure 7-12 clmigcheck panel after PowerHA has been installed on a node.

Figure 7-13 shows an extract from the /tmp/clmigcheck/clmigcheck.log file that was taken
when the clmigcheck command ran on the last node in a three-node cluster migration. This
file shows the output by the clmigcheck program when checking whether this node is the last
node of the cluster.

ck_lastnode: Getting version of cluster.es.server.rte on node chile

ck_lastnode: lslpp from node (chile) is


/etc/objrepos:cluster.es.server.rte:7.1.
0.1::COMMITTED:F:Base Server Runtime:

ck_lastnode: cluster.es.server.rte on node chile is (7.1.0.1)

ck_lastnode: Getting version of cluster.es.server.rte on node serbia

ck_lastnode: lslpp from node (serbia) is


/etc/objrepos:cluster.es.server.rte:7.1
.0.1::COMMITTED:F:Base Server Runtime:

ck_lastnode: cluster.es.server.rte on node serbia is (7.1.0.1)

ck_lastnode: Getting version of cluster.es.server.rte on node scotland

ck_lastnode: lslpp from node (scotland) is


/etc/objrepos:cluster.es.server.rte:6
.1.0.2::COMMITTED:F:ES Base Server Runtime:

ck_lastnode: cluster.es.server.rte on node scotland is (6.1.0.2)

ck_lastnode: oldnodes = 1

ck_lastnode: This is the last node to run clmigcheck.

clmigcheck: This is the last node to run clmigcheck, create the CAA cluster
Figure 7-13 Extract from clmigcheck.log file showing the lslpp last node checking

7.3 Snapshot migration


To illustrate a snapshot migration, the environment in this scenario entails a two-node AIX
6.1.3 and PowerHA 5.5 SP4 cluster being migrated to AIX 6.1 TL6 and PowerHA 7.1 SP1.
The nodes are IBM POWER6® 550 systems and configured as VIO client partitions. Virtual
devices are used for network and storage configuration.



The network topology consists of one IP network and one non-IP network, which is the disk
heartbeat network. The initial IPAT method is IPAT via replacement, which must be changed
before starting the migration, because PowerHA 7.1 only supports IPAT via aliasing.

Also the environment has one resource group that includes one service IP, two volume
groups, and application monitoring. This environment also has an IBM HTTP server as the
application. Figure 7-14 shows the relevant resource group settings.

Resource Group Name testrg


Participating Node Name(s) algeria brazil
Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node
Fallback Policy Never Fallback
Site Relationship ignore
Node Priority
Service IP Label algeria_svc
Volume Groups algeria_vg brazil_vg

Figure 7-14 Cluster resource group configuration using snapshot migration

7.3.1 Overview of the migration process


A major difference from previous migration versions is the clmigcheck script, which is
mandatory for the migration procedure. As stated in 1.2, “Cluster Aware AIX” on page 7,
PowerHA 7.1 uses CAA for monitoring and event management. By running the clmigcheck
script (option 3), you can specify a repository disk and a multicast address, which are
required for the CAA service.

The snapshot migration method requires all cluster nodes to be offline for some time. It
requires removing previous versions of PowerHA and installing AIX 6.1 TL6 or later and the
new version of PowerHA 7.1.

In this scenario, to begin, PowerHA 5.5 SP4 is on AIX 6.1.3 and migrated to PowerHA 7.1
SP1 on AIX 6.1 TL6. The network topology consists of one IP network using IPAT via
replacement and the disk heartbeat network. Both of these network types are no longer
supported. However, if you have an IPAT via replacement configuration, the clmigcheck script
generates an error message as shown in Figure 7-15. You must remove this configuration to
proceed with the migration.

------------[ PowerHA SystemMirror Migration Check ]-------------

CONFIG-ERROR: The configuration contains unsupported options: IP Address


Takeover via Replacement. The PowerHA network name is net_ether_01.

This will have to be removed from the configuration before


migration to PowerHA SystemMirror

Hit <Enter> to continue


Figure 7-15 The clmigcheck error message for IPAT via replacement

IPAT via replacement configuration: If your cluster has an IPAT via replacement
configuration, remove or change to the IPAT via alias method before starting the migration.



7.3.2 Performing a snapshot migration
Follow the steps in this section to migrate the cluster.

Creating a snapshot
Create a snapshot by entering the smit cm_add_snap.dialog command while your cluster is
running.

Stopping the cluster


Run the smit clstop command on all nodes to take down the cluster. Ensure that the cluster
is down by using the lssrc -ls clstrmgrES command (Figure 7-16) for each node.

# lssrc -ls clstrmgrES


Current state: ST_INIT
sccsid = "$Header: @(#) 61haes_r710_integration/13 43haes/usr/sbin/cluster/hacmprd/main.C,
hacmp, 61haes_r710, 1034A_61haes_r710 2010-08-19T1
0:34:17-05:00$"
Figure 7-16 The lssrc -ls clstrmgrES command to ensure that each cluster is down

Installing AIX 6.1.6 and clmigcheck


To install AIX 6.1.6 and the clmigcheck program, follow these steps:
1. By using the AIX 6.1.6 installation media or TL6 updates, perform a smitty update_all.
2. After updating AIX, check whether the bos.cluster and bos.ahafs file sets are correctly
installed as shown in Figure 7-17. These two file sets are new for the CAA services. You
might need to install them separately.

brazil:/ # lslpp -l |grep bos.cluster


bos.cluster.rte 6.1.6.1 APPLIED Cluster Aware AIX
bos.cluster.solid 6.1.6.1 APPLIED POWER HA Business Resiliency
bos.cluster.rte 6.1.6.1 APPLIED Cluster Aware AIX
bos.cluster.solid 6.1.6.0 COMMITTED POWER HA Business Resiliency
brazil:/ #
Figure 7-17 Verifying additional required file sets

The clcomd subsystem is now part of AIX and requires the fully qualified host names of all
nodes in the cluster to be listed in the /etc/cluster/rhosts file. Because AIX was
updated, a restart is required.
3. Because you updated the AIX image, restart the system before you continue with the next
step.
After restarting the system, you can see the clcomd subsystem from the caa subsystem
group that is up and running. The clcomdES daemon, which is part of PowerHA, is also
running as shown in Figure 7-18.

algeria:/usr/es/sbin/cluster/etc # lssrc -a|grep com


clcomd caa 4128960 active
clcomdES clcomdES 2818102 active
algeria:/usr/es/sbin/cluster/etc #
Figure 7-18 Two clcomd daemons exist



Now AIX 6.1.6 is installed, and you are ready for the clmigcheck step.
4. Run the clmigcheck command on the first node (algeria). Figure 7-19 shows the
clmigcheck menu.

------------[ PowerHA SystemMirror Migration Check ]-------------

Please select one of the following options:

1 = Check ODM configuration.

2 = Check snapshot configuration.

3 = Enter repository disk and multicast IP addresses.

Select one of the above,"x"to exit or "h" for help:


Figure 7-19 Options on the clmigcheck menu

The clmigcheck menu options: In the clmigcheck menu, option 1 and 2 review the
cluster configurations. Option 3 gathers information that is necessary to create the CAA
cluster during its execution on the last node of the cluster. In option 3, you define a cluster
repository disk and multicast IP address. Selecting option 3 means that you are ready to
start the migration.

In option 3 of the clmigcheck menu, you select two configurations:
•  The disk to use for the repository
•  The multicast address for internal cluster communication

Option 2: Checking the snapshot configuration


When you choose option 2 from the clmigcheck menu, a prompt is displayed for you to
provide the snapshot file name. The clmigcheck program then reviews the specified snapshot
file and shows an error or warning message if any unsupported elements are discovered.



The test environment includes a disk heartbeat network, which is not supported in
PowerHA 7.1. Therefore, clmigcheck displays a warning message for the disk heartbeat
configuration, as Figure 7-20 shows.

------------[ PowerHA SystemMirror Migration Check ]-------------

h = help

Enter snapshot name (in /usr/es/sbin/cluster/snapshots): snapshot_mig

clsnapshot: Removing any existing temporary HACMP ODM entries...


clsnapshot: Creating temporary HACMP ODM object classes...
clsnapshot: Adding HACMP ODM entries to a temporary directory..
clsnapshot: Succeeded generating temporary ODM containing Cluster Snapshot:
snapshot_mig

------------[ PowerHA SystemMirror Migration Check ]-------------

CONFIG-WARNING: The configuration contains unsupported hardware: Disk


Heartbeat network. The PowerHA network name is net_diskhb_01. This will be
removed from the configuration during the migration
to PowerHA SystemMirror 7.1.

Hit <Enter> to continue


Figure 7-20 The clmigcheck warning message for a disk heartbeat configuration

Figure 7-20 shows the warning message “This will be removed from the configuration
during the migration”. Because it is only a warning message, you can continue with the
migration. After completing the migration, verify that the disk heartbeat is removed.

When option 2 of clmigcheck is completed without error, proceed with option 3 as shown in
Figure 7-21.

------------[ PowerHA SystemMirror Migration Check ]-------------

The ODM has no unsupported elements.

Hit <Enter> to continue


Figure 7-21 clmigcheck passed for snapshot configurations



Option 3: Entering the repository disk and multicast IP addresses
In option 3, clmigcheck lists all shared disks on both nodes. In this scenario, hdisk1 is
specified as the repository disk as shown in Figure 7-22.

------------[ PowerHA SystemMirror Migration Check ]-------------

Select the disk to use for the repository

1 = 000fe4114cf8d1ce(hdisk1)
2 = 000fe4114cf8d3a1(hdisk4)
3 = 000fe4114cf8d441(hdisk5)
4 = 000fe4114cf8d4d5(hdisk6)
5 = 000fe4114cf8d579(hdisk7)

Select one of the above or "x" to exit: 1


Figure 7-22 Selecting the repository disk

You can leave the multicast address entry NULL (empty), in which case AIX generates an
appropriate address, as shown in Figure 7-23. In this scenario, the entry is left empty so that
AIX generates the multicast address.

------------[ PowerHA SystemMirror Migration Check ]-------------

PowerHA SystemMirror uses multicast address for internal


cluster communication and monitoring. These must be in the
multicast range, 224.0.0.0 - 239.255.255.255.

If you make a NULL entry, AIX will generate an appropriate address for you.
You should only specify an address if you have an explicit reason to do
so, but are cautioned that this address cannot be changed once the
configuration is activated (i.e. migration is complete).

h = help

Enter the multicast IP address to use for network monitoring:


Figure 7-23 Defining a multicast address



The clmigcheck process is logged in the /tmp/clmigcheck/clmigcheck.log file (Figure 7-24).

validate_disks: No sites, only one repository disk needed.

validate_disks: Disk 000fe4114cf8d1ce exists

prompt_mcast: Called

prompt_mcast: User entered:

validate_mcast: Called

write_file: Called

write_file: Copying /tmp/clmigcheck/clmigcheck.txt to algeria:/var/clmigcheck/clmigcheck.txt

write_file: Copying /tmp/clmigcheck/clmigcheck.txt to brazil:/var/clmigcheck/clmigcheck.txt


Figure 7-24 /tmp/clmigcheck/clmigcheck.log

The completed clmigcheck program


When the clmigcheck program is completed, it creates a /var/clmigcheck/clmigcheck.txt
file on each node of the cluster. The text file contains a PVID of the repository disk and the
multicast address for the CAA cluster as shown in Figure 7-25.

# cat /var/clmigcheck/clmigcheck.txt
CLUSTER_TYPE:STANDARD
CLUSTER_REPOSITORY_DISK:000fe4114cf8d1ce
CLUSTER_MULTICAST:NULL
Figure 7-25 The /var/clmigcheck/clmigcheck.txt file

When PowerHA 7.1 is installed, this information is used to create the HACMPsircol.odm file as
shown in Figure 7-26. This file is created when you finish restoring the snapshot in this
scenario.

algeria:/ # odmget HACMPsircol

HACMPsircol:
name = "canada_cluster_sircol"
id = 0
uuid = "0"
repository = "000fe4114cf8d1ce"
ip_address = ""
nodelist = "brazil,algeria"
backup_repository1 = ""
backup_repository2 = ""
algeria:/ #
Figure 7-26 The HACMPsircol.odm file



Running clmigcheck on one node: Compared to the rolling migration method, the
snapshot migration method entails running the clmigcheck command on one node. Do not
run the clmigcheck command on another node while you are doing a snapshot migration or
the migration will fail. If you run the clmigcheck command on every node, the CAA cluster
is created upon executing the clmigcheck command on the last node and goes into the
rolling migration phase.

Uninstalling PowerHA SystemMirror 5.5


To uninstall PowerHA SystemMirror 5.5, follow these steps:
1. Run smit install_remove and specify cluster.* on all nodes. Verify this step by running
   the following command to show that all PowerHA file sets are removed:
   lslpp -l cluster.*
2. Install PowerHA 7.1 by using the following command:
smit install_all
3. Verify that the file sets are installed correctly:
lslpp -l cluster.*

After you install the new PowerHA 7.1 file sets, you can see that the clcomdES daemon has
disappeared. You now have the clcomd daemon, which is part of CAA, instead of the clcomdES
daemon.
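
A quick check after the PowerHA 7.1 installation is to query both subsystem names. Only the CAA clcomd subsystem should still be defined:

lssrc -s clcomd       # expected to be active as part of the caa group
lssrc -s clcomdES     # expected to report that the subsystem is no longer defined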

Updating the /etc/cluster/rhosts file


After you complete the installation of PowerHA 7.1, update the /etc/cluster/rhosts file:
1. Update the /etc/cluster/rhosts file with the fully qualified domain name of each node in
the cluster. (For example, you might use the output from the hostname command).
2. Restart the clcomd subsystem as shown in Figure 7-27.

algeria:/ # stopsrc -s clcomd


0513-044 The clcomd Subsystem was requested to stop.
algeria:/ # startsrc -s clcomd
0513-059 The clcomd Subsystem has been started. Subsystem PID is 12255420.
algeria:/ #
Figure 7-27 Restarting the clcomd subsystem on both nodes

3. Alternatively, instead of stopping and starting the clcomd daemon, you can refresh it by
   using the following command:
   refresh -s clcomd
4. To verify that the clcomd subsystem is working, use the clrsh command. If it does not
work, correct any problems before proceeding as explained in Chapter 10,
“Troubleshooting PowerHA 7.1” on page 305.
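
The following minimal sketch shows the resulting file and a quick communication test. The fully qualified names are hypothetical placeholders for the example nodes algeria and brazil; use the actual host names of your nodes:

algeria:/ # cat /etc/cluster/rhosts
algeria.example.com
brazil.example.com

algeria:/ # clrsh brazil date     # succeeds only if clcomd communication works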

Converting the snapshot


Now convert the snapshot that was taken on PowerHA 5.5. On PowerHA 7.1, run the
clconvert_snapshot command before you restore the snapshot. (In some older versions of
PowerHA, you do not need to run this command.) While converting the snapshot, the
clconvert_snapshot command refers to
the /var/clmigcheck/clmigcheck.txt file and adds the HACMPsircol stanza with the
repository disk and multicast address, which are newly introduced in PowerHA 7.1. After you

restore the snapshot, you can see that the HACMPsircol ODM contains this information as
illustrated in Figure 7-26 on page 167.
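
The following sketch shows the general form of the conversion command for the snapshot that is used in this scenario (snapshot_mig, taken on PowerHA 5.5); check the clconvert_snapshot documentation for the exact options at your level:

/usr/es/sbin/cluster/conversion/clconvert_snapshot -v 5.5 -s snapshot_mig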

Restoring a snapshot
To restore the snapshot, follow the path smitty hacmp → Cluster Nodes and Networks →
Manage the Cluster → Snapshot Configuration → Restore the Cluster Configuration
From a Snapshot.

Failure to restore a snapshot


When you restore the snapshot with the default option, an error message about clcomd
communication is displayed. Because no cluster configuration exists on the nodes yet, the
snapshot restoration fails at the communication_check function in the clsnapshot program,
as shown in Figure 7-28.

cllsnode: Error reading configuration


/usr/es/sbin/cluster/utilities/clsnapshot[2127]: apply_CS[116]: communication_check: line 49:
local: not found

Warning: unable to verify inbound clcomd communication from node "algeria" to the local node,
"".

/usr/es/sbin/cluster/utilities/clsnapshot[2127]: apply_CS[116]: communication_check: line 49:


local: not fou
nd

Warning: unable to verify inbound clcomd communication from node "brazil" to the local node,
"".

clsnapshot: Verifying configuration using temporary PowerHA SystemMirror ODM entries...


Cannot get local HACMPnode ODM.
Cannot get local HACMPnode ODM.
FATAL ERROR: CA_invoke_client nodecompath == NULL! @ line: of file: clver_ca_main.c
FATAL ERROR: CA_invoke_client nodecompath == NULL! @ line: of file: clver_ca_main.c
FATAL ERROR: CA_invoke_client nodecompath == NULL! @ line: of file: clver_ca_main.c
FATAL ERROR: CA_invoke_client nodecompath == NULL! @ line: of file: clver_ca_main.c
Figure 7-28 A failed snapshot restoration



If you are at PowerHA 7.1 SP2, you should not see the failure message. However, some error
messages concern the disk heartbeat network (Figure 7-29), which is not supported in
PowerHA 7.1. You can ignore these error messages.

clsnapshot: Removing any existing temporary PowerHA SystemMirror ODM entries...


clsnapshot: Creating temporary PowerHA SystemMirror ODM object classes...
clsnapshot: Adding PowerHA SystemMirror ODM entries to a temporary
directory..ODMDIR set to /tmp/snapshot

Error: Network's network type diskhb is not known.


Error: Interface/Label's network type diskhb is not known.
cllsclstr: Error reading configuration
Error: Network's network type diskhb is not known.
Error: Interface/Label's network type diskhb is not known.
cllsnode: Error reading configuration
clodmget: Could not retrieve object for HACMPnode, odm errno 5904
/usr/es/sbin/cluster/utilities/clsnapshot[2139]: apply_CS[125]:
communication_check: line 52: local: not found

Warning: unable to verify inbound clcomd communication from node "algeria" to
the local node, "".

/usr/es/sbin/cluster/utilities/clsnapshot[2139]: apply_CS[125]:
communication_check: line 52: local: not found

Warning: unable to verify inbound clcomd communication from node "brazil" to
the local node, ""
Figure 7-29 Snapshot restore errors with the new clsnapshot command

When you finish restoring the snapshot, the CAA cluster is created by using the repository
disk and multicast address that are stored in the /var/clmigcheck/clmigcheck.txt file.

Sometimes the synchronization or verification fails because the snapshot cannot create the
CAA cluster. If you see an error message similar to the one shown in Figure 7-30, look in the
/var/adm/ras/syslog.caa file and correct the problem.

ERROR: Problems encountered creating the cluster in AIX. Use the syslog
facility to see output from the mkcluster command.

ERROR: Creating the cluster in AIX failed. Check output for errors in local
cluster configuration, correct them, and try synchronization again.

ERROR: Updating the cluster in AIX failed. Check output for errors in local
cluster configuration, correct them, and try synchronization again.

cldare: Error detected during synchronization.


Figure 7-30 Failure of CAA creation during synchronization or verification



Figure 7-30 on page 170 shows a sample CAA creation failure. In this example, the failure
involves the clrepos_private1 file system mount point that is used for the CAA service.
Assuming that you enabled syslog, you can easily find the cause in the syslog.caa file by
searching for “odmadd HACMPsircol.add.”
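
For example, assuming that syslog is configured to capture CAA messages in the
/var/adm/ras/syslog.caa file as described, commands such as the following can help you
locate the failing step and any leftover repository file system:

grep "odmadd HACMPsircol" /var/adm/ras/syslog.caa
mount | grep clrepos_private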

After completing all the steps, check the CAA cluster configuration and status on both nodes.
First, the caavg_private volume group is created and varied on as shown in Figure 7-31.

algeria:/ # lspv
hdisk2 000fe4114cf8d258 algeria_vg
hdisk3 000fe4114cf8d2ec brazil_vg
hdisk8 000fe4114cf8d608 diskhb
caa_private0 000fe40120e16405 caavg_private active
hdisk0 000fe4113f087018 rootvg active
algeria:/ #
Figure 7-31 The caavg_private volume group varied on



From the lscluster command, you can see information about the CAA cluster including the
repository disk, the multicast address, and so on, as shown in Figure 7-32.

algeria:/ # lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2

Node name: algeria


Cluster shorthand id for node: 1
uuid for node: 0410c158-c6ca-11df-88bc-c21e45bc6603
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
canada_cluster local e8fbea82-c6c9-11df-b8d6-c21e4a9e5103

Number of points_of_contact for node: 0


Point-of-contact interface & contact state
n/a

------------------------------

Node name: brazil


Cluster shorthand id for node: 2
uuid for node: e8ff0dde-c6c9-11df-b8d6-c21e4a9e5103
State of node: UP
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
canada_cluster local e8fbea82-c6c9-11df-b8d6-c21e4a9e5103

Number of points_of_contact for node: 2


Point-of-contact interface & contact state
en1 UP
en0 UP
algeria:/mnt/HA71 # lscluster -c
Cluster query for cluster canada_cluster returns:
Cluster uuid: e8fbea82-c6c9-11df-b8d6-c21e4a9e5103
Number of nodes in cluster = 2
Cluster id for node algeria is 1
Primary IP address for node algeria is 192.168.101.101
Cluster id for node brazil is 2
Primary IP address for node brazil is 192.168.101.102
Number of disks in cluster = 0
Multicast address for cluster is 228.168.101.102
algeria:/mnt/HA71 #
Figure 7-32 The lscluster command after creating the CAA cluster



You can also check whether the multicast address is correctly defined for each interface by
running the netstat -a -I en0 command as shown in Figure 7-33.

algeria:/ # netstat -a -I en0


Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 c2.1e.45.bc.66.3 1407667 0 1034372 0 0
01:00:5e:28:65:65
01:00:5e:7f:ff:fd
01:00:5e:00:00:01
en0 1500 192.168.100 algeria 1407667 0 1034372 0 0
228.168.101.101
239.255.255.253
224.0.0.1
en0 1500 10.168.100 algeria_svc 1407667 0 1034372 0 0
228.168.101.101
239.255.255.253
224.0.0.1

algeria:/ # netstat -a -I en1


Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en1 1500 link#3 c2.1e.45.bc.66.4 390595 0 23 0 0
01:00:5e:28:65:65
01:00:5e:7f:ff:fd
01:00:5e:00:00:01
en1 1500 192.168.200 algeria_boot 390595 0 23 0 0
228.168.101.101
239.255.255.253
224.0.0.1
Figure 7-33 The multicast address for CAA service

After the clmigcheck command is done running, you can remove the older version of
PowerHA and install PowerHA 7.1.

Optional: Adding a shared disk to the CAA services


After the migration, the shared volume group is not included in the CAA service as shown in
Figure 7-34.

# lspv
caa_private0 000fe40120e16405 caavg_private active
hdisk2 000fe4114cf8d258 algeria_vg
hdisk3 000fe4114cf8d2ec brazil_vg
hdisk0 000fe4113f087018 rootvg active
#
Figure 7-34 The lspv output after restoring the snapshot



To add the shared volume group disks to the CAA service, run the following command:
chcluster -n <cluster_name> -d +hdiskX,hdiskY

where:
<cluster_name> is canada_cluster.
hdiskX is hdisk2.
hdiskY is hdisk3.

The two shared disks are now included in the CAA shared disk as shown in Figure 7-35.

algeria: # chcluster -n canada_cluster -d +hdisk2,hdisk3


chcluster: Cluster shared disks are automatically renamed to names such as
cldisk1, [cldisk2, ...] on all cluster nodes. However, this cannot
take place while a disk is busy or on a node which is down or not
reachable. If any disks cannot be renamed now, they will be renamed
later by the clconfd daemon, when the node is available and the disks
are not busy.
algeria: #
Figure 7-35 Using the chcluster command for shared disks

Now hdisk2 and hdisk3 are renamed to cldisk names. The lspv command shows cldiskX
instead of hdiskX, as shown in Figure 7-36.

algeria:/ # lspv
caa_private0 000fe40120e16405 caavg_private active
cldisk1 000fe4114cf8d258 algeria_vg
cldisk2 000fe4114cf8d2ec brazil_vg
hdisk8 000fe4114cf8d608 diskhb
hdisk0 000fe4113f087018 rootvg active
algeria:/ #
Figure 7-36 The lspv command showing cldisks for shared disks

When you check with the lscluster -d command, you can see that the shared disks (cldisk1
and cldisk2) are monitored by the CAA service. Keep in mind that CAA uses two types of
disks: the repository disk, which is shown as REPDISK, and the shared disks, which are
shown as CLUSDISK. See Figure 7-37 on page 175.



algeria:/ # lscluster -d
Storage Interface Query

Cluster Name: canada_cluster


Cluster uuid: 97833c9e-c5b8-11df-be00-c21e45bc6603
Number of nodes reporting = 2
Number of nodes expected = 2
Node algeria
Node uuid = 88cff8be-c58f-11df-95ab-c21e45bc6604
Number of disk discovered = 3
cldisk2
state : UP
uDid : 533E3E213600A0B80001146320000F1A74C18BDAA0F1815
FAStT03IBMfcp05VDASD03AIXvscsi
uUid : 600a0b80-0011-4632-0000-f1a74c18bdaa
type : CLUSDISK
cldisk1
state : UP
uDid : 533E3E213600A0B8000291B080000D3CB053B7EA60F1815
FAStT03IBMfcp05VDASD03AIXvscsi
uUid : 600a0b80-0029-1b08-0000-d3cb053b7ea6
type : CLUSDISK
caa_private0
state : UP
uDid :
uUid : 600a0b80-0029-1b08-0000-d3cd053b7f0d
type : REPDISK
Node
Node uuid = 00000000-0000-0000-0000-000000000000
Number of disk discovered = 0
algeria:/ #
Figure 7-37 The shared disks monitored by the CAA service

Verifying the cluster


To verify the snapshot migration, check the components shown in Table 7-2 on each node.

Table 7-2 Components to verify after the snapshot migration


Component                                  Command
The CAA services are active.               lssrc -g caa
                                           lscluster -m
The RSCT services are active.              lssrc -s cthags
Start the cluster service one by one.      smitty clstart
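
Because the CAA cluster is active at this point, you can also run these checks on all nodes at
once with the clcmd distributed-command utility (shown in Chapter 8), for example:

clcmd lssrc -g caa
clcmd lscluster -m
clcmd lssrc -s cthags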



7.3.3 Checklist for performing a snapshot migration
Because the entire migration can be confusing, Table 7-3 provides a step-by-step checklist for
the snapshot migration of each node in the cluster.

Table 7-3 Checklist for performing a snapshot migration


Step  Node 1                                   Node 2                                   Check
0     Ensure that the cluster is running.      Ensure that the cluster is running.
1     Create a snapshot.
2     Stop the cluster.                        Stop the cluster.                        lssrc -ls clstrmgrES
3     Update AIX to 6.1.6 and install the      Update AIX to 6.1.6 and install the      oslevel -s
      bos.cluster and bos.ahafs file sets.     bos.cluster and bos.ahafs file sets.
4     Restart the system.                      Restart the system.
5     Select option 2 from the                                                          Check for unsupported
      clmigcheck menu.                                                                  configurations.
6     Select option 3 from the                                                          /var/clmigcheck/clmigcheck.txt
      clmigcheck menu.
7     Remove PowerHA 5.5 and                   Remove PowerHA 5.5 and                   lslpp -l | grep cluster
      install PowerHA 7.1.                     install PowerHA 7.1.
8     Convert the snapshot.                                                             clconvert_snapshot
9     Restore the snapshot.
10    Start the cluster.                                                                lssrc -ls clstrmgrES, hacmp.out
11                                             Start the cluster.                       lssrc -ls clstrmgrES, hacmp.out

7.3.4 Summary
A snapshot migration to PowerHA 7.1 entails running the clmigcheck program. Before you
begin the migration, you must prepare for it by installing AIX 6.1.6 or later and checking if any
part of the configuration is unsupported.

Then you run the clmigcheck command to review your PowerHA configuration and verify that
it works with PowerHA 7.1. After verifying the configuration, you specify a repository disk and
multicast address, which are essential components of the CAA service.

After you successfully complete the clmigcheck procedure, you can install PowerHA 7.1. The
CAA cluster is created while you restore your snapshot. PowerHA 7.1 uses the newly
configured CAA service for event monitoring and heartbeating.



7.4 Rolling migration
This section explains how to perform a three-node rolling migration of AIX and PowerHA. The
test environment begins with PowerHA 6.1 SP3 and AIX 6.1 TL3. The step-by-step
instructions in this topic explain how to migrate AIX to 6.1 TL6 and PowerHA to 7.1 SP1, as
illustrated in Figure 7-38.

Figure 7-38 Three-node cluster before migration

The cluster is using virtualized resources provided by VIOS for network and storage. Rootvg
(hdisk0) is also hosted from the VIOS. The backing devices are provided from a DS4800
storage system.

The network topology is configured for IPAT via aliasing. Also, disk heartbeating is used over
the shared storage between all the nodes.

The cluster contains two resource groups: newyork_rg and test_app_rg. The newyork_rg
resource group hosts the IBM HTTP Server application, and the test_app_rg resource group
hosts a test script application. The home node for newyork_rg is chile, and the home node for
test_app_rg is serbia. The scotland node runs in a standby capacity.



Figure 7-39 shows the relevant attributes of the newyork_rg and test_rg resource groups.

Resource Group Name newyork_rg


Participating Node Name(s) chile scotland serbia
Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node
Fallback Policy Never Fallback
Volume Groups ny_datavg
Application Servers httpd_app

Resource Group Name test_app_rg


Participating Node Name(s) serbia chile scotland
Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node
Fallback Policy Fallback To Higher Priority Node
Application Servers test_app

Figure 7-39 Three-node cluster resource groups
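
Output similar to Figure 7-39 can be gathered with the standard PowerHA utilities. For
example, the following commands list the resource group definitions and their current state
(the paths shown are the usual utility locations):

/usr/es/sbin/cluster/utilities/clshowres
/usr/es/sbin/cluster/utilities/clRGinfo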

7.4.1 Planning
Before beginning a rolling migration, you must properly plan to ensure that you are ready to
proceed. For more information, see 7.1, “Considerations before migrating” on page 152. The
migration to PowerHA 7.1 is different from previous releases, because of the support for CAA
integration. Therefore, see also 7.2, “Understanding the PowerHA 7.1 migration process” on
page 153.

Ensure that the cluster is stable on all nodes and is synchronized. With a rolling migration,
you must be aware of the following restrictions while performing the migration, because a
mixed-software-version cluster is involved:
 Do not perform synchronization or verification while a mixed-software-version cluster
exists. Such actions are not allowed in this case.
 Do not make any cluster configuration changes.
 Do not perform a Cluster Single Point Of Control (C-SPOC) operation while a
mixed-software-version cluster exists. Such action is not allowed in this case.
 Try to perform the migration during one maintenance period, and do not leave your cluster
in a mixed state for any significant length of time.
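
For example, a quick stability check before you begin is to query the cluster manager state on
every node; ST_STABLE indicates a stable, running cluster manager:

lssrc -ls clstrmgrES | grep -i state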

7.4.2 Performing a rolling migration


In this example, a two-phase migration is performed in which you migrate AIX from version
6.1 TL3 to version 6.1 TL6, restart the system, and then migrate PowerHA. You perform this
migration on one node at a time, ensuring that any resource group that the node is hosting is
moved to another node first.



Migrating the first node
Figure 7-40 shows the cluster before upgrading AIX.

Figure 7-40 Rolling migration: Scotland before the AIX upgrade

To migrate the first node, follow these steps:


1. Shut down PowerHA services on the standby node (scotland). Run the smitty clstop
command to stop cluster services on this node. Because this node is a standby node, no
resource groups are hosted. Therefore, you do not need to perform any resource group
operations first. Ensure that cluster services are stopped by running the following command:
lssrc -ls clstrmgrES
Look for the ST_INIT status, which indicates that cluster services on this node are in a
stopped state.
2. Update AIX to version 6.1 TL6 (scotland node). To perform this task, run the smitty
update_all command by using the TL6 images, which you can download by going to:
http://www.ibm.com/support/entry/portal/Downloads/IBM_Operating_Systems/AIX

CAA-specific file sets: You must install the CAA-specific bos.cluster and bos.ahafs
file sets because update_all does not install them.

After you complete the installation, restart the node.



When AIX is upgraded, you are at the stage shown in Figure 7-41.

Figure 7-41 Rolling migration: Scotland post AIX upgrade

3. Decide which shared disk to use for the CAA private repository (scotland node). See
7.1, “Considerations before migrating” on page 152, for more information.

Previous volume disk group: The disk must be a clean logical unit number (LUN) that
does not contain a previous volume group. If you have a previous volume group on this
disk, you must remove it. See 10.4.5, “Volume group name already in use” on
page 320.

4. Run the clmigcheck command on the first node (scotland).


You have now upgraded AIX to a CAA version and chosen the CAA disk. When you start
the clmigcheck command, you see the panel shown in Figure 7-42 on page 181. For more
information about the clmigcheck command, see 7.2, “Understanding the PowerHA 7.1
migration process” on page 153.



------------[ PowerHA SystemMirror Migration Check ]-------------

Please select one of the following options:

1 = Check ODM configuration.

2 = Check snapshot configuration.

3 = Enter repository disk and multicast IP addresses.

Select one of the above,"x"to exit or "h" for help:


Figure 7-42 Running the clmigcheck command first during a rolling migration

a. Select option 1 (Check the ODM configuration).


When you choose this option, the clmigcheck command checks your configuration and
reports any configuration elements that cannot be migrated.
This migration scenario uses disk-based heartbeating. The clmigcheck command
detects this method and shows a message similar to the one in Figure 7-43, indicating
that this configuration will be removed during migration.

------------[ PowerHA SystemMirror Migration Check ]-------------

CONFIG-WARNING: The configuration contains unsupported hardware: Disk
Heartbeat network. The PowerHA network name is net_diskhb_01. This will be
removed from the configuration during the migration to PowerHA SystemMirror 7.1.

Hit <Enter> to continue


Figure 7-43 The disk heartbeat warning message from the clmigcheck command

You do not need to take any action because the disk-based heartbeating is
automatically removed during migration. Because three disk heartbeat networks are in
the configuration, this warning message is displayed three times, once for each
network. If no errors are detected, you see the message shown in Figure 7-44.

------------[ PowerHA SystemMirror Migration Check ]-------------

The ODM has no unsupported elements.

Hit <Enter> to continue


Figure 7-44 ODM no unsupported elements message

Press Enter after this last panel, and you return to the main menu.



b. Select option 3 to enter the repository disk. As shown in Figure 7-45, in this scenario,
we chose option 1 to use hdisk1 (PVID 000fe40120e16405).

-----------[ PowerHA SystemMirror Migration Check ]-------------

Select the disk to use for the repository

1 = 000fe40120e16405(hdisk1)
2 = 000fe4114cf8d258(hdisk2)
3 = 000fe4114cf8d2ec(hdisk3)
4 = 000fe4013560cc77(hdisk5)
5 = 000fe4114cf8d4d5(hdisk6)
6 = 000fe4114cf8d579(hdisk7)

Select one of the above or "x" to exit:


Figure 7-45 Choosing a CAA disk

c. Enter the multicast address as shown in Figure 7-46. You can specify a multicast
address, or you can have clmigcheck automatically assign one. For more information
about multicast addresses, see 1.3.1, “Communication interfaces” on page 13. Press
Enter, and you return to the main menu.

------------[ PowerHA SystemMirror Migration Check ]-------------

PowerHA SystemMirror uses multicast address for internal


cluster communication and monitoring. These must be in the
multicast range, 224.0.0.0 - 239.255.255.255.

If you make a NULL entry, AIX will generate an appropriate address for
you.
You should only specify an address if you have an explicit reason to do
so, but are cautioned that this address cannot be changed once the
configuration is activated (i.e. migration is complete).

h = help

Enter the multicast IP address to use for network monitoring:


Figure 7-46 Choosing a multicast address

d. Exit the clmigcheck tool.



5. Verify whether you are ready for the PowerHA upgrade on the node scotland by running
the clmigcheck tool again. If you are ready, you see the panel shown in Figure 7-47.

------------[ PowerHA SystemMirror Migration Check ]-------------

clmigcheck: This is not the first node or last node clmigcheck was run on.
No further checking is required on this node. You can install the new
version of PowerHA SystemMirror.

Hit <Enter> to continue


Figure 7-47 Verifying readiness for migration

6. Upgrade PowerHA on the scotland node to PowerHA 7.1 SP1. Because the cluster
services are down, you can perform a smitty update_all to upgrade PowerHA.
7. When this process is complete, update the new rhosts definition for CAA as shown in
Figure 7-48. Although we used IP addresses in this scenario, you can also add the short
host names to the rhosts file, provided that you configured the /etc/hosts file correctly.
See “Creating a cluster with host names in the FQDN format” on page 75 for more
information.

/etc/cluster
# cat rhosts
192.168.101.111
192.168.101.112
192.168.101.113
Figure 7-48 Extract showing the configured rhosts file

Populating the /etc/cluster/rhosts file: The /etc/cluster/rhosts file must be


populated with all cluster IP addresses before using PowerHA SystemMirror. This
process was done automatically in previous releases but is now a required, manual
process. The addresses that you enter in this file must include the addresses that
resolve to the host name of the cluster nodes. If you update this file, you must refresh
the clcomd subsystem by using the following command:
refresh -s clcomd

Restarting the cluster: You do not need to restart the cluster after you upgrade
PowerHA.

8. Start PowerHA on the scotland node by issuing the smitty clstart command. The node
should be able to rejoin the cluster. However, you receive warning messages about mixed
versions of PowerHA.
After PowerHA is started on this node, move any resource groups that the next node is
hosting onto this node so that you can migrate the second node in the cluster. In this
scenario, the serbia node is hosting the test_app_rg resource group. Therefore, we
perform a resource group move request to move this resource to the newly migrated
scotland node. The serbia node is then available to migrate.
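
The move can be requested from the SMIT resource group panels or from the command
line with the clRGmove utility. The following call is a sketch only; the flags reflect our
assumption of the usual syntax, so verify them against the clRGmove documentation
before you use it:

/usr/es/sbin/cluster/utilities/clRGmove -g test_app_rg -n scotland -m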



You have now completed the first node migration of the three-node cluster. You have
rejoined the cluster and are now in a mixed version. Figure 7-49 shows the starting point
for migrating the next node in the cluster, with the test_app_rg resource group moved to
the newly migrated scotland node.

Figure 7-49 Rolling migration: Scotland post HA upgrade



Migrating the second node
Figure 7-50 shows that you are ready to proceed with migration of the second node (serbia).

Figure 7-50 Rolling migration: Serbia before the AIX upgrade

To migrate the second node, follow these steps:


1. Shut down PowerHA services on the serbia node. You must stop cluster services on this
node before you begin the migration.
2. Upgrade to AIX 6.1 TL6 (serbia node) similar to the process you used for the scotland
node. After the update is complete, ensure that AIX is rebooted.



You are now in the state as shown in Figure 7-51.

Figure 7-51 Rolling migration: Serbia post AIX upgrade

3. Run the clmigcheck command to ensure that the migration worked and that you can
proceed with the PowerHA upgrade. This step is necessary even though the cluster
configuration migration check and the CAA configuration were already completed on the
first node (scotland).
Figure 7-52 shows the panel that you see now.

------------[ PowerHA SystemMirror Migration Check ]-------------

clmigcheck: This is not the first node or last node clmigcheck was run on.
No further checking is required on this node. You can install the new
version of PowerHA SystemMirror.

Hit <Enter> to continue


Figure 7-52 The clmigcheck panel on the second node

4. Upgrade PowerHA on the serbia node to PowerHA 7.1 SP1. Follow the same migration
procedure as in the first node.

Reminder: Update the /etc/cluster/rhosts file so that it is the same as the first node
that you upgraded. See step 6 on page 183.



5. Start PowerHA on the serbia node and rejoin this node to the cluster.
After this node is started, check and move the newyork_rg resource group from the chile
node to the scotland node. By performing this task, you are ready to proceed with
migration of the final node in the cluster (the chile node).

At this stage, two of the three nodes in the cluster are migrated to AIX 6.1 TL6 and PowerHA
7.1. The chile node is the last node in the cluster to be upgraded. Figure 7-53 shows how the
cluster looks now.

Figure 7-53 Rolling migration: The serbia node post HA upgrade



Migrating the final node
Figure 7-54 shows that you are ready to proceed with migration of the final node in the cluster
(chile). The newyork_rg resource group has been moved to the scotland node, and the
cluster services are down, ready for the AIX migration.

Figure 7-54 Rolling migration: The chile node before the AIX upgrade

To migrate the final node, follow these steps:


1. Shut down PowerHA services on the chile node.
2. Upgrade to AIX 6.1 TL6 (chile node). Remember to reboot the node after the upgrade.
Then run the clmigcheck command for the last time.
When the clmigcheck command is run for the last time, it recognizes that this node is the
last node of the cluster to migrate. This command then initiates the final phase of the
migration, which configures CAA. You see the message shown in Figure 7-55.

clmigcheck: You can install the new version of PowerHA SystemMirror.


Figure 7-55 Final message from the clmigcheck command



If a problem exists at this stage, you might see the message shown in Figure 7-56.

chile:/ # clmigcheck
Verifying clcomd communication, please be patient.

clmigcheck: Running
/usr/sbin/rsct/install/bin/ct_caa_set_disabled_for_migration
on each node in the cluster

Creating CAA cluster, please be patient.

ERROR: Problems encountered creating the cluster in AIX.


Use the syslog facility to see output from the mkcluster command.
Figure 7-56 Error condition from clmigcheck

If you see a message similar to the one shown in Figure 7-56, the final mkcluster phase
has failed. For more information about this problem, see 10.2, “Troubleshooting the
migration” on page 308.
At this stage, you have upgraded AIX and run the final clmigcheck process. Figure 7-57
shows how the cluster looks now.

Figure 7-57 Rolling migration: Chile post AIX upgrade



3. Upgrade PowerHA on the chile node by following the same procedure that you previously
used.

Reminder: Update the /etc/cluster/rhosts file so that it is the same as the other
nodes that you upgraded. See step 6 on page 183.

In this scenario, we started PowerHA on the chile node and performed a verification and
synchronization of the cluster, which is the final stage of the migration. The newyork_rg
resource group was then moved back to the chile node. The cluster migration is now complete.
Figure 7-58 shows how the cluster looks now.

Figure 7-58 Rolling migration completed



7.4.3 Checking your newly migrated cluster
After the migration is completed, perform the following checks to ensure that everything has
migrated correctly:
 Verify that CAA is configured and running on all nodes.

Check that CAA is working by running the lscluster -m command. This command returns
information about your cluster from all your nodes. If a problem exists, you see a message
similar to the one shown in Figure 7-59.

# lscluster -m
Cluster services are not active.
Figure 7-59 Message indicating that CAA is not running

If you receive this message, see 10.4.7, “The ‘Cluster services are not active’ message” on
page 323, for details about how to fix this problem.
 Verify that CAA private is defined and active on all nodes.

Check the lspv output to ensure that the CAA repository is defined and varied on for each
node. You see output similar to what is shown in Figure 7-60.

chile:/ # lspv
caa_private0 000fe40120e16405 caavg_private active
hdisk2 000fe4114cf8d258 None
Figure 7-60 Extract from lspv showing the CAA repository disk

 Check conversion of PowerHA ODM.

Review the /tmp/clconvert.log file to ensure that the conversion of the PowerHA ODM has
been successful. For additional details about the log files and troubleshooting information,
see 10.1, “Locating the log files” on page 306.
 Synchronize or verify the cluster.

Run verification on your cluster to ensure that it operates as expected.
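
For example, the following commands, run from any node, cover the first three checks; clcmd
distributes each command to every cluster node:

clcmd lscluster -m
clcmd lspv
grep -i error /tmp/clconvert.log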

Troubleshooting: For information about common problems and solutions, see


Chapter 10, “Troubleshooting PowerHA 7.1” on page 305.

7.5 Offline migration


This section explains how to perform an offline migration. The test environment begins with
AIX 6.1.3.2 and PowerHA 6.1.0.2. The migration leads to AIX 7.1.0.1 and PowerHA 7.1.0.1.

7.5.1 Planning the offline migration


Part of planning for any migration is to ensure that you meet all the hardware and software
requirements. For more details, see 7.1, “Considerations before migrating” on page 152, and
7.2, “Understanding the PowerHA 7.1 migration process” on page 153.



Starting configuration
Figure 7-61 on page 192 shows a simplified layout of the cluster that is migrated in this
scenario. Both systems are running AIX 6.1 TL3 SP 2. The installed PowerHA version is 6.1
SP 2.

The cluster layout is a mutual takeover configuration. The munich system is the primary server
for the HTTP application. The berlin system is the primary server for the Network File
System (NFS), which is cross-mounted by the munich system.

Because of resource limitations, the disk heartbeat is using one of the existing shared disks.
Two networks are defined:
 The net_ether_01 network is the administrative network and is used only by the system
administration team.
 The net_ether_10 network is used by the applications and its users.

Figure 7-61 Start point for offline migration



Planned target configuration
The plan is to update both systems to AIX 7.1 and to PowerHA SystemMirror 7.1. Because
PowerHA SystemMirror 6.1 SP2 is not supported on AIX 7.1, the quickest way to update it is
through an offline migration. A rolling migration is also possible, but requires the following
migration steps:
1. Update to PowerHA 6.1 SP3 or later (which can be performed by using a nondisruptive
upgrade method).
2. Migrate to AIX 7.1.
3. Migrate to PowerHA 7.1.

PowerHA 6.1 support on AIX 7.1: PowerHA 6.1 SP2 is not supported on AIX 7.1. You
need a minimum of PowerHA 6.1 SP3.

As mentioned in 1.2.3, “The central repository” on page 9, an additional shared disk is


required for the new CAA repository disk. Figure 7-62 shows the results of the completed
migration. To perform the migration, see 7.5.3, “Performing an offline migration” on page 195.

Figure 7-62 Planned configuration for offline migration



7.5.2 Offline migration flow
Figure 7-63 shows a high-level overview of the offline migration flow. First and most
importantly, you must have fulfilled all the new hardware requirements. Then you ensure that
AIX has been upgraded on all cluster nodes before continuing with the update of PowerHA.
To perform the migration, see 7.5.3, “Performing an offline migration” on page 195.

Figure 7-63 Offline migration flow



7.5.3 Performing an offline migration
Before you start the migration, you must complete all hardware and software requirements.
For a list of the requirements, see 7.1, “Considerations before migrating” on page 152.
1. Create a snapshot, copy it to a safe place, and create a system backup (mksysb).
The snapshot and the mksysb are not required to complete the migration, but they might be
helpful if something goes wrong. You can also use the snapshot file to perform a snapshot
migration. You can use the system backup to re-install the system back to its original
starting point if necessary.
2. Stop cluster services on all nodes by running the smitty clstop command. Before you
continue, ensure that cluster services are stopped on all nodes.
3. Update to AIX 6.1.6 or later. Alternatively, perform a migration installation of AIX to version
7.1 or later.
In this test scenario, a migration installation to version 7.1 is performed on both systems in
parallel.
4. Ensure that the new AIX cluster file sets are installed, specifically the bos.ahafs and
bos.cluster file sets. These file sets are not installed as part of the AIX migration.
5. Restart the systems.

Important: You must restart the systems to ensure that all needed processes for CAA
are running.

6. Verify that the new clcomd subsystem is running.


If the clcomd subsystem is not running, a required file set is missing (see step 4).
Figure 7-64 shows an example of the output indicating that the subsystems are running.

# lssrc -a | grep clcom


clcomd caa 3866824 active
clcomdES clcomdES 5243068 active
#
Figure 7-64 Verifying if the clcomd subsystem is running

Beginning with PowerHA 6.1 SP3 or later, you can start the cluster if preferred, but we do
not start it now in this scenario.
7. Run the clmigcheck program on one of the cluster nodes.

Important: You must run the clmigcheck program (in the /usr/sbin/ directory) before
you install PowerHA 7.1. Keep in mind that you must run this program on each node
one-at-a-time in the cluster.



The following steps are required for offline migration when running the clmigcheck
program. The steps might differ slightly if you perform a rolling or snapshot migration.
a. Select option 1 (check ODM configuration) from the first clmigcheck panel
(Figure 7-65).

------------[ PowerHA SystemMirror Migration Check ]-------------

Please select one of the following options:

1 = Check ODM configuration.

2 = Check snapshot configuration.

3 = Enter repository disk and multicast IP addresses.

Select one of the above,"x"to exit or "h" for help: 1


Figure 7-65 The clmigcheck main panel

While checking the configuration, you might see warning or error messages. You must
correct errors manually, but issues that are identified by warning messages are cleaned up
during the migration process. In this case, a warning message (Figure 7-66) is displayed
indicating that the disk heartbeat network will be removed at the end of the migration.

------------[ PowerHA SystemMirror Migration Check ]-------------

CONFIG-WARNING: The configuration contains unsupported hardware: Disk
Heartbeat network. The PowerHA network name is net_diskhb_01. This will be
removed from the configuration during the migration to PowerHA SystemMirror 7.1.

Hit <Enter> to continue


Figure 7-66 Warning message after selecting clmigcheck option 1

b. Continue with the next clmigcheck panel.


Only one error or warning is displayed at a time. Press the Enter key, and any
additional messages are displayed. In this case, only one warning message is
displayed.
Manually correct or fix all issues that are identified by error messages before
continuing with the process. After you fix an issue, restart the system as explained in
step 5 on page 195.



c. Verify that you receive a message similar to the one in Figure 7-67 indicating that the
ODM has no unsupported elements. You must receive this message before you continue
with the clmigcheck process and the installation of PowerHA.

------------[ PowerHA SystemMirror Migration Check ]-------------

The ODM has no unsupported elements.

Hit <Enter> to continue


Figure 7-67 ODM check successful message

Press Enter, and the main clmigcheck panel (Figure 7-65 on page 196) is displayed
again.
d. Select option 3 (Enter repository disk and multicast IP addresses).
The next panel (Figure 7-68) lists all available shared disks that might be used for the
CAA repository disk. You need one shared disk for the CAA repository.

------------[ PowerHA SystemMirror Migration Check ]-------------

Select the disk to use for the repository

1 = 00c0f6a01c784107(hdisk4)

Select one of the above or "x" to exit: 1


Figure 7-68 Selecting the repository disk

e. Configure the multicast address as shown in Figure 7-69 on page 198. The system
automatically creates an appropriate address for you. By default, PowerHA creates the
multicast address by replacing the first octet of the IP communication path of the lowest
node in the cluster with 228. Press Enter.

Manually specifying an address: Only specify an address manually if you have an


explicit reason to do so.

Important:
 You cannot change the selected IP multicast address after the configuration is
activated.
 You must set up any routers in the network topology to forward multicast
messages.



------------[ PowerHA SystemMirror Migration Check ]-------------

PowerHA SystemMirror uses multicast address for internal


cluster communication and monitoring. These must be in the
multicast range, 224.0.0.0 - 239.255.255.255.

If you make a NULL entry, AIX will generate an appropriate address for
you.
You should only specify an address if you have an explicit reason to do
so, but are cautioned that this address cannot be changed once the
configuration is activated (i.e. migration is complete).

h = help

Enter the multicast IP address to use for network monitoring:


Figure 7-69 Configuring a multicast address

f. From the main clmigcheck panel, type an x to exit the clmigcheck program.
g. In the next panel (Figure 7-70), confirm the exit request by typing y.

------------[ PowerHA SystemMirror Migration Check ]-------------

You have requested to exit clmigcheck.

Do you really want to exit? (y) y


Figure 7-70 The clmigcheck exit confirmation message

A warning message (Figure 7-71) is displayed as a reminder to complete all the


previous steps before you exit.

Note - If you have not completed the input of repository disks and
multicast IP addresses, you will not be able to install
PowerHA SystemMirror

Additional details for this session may be found in


/tmp/clmigcheck/clmigcheck.log.
Figure 7-71 The clmigcheck exit warning message



8. Install PowerHA only on the node where the clmigcheck program was executed.
If the clmigcheck program is not run, a failure message (Figure 7-72) is displayed when
you try to install PowerHA 7.1. In this case, return to step 7 on page 195.

COMMAND STATUS

Command: failed stdout: yes stderr: no

Before command completion, additional instructions may appear below.

[MORE...94]
restricted by GSA ADP Schedule Contract with IBM Corp.
. . . . . << End of copyright notice for cluster.es.migcheck >>. . . .

The /usr/sbin/clmigcheck command must be run to


verify the back level configuration before you can
install this version. If you are not migrating the
back level configuration you must remove it before
before installing this version.

Failed /usr/sbin/clmigcheck has not been run

instal: Failed while executing the cluster.es.migcheck.pre_i script.


[MORE...472]

F1=Help F2=Refresh F3=Cancel F6=Command


F8=Image F9=Shell F10=Exit /=Find
n=Find Next
Figure 7-72 PowerHA 7.1 installation failure message

9. Add the host names of your cluster nodes to the /etc/cluster/rhosts file. The names
must match the PowerHA node names.
10.Refresh the clcomd subsystem.
refresh -s clcomd
11.Review the /tmp/clconvert.log file to ensure that a conversion of the PowerHA ODMs
has occurred.
12.Start cluster services only on the node that you updated by using smitty clstart.
13.Ensure that the cluster services have started successfully on this node by using any of the
following commands:
clstat -a
lssrc -ls clstrmgrES | grep state
clmgr query cluster | grep STATE
14.Continue to the next node.
15.Run the clmigcheck program on this node.
Keep in mind that you must run the clmigcheck program on each node before you can
install PowerHA 7.1. Follow the same steps as for the first system as explained in step 7
on page 195.



An error message similar to the one shown in Figure 7-73 indicates that one of the steps
was not performed. Often this message is displayed because the system was not
restarted after the installation of the AIX cluster file sets.
To correct this issue, return to step 4 on page 195. You might have to restart both systems,
depending on which part was missed.

# clmigcheck
Saving existing /tmp/clmigcheck/clmigcheck.log to
/tmp/clmigcheck/clmigcheck.log.bak
rshexec: cannot connect to node munich

ERROR: Internode communication failed,


check the clcomd.log file for more information.

#
Figure 7-73 The clmigcheck execution error message

Attention: Do not start the clcomd subsystem manually. Starting this subsystem manually
can result in further errors, which might require you to re-install this node or all the
cluster nodes.

16.Install PowerHA only on this node in the same way as you did on the first node. See step 8
on page 199.
17.As on the first node, add the host names of your cluster nodes to the /etc/cluster/rhosts
file. The names must be the same as the node names.
18.Refresh the clcomd subsystem.
19.Start the cluster services only on the node that you updated.
20.Ensure that the cluster services started successfully on this node.
21.If you have more than two nodes in your cluster, repeat step 15 on page 199 through step
20 until all of your cluster nodes are updated.

You now have a fully running cluster environment. Before going into production mode, test
your cluster as explained in Chapter 9, “Testing the PowerHA 7.1 cluster” on page 259.

When you check the topology information by using the cltopinfo command, you should see
that all non-IP networks, including the disk heartbeat networks, have been removed. If these
networks are not removed, see Chapter 10, “Troubleshooting PowerHA 7.1” on page 305.
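
For example, a quick check that no disk heartbeat network definitions remain is the following
command, which should return no output:

/usr/es/sbin/cluster/utilities/cltopinfo | grep -i diskhb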

When you check the RSCT subsystems, the topology services subsystem (topsvcs) should
now be inoperative, as shown in Figure 7-74.

# lssrc -a | grep svcs


grpsvcs grpsvcs 6684834 active
emsvcs emsvcs 5898390 active
topsvcs topsvcs inoperative
grpglsm grpsvcs inoperative
emaixos emsvcs inoperative
Figure 7-74 Checking for topology service



Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster
Monitoring plays an important role in managing issues because a cluster has redundant
hardware that can “hide” failing components from the user. It is also essential for tracking
the behavior of a cluster and for addressing performance issues or poor design
implementations.

The role of the administrator is to quickly find relevant information and analyze it to make the
best decision in every situation. This chapter provides several examples that show how the
PowerHA 7.1 administrator can gather information about the cluster by using several
methods.

For most of the examples in this chapter, the korea cluster from the test environment is used
with the participating seoul and busan nodes. All the commands in the examples are executed
as root user.

This chapter includes the following topics:


 Collecting information before a cluster is configured
 Collecting information after a cluster is configured
 Collecting information after a cluster is running



8.1 Collecting information before a cluster is configured
Before you configure the cluster, you must collect the relevant information. Later, the
administrator can use this information to see the changes that have been made after a
configured IBM PowerHA SystemMirror 7.1 for AIX cluster is running. Ensure that this
information is available to assist in troubleshooting and diagnosing the cluster in the future.
This topic lists the relevant information that you might want to collect.

The /etc/hosts file


The /etc/hosts file must have all the IP addresses that are used in the cluster configuration,
including the boot or base addresses, persistent addresses, and service addresses, as
shown in Example 8-1.

Example 8-1 A /etc/hosts sample configuration


seoul, busan:/ # egrep "seoul|busan|poksap" /etc/hosts
192.168.101.143 seoul-b1 # Boot IP label 1
192.168.101.144 busan-b1 # Boot IP label 1
192.168.201.143 seoul-b2 # Boot IP label 2
192.168.201.144 busan-b2 # Boot IP label 2
10.168.101.43 seoul # Persistent IP
10.168.101.44 busan # Persistent IP
10.168.101.143 poksap-db # Service IP label

The /etc/cluster/rhosts file


The /etc/cluster/rhosts file (Example 8-2) in PowerHA 7.1 replaces the
/usr/es/sbin/cluster/etc/rhosts file. This file is populated with the communication paths
that are used when the nodes are defined.

Example 8-2 A /etc/cluster/rhosts sample configuration


seoul, busan:/ # cat /etc/cluster/rhosts
seoul # Persistent IP address used as communication path
busan # Persistent IP address used as communication path

CAA subsystems
Cluster Aware AIX (CAA) introduces a new set of subsystems. When the cluster is not
running, these subsystems are inoperative, except for the clcomd subsystem, which is active
(Example 8-3). The clcomdES subsystem has been replaced by the clcomd subsystem, which
is no longer part of the cluster subsystem group. It is now part of the AIX Base Operating
System (BOS), not PowerHA.

Example 8-3 CAA subsystems status


seoul, busan:/ # lssrc -a | grep caa
clcomd caa 5505056 active
cld caa inoperative
clconfd caa inoperative

busan:/ # lslpp -w /usr/sbin/clcomd


File Fileset Type
----------------------------------------------------------------------------
/usr/sbin/clcomd bos.cluster.rte File



PowerHA groups
IBM PowerHA 7.1 creates two operating system groups during installation. The group
numbers must be consistent across cluster nodes as shown in Example 8-4.

Example 8-4 Groups created while installing PowerHA file sets


seoul, busan:/ # grep ha /etc/group
hacmp:!:202:
haemrm:!:203:

Disk configuration
With the current code level in AIX 7.1.0.1, the CAA repository cannot be created over virtual
SCSI (VSCSI) disks. For the korea cluster, a DS4800 storage system is used and is accessed
over N_Port ID Virtualization (NPIV). The rootvg volume group is the only one using VSCSI
devices. Example 8-5 shows a list of storage disks.

Example 8-5 Storage disks listing


seoul:/ # lspv
hdisk0 00c0f6a088a155eb rootvg active
hdisk1 00c0f6a077839da7 None
hdisk2 00c0f6a0107734ea None
hdisk3 00c0f6a010773532 None

busan:/ # lspv
hdisk0 00c0f6a089390270 rootvg active
hdisk1 00c0f6a077839da7 None
hdisk2 00c0f6a0107734ea None
hdisk3 00c0f6a010773532 None

seoul, busan:/ # lsdev -Cc disk


hdisk0 Available Virtual SCSI Disk Drive
hdisk1 Available C5-T1-01 MPIO Other DS4K Array Disk
hdisk2 Available C5-T1-01 MPIO Other DS4K Array Disk
hdisk3 Available C5-T1-01 MPIO Other DS4K Array Disk

Network interfaces configuration


The boot or base address is configured as the initial address for each network interface. The
future persistent IP address is aliased over the en0 interface in each node before the
PowerHA cluster configuration. Example 8-6 shows a configuration of the network interfaces.

Example 8-6 Network interfaces configuration


seoul:/ # ifconfig -a
en0:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.101.143 netmask 0xffffff00 broadcast 192.168.101.255
inet 10.168.101.43 netmask 0xffffff00 broadcast 10.168.101.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
en2:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.201.143 netmask 0xffffff00 broadcast 192.168.201.255



lo0:
flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LAR
GESEND,CHAIN>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1

busan:/ # ifconfig -a
en0:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.101.144 netmask 0xffffff00 broadcast 192.168.101.255
inet 10.168.101.44 netmask 0xffffff00 broadcast 10.168.101.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
en2:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.201.144 netmask 0xffffff00 broadcast 192.168.201.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
lo0:
flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LAR
GESEND,CHAIN>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1

Routing table
The routing table is an important source of information to keep. As shown in 8.3.1, “AIX
commands and log files” on page 216, the multicast address is not displayed in this table,
even when the CAA and IBM PowerHA clusters are running. Example 8-7 shows the routing
table for the seoul node.

Example 8-7 Routing table


seoul:/ # netstat -rn
Routing tables
Destination Gateway Flags Refs Use If Exp Groups

Route tree for Protocol Family 2 (Internet):


default 192.168.100.60 UG 1 3489 en0 - -
10.168.100.0 10.168.101.43 UHSb 0 0 en0 - - =>
10.168.100/22 10.168.101.43 U 10 39006 en0 - -
10.168.101.43 127.0.0.1 UGHS 11 24356 lo0 - -
10.168.103.255 10.168.101.43 UHSb 0 0 en0 - -
127/8 127.0.0.1 U 12 10746 lo0 - -
192.168.100.0 192.168.101.143 UHSb 0 0 en0 - - =>
192.168.100/22 192.168.101.143 U 2 1057 en0 - -
192.168.101.143 127.0.0.1 UGHS 0 16 lo0 - -
192.168.103.255 192.168.101.143 UHSb 0 39 en0 - -
192.168.200.0 192.168.201.143 UHSb 0 0 en2 - - =>
192.168.200/22 192.168.201.143 U 0 2 en2 - -
192.168.201.143 127.0.0.1 UGHS 0 4 lo0 - -
192.168.203.255 192.168.201.143 UHSb 0 0 en2 - -



Route tree for Protocol Family 24 (Internet v6):
::1%1 ::1%1 UH 3 17903 lo0 - -

Multicast information
You can use the netstat command to display information about an interface for which
multicast is enabled. As shown in Example 8-8 for en0, no multicast address is configured,
other than the default 224.0.0.1 address before the cluster is configured.

Example 8-8 Multicast information


seoul:/ # netstat -a -I en0
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 a2.4e.50.54.31.3 304248 0 60964 0 0
01:00:5e:7f:ff:fd
01:00:5e:00:00:01
en0 1500 192.168.100 seoul-b1 304248 0 60964 0 0
239.255.255.253
224.0.0.1
en0 1500 10.168.100 seoul 304248 0 60964 0 0
239.255.255.253
224.0.0.1

Status of the IBM Systems Director common agent subsystems


The two subsystems must be active in every node to be discovered and managed by IBM
Systems Director as shown in Example 8-9. To monitor the cluster using the IBM Systems
Director web and command-line interfaces (CLIs), see 8.3, “Collecting information after a
cluster is running” on page 216.

Example 8-9 Common agent subsystems status


seoul:/ # lssrc -a | egrep "cim|platform"
platform_agent 2359482 active
cimsys 3211362 active

busan:/ # lssrc -a | egrep "cim|platform"


platform_agent 3014798 active
cimsys 2818190 active

Cluster status
Before a cluster is configured, the state of every node is NOT_CONFIGURED as shown in
Example 8-10.

Example 8-10 PowerHA cluster status


seoul:/ # lssrc -g cluster
Subsystem Group PID Status
clstrmgrES cluster 6947066 active

seoul:/ # lssrc -ls clstrmgrES


Current state: NOT_CONFIGURED
sccsid = "$Header: @(#) 61haes_r710_integration/13
43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710
2010-08-19T1
0:34:17-05:00$"



busan:/ # lssrc -g cluster
Subsystem Group PID Status
clstrmgrES cluster 3342346 active

busan:/ # lssrc -ls clstrmgrES


Current state: NOT_CONFIGURED
sccsid = "$Header: @(#) 61haes_r710_integration/13
43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710
2010-08-19T1
0:34:17-05:00$"

Modifications in the /etc/syslogd.conf file


During the installation of the PowerHA 7.1 file sets, entries are added to the
/etc/syslogd.conf configuration file as shown in Example 8-11.

Example 8-11 Modifications to the /etc/syslogd.conf file


# PowerHA SystemMirror Critical Messages
local0.crit /dev/console
# PowerHA SystemMirror Informational Messages
local0.info /var/hacmp/adm/cluster.log
# PowerHA SystemMirror Messages from Cluster Scripts
user.notice /var/hacmp/adm/cluster.log
# PowerHA SystemMirror Messages from Cluster Daemons
daemon.notice /var/hacmp/adm/cluster.log

Lines added to the /etc/inittab file


In PowerHA 7.1, the clcomd subsystem has a separate entry in the /etc/inittab file because
the clcomd subsystem is no longer part of the cluster subsystem group. Two entries now exist
as shown in Example 8-12.

Example 8-12 Modification to the /etc/inittab file


clcomd:23456789:once:/usr/bin/startsrc -s clcomd
hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init >/dev/console 2>&1

8.2 Collecting information after a cluster is configured


After the configuration is done and the first cluster synchronization is performed, the CAA
services become available. Also, the administrator can start using the clcmd utility that
distributes every command passed as an argument to all the cluster nodes.
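
For example, a trivial use of clcmd runs the same command on every node and groups the
output under a NODE header for each node, as in the examples later in this chapter:

clcmd date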

As soon as the configuration is synchronized to all nodes and the CAA cluster is created, the
administrator cannot change the cluster name or the cluster multicast address.

Changing the repository disk: The administrator can change the repository disk with the
procedure for replacing a repository disk provided in the PowerHA 7.1 Release Notes.



Disk configuration
During the first successful synchronization, the CAA repository is created on the chosen
disk. On each node, the hdisk device is renamed according to the new cluster-unified
nomenclature: its name changes to caa_private0. The repository volume group is called
caavg_private and is in the active state on every node.

After the first synchronization, two other disks are added to the cluster storage by using the
following command:
chcluster -n korea -d +hdisk2,hdisk3

where hdisk2 is renamed to cldisk2, and hdisk3 is renamed to cldisk1. Example 8-13
shows the resulting disk listing.

Example 8-13 Disk listing


seoul:/ # clcmd lspv
-------------------------------
NODE seoul
-------------------------------
hdisk0 00c0f6a088a155eb rootvg active
caa_private0 00c0f6a077839da7 caavg_private active
cldisk2 00c0f6a0107734ea None
cldisk1 00c0f6a010773532 None
-------------------------------
NODE busan
-------------------------------
hdisk0 00c0f6a089390270 rootvg active
caa_private0 00c0f6a077839da7 caavg_private active
cldisk2 00c0f6a0107734ea None
cldisk1 00c0f6a010773532 None

Attention: The cluster repository disk is a special device for the cluster. The use of Logical
Volume Manager (LVM) commands over the repository disk is not supported. AIX LVM
commands are single node commands and are not intended for use in a clustered
configuration.

Multicast information
Compared with the multicast information collected when the cluster was not configured, the
netstat command now shows the 228.168.101.43 address in the table (Example 8-14).

Example 8-14 Multicast information


seoul:/ # netstat -a -I en0
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 a2.4e.50.54.31.3 70339 0 44686 0 0
01:00:5e:28:65:2b
01:00:5e:7f:ff:fd
01:00:5e:00:00:01
en0 1500 192.168.100 seoul-b1 70339 0 44686 0 0
228.168.101.43
239.255.255.253
224.0.0.1



en0 1500 10.168.100 seoul 70339 0 44686 0 0
228.168.101.43
239.255.255.253
224.0.0.1

Cluster status
The cluster status changes from NOT_CONFIGURED to ST_INIT as shown in Example 8-15.

Example 8-15 PowerHA cluster status


busan:/ # lssrc -ls clstrmgrES
Current state: ST_INIT
sccsid = "$Header: @(#) 61haes_r710_integration/13
43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710
2010-08-19T1
0:34:17-05:00$"

CAA subsystem group active


All the CAA subsystems become active after the first cluster synchronization as shown in
Example 8-16.

Example 8-16 CAA subsystems status


seoul:/ # clcmd lssrc -g caa
-------------------------------
NODE seoul
-------------------------------
Subsystem Group PID Status
cld caa 3735780 active
clcomd caa 5439664 active
clconfd caa 4915418 active
solidhac caa 6947064 active
solid caa 5701642 active
-------------------------------
NODE busan
-------------------------------
Subsystem Group PID Status
cld caa 3211462 active
clcomd caa 2687186 active
solid caa 6160402 active
clconfd caa 6488286 active
solidhac caa 5439698 active

Subsystem guide:
 cld determines whether the local node must become the primary or secondary solidDB
server in a failover.
 The solid subsystem is the database engine.
 The solidhac subsystem is used for the high availability of the solidDB server.
 The clconfd subsystem runs every 10 minutes to put any missed cluster configuration
changes into effect on the local node.



Cluster information using the lscluster command
CAA comes with a set of command-line tools, as explained in the following sections, that can
be used to monitor the status and statistics of a running cluster. For more information about
CAA and its functionalities, see Chapter 2, “Features of PowerHA SystemMirror 7.1” on
page 23.

Listing the cluster configuration: -c flag


Example 8-17 shows the cluster configuration by using the lscluster -c command.

Example 8-17 Listing the cluster configuration


seoul:/ # lscluster -c
Cluster query for cluster korea returns:
Cluster uuid: a01f47fe-d089-11df-95b5-a24e50543103
Number of nodes in cluster = 2
Cluster id for node busan is 1
Primary IP address for node busan is 10.168.101.44
Cluster id for node seoul is 2
Primary IP address for node seoul is 10.168.101.43
Number of disks in cluster = 2
for disk cldisk1 UUID = fe1e9f03-005b-3191-a3ee-4834944fcdeb
cluster_major = 0 cluster_minor = 1
for disk cldisk2 UUID = 428e30e8-657d-8053-d70e-c2f4b75999e2
cluster_major = 0 cluster_minor = 2
Multicast address for cluster is 228.168.101.43

Tip: The primary IP address shown for each node is the IP address chosen as the
communication path during cluster definition. In this case, the address is the same IP
address that is used as the persistent IP address.

The multicast address, when not specified by the administrator during cluster creation, is
composed of the number 228 followed by the last three octets of the communication path of
the node where the synchronization is executed. In this particular example, the
synchronization was run from the seoul node, which has the communication path
192.168.101.43. Therefore, the multicast address for the cluster becomes 228.168.101.43,
as can be observed in the output of the lscluster -c command.
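If you want to predict the default multicast address before synchronizing, the following shell
sketch reproduces the same derivation; the COMM_IP value is an assumption and must be
replaced with the communication path of the node that runs the synchronization:

seoul:/ # COMM_IP=192.168.101.43        # assumed communication path of the local node
seoul:/ # echo 228.${COMM_IP#*.}        # keeps the last three octets of COMM_IP
228.168.101.43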

The nodes can use IPv6, but at least one of the interfaces in each node must be configured
with IPv4 to enable the CAA cluster multicasting.

Listing the cluster nodes configuration: -m flag


The -m flag has a different output in each node. In the output shown in Example 8-18, clcmd is
used to distribute the command over all cluster nodes.

Example 8-18 Listing the cluster nodes configuration


seoul:/ # clcmd lscluster -m
-------------------------------
NODE seoul
-------------------------------
Calling node query for all nodes
Node query number of nodes examined: 2

Node name: busan


Cluster shorthand id for node: 1



uuid for node: e356646e-c0dd-11df-b51d-a24e57e18a03
State of node: UP
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
korea local a01f47fe-d089-11df-95b5-a24e50543103

Number of points_of_contact for node: 2


Point-of-contact interface & contact state
en2 UP
en0 UP
------------------------------
Node name: seoul
Cluster shorthand id for node: 2
uuid for node: 4f8858be-c0dd-11df-930a-a24e50543103
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
korea local a01f47fe-d089-11df-95b5-a24e50543103

Number of points_of_contact for node: 0


Point-of-contact interface & contact state
n/a

-------------------------------
NODE busan
-------------------------------
Calling node query for all nodes
Node query number of nodes examined: 2

Node name: busan


Cluster shorthand id for node: 1
uuid for node: e356646e-c0dd-11df-b51d-a24e57e18a03
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
korea local a01f47fe-d089-11df-95b5-a24e50543103

Number of points_of_contact for node: 0


Point-of-contact interface & contact state
n/a
------------------------------
Node name: seoul
Cluster shorthand id for node: 2
uuid for node: 4f8858be-c0dd-11df-930a-a24e50543103
State of node: UP
Smoothed rtt to node: 7



Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
korea local a01f47fe-d089-11df-95b5-a24e50543103

Number of points_of_contact for node: 2


Point-of-contact interface & contact state
en2 UP
en0 UP

Zone: Example 8-18 on page 209 mentions zones. A zone is a concept that is planned for
use in future versions of CAA, where the node can be part of different groups of machines.

Listing the cluster interfaces: -i flag


The korea cluster is configured with NPIV through the VIOS. To have SAN heartbeating, you
need a SAN connection through Fibre Channel (FC) adapters that meets the prerequisites
described later in this section. Example 8-19 shows the output from a cluster that meets
these requirements.

Example 8-19 Listing the cluster interfaces


sydney:/ # lscluster -i
Network/Storage Interface Query

Cluster Name: au_cl


Cluster uuid: 0252a470-c216-11df-b85d-6a888564f202
Number of nodes reporting = 2
Number of nodes expected = 2
Node sydney
Node uuid = a6ac83d4-c1d4-11df-8953-6a888564f202
Number of interfaces discovered = 4
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 6a.88.85.64.f2.2
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 6a.88.85.64.f2.4
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863



ndd flags for interface = 0x21081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node perth
Node uuid = c89d962c-c1d4-11df-aa87-6a888dd67502
Number of interfaces discovered = 4
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 6a.88.8d.d6.75.2
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 6a.88.8d.d6.75.4
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863



ndd flags for interface = 0x21081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED

rtt: The round-trip time (rtt) is calculated by using a mean deviation formula. Some
commands show rrt instead of rtt, which is believed to be a typographic error in the
command.

sfwcom: Storage Framework Communication (sfwcom) is the interface created by CAA for
SAN heartbeating. To enable sfwcom, the following prerequisites must be in place:
 Each node must have either a 4 GB or 8 GB FC adapter. If you are using vSCSI or
NPIV, VIOS 2.2.0.11-FP24 SP01 is the minimum level required.
 The adapters used for SAN heartbeating must have the tme (target mode enabled)
parameter set to yes. The Fibre Channel controller must have the parameter dyntrk set
to yes, and the parameter fc_err_recov set to fast_fail.
 All the adapters participating in the heartbeating must be in the same fabric zone. In the
previous example, sydney-fcs0 and perth-fcs0 are in the same fabric zone;
sydney-fcs1 and perth-fcs1 are in the same fabric zone.
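A minimal sketch for checking and setting these adapter attributes follows. The device names
fcs0 and fscsi0 are assumptions and must match your own FC adapter and its protocol device,
and the changes made with chdev -P take effect only after the devices are reconfigured or the
node is rebooted:

sydney:/ # lsattr -El fcs0 -a tme
sydney:/ # chdev -l fcs0 -a tme=yes -P
sydney:/ # chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P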

dpcom: The dpcom interface is the actual repository disk. It means that, on top of the
Ethernet and the Fibre Channel adapters, the cluster also uses the repository disk as a
physical medium to exchange heartbeats among the nodes.

Excluding configured interfaces: Currently you cannot exclude configured interfaces from
being used for cluster monitoring and communication. All network interfaces are used for
cluster monitoring and communication.



Listing the cluster storage interfaces: -d flag
Example 8-20 shows all storage disks that are participating in the cluster, including the
repository disk.

Example 8-20 Listing cluster storage interfaces


seoul:/ # clcmd lscluster -d
-------------------------------
NODE seoul
-------------------------------
Storage Interface Query

Cluster Name: korea


Cluster uuid: a01f47fe-d089-11df-95b5-a24e50543103
Number of nodes reporting = 2
Number of nodes expected = 2
Node seoul
Node uuid = 4f8858be-c0dd-11df-930a-a24e50543103
Number of disk discovered = 3
cldisk2
state : UP
uDid : 3E213600A0B8000114632000009554C8E0B010F1815 FAStT03IBMfcp
uUid : 428e30e8-657d-8053-d70e-c2f4b75999e2
type : CLUSDISK
cldisk1
state : UP
uDid : 3E213600A0B8000291B080000E90C05B0CD4B0F1815 FAStT03IBMfcp
uUid : fe1e9f03-005b-3191-a3ee-4834944fcdeb
type : CLUSDISK
caa_private0
state : UP
uDid :
uUid : 03e41dc1-3b8d-c422-3426-f1f61c567cda
type : REPDISK
Node seoul
Node uuid = 4f8858be-c0dd-11df-930a-a24e50543103
Number of disk discovered = 3
cldisk2
state : UP
uDid : 3E213600A0B8000114632000009554C8E0B010F1815 FAStT03IBMfcp
uUid : 428e30e8-657d-8053-d70e-c2f4b75999e2
type : CLUSDISK
cldisk1
state : UP
uDid : 3E213600A0B8000291B080000E90C05B0CD4B0F1815 FAStT03IBMfcp
uUid : fe1e9f03-005b-3191-a3ee-4834944fcdeb
type : CLUSDISK
caa_private0
state : UP
uDid :
uUid : 03e41dc1-3b8d-c422-3426-f1f61c567cda
type : REPDISK

-------------------------------
NODE busan
-------------------------------



Storage Interface Query

Cluster Name: korea


Cluster uuid: a01f47fe-d089-11df-95b5-a24e50543103
Number of nodes reporting = 2
Number of nodes expected = 2
Node busan
Node uuid = e356646e-c0dd-11df-b51d-a24e57e18a03
Number of disk discovered = 3
cldisk1
state : UP
uDid : 3E213600A0B8000291B080000E90C05B0CD4B0F1815 FAStT03IBMfcp
uUid : fe1e9f03-005b-3191-a3ee-4834944fcdeb
type : CLUSDISK
cldisk2
state : UP
uDid : 3E213600A0B8000114632000009554C8E0B010F1815 FAStT03IBMfcp
uUid : 428e30e8-657d-8053-d70e-c2f4b75999e2
type : CLUSDISK
caa_private0
state : UP
uDid :
uUid : 03e41dc1-3b8d-c422-3426-f1f61c567cda
type : REPDISK
Node busan
Node uuid = e356646e-c0dd-11df-b51d-a24e57e18a03
Number of disk discovered = 3
cldisk1
state : UP
uDid : 3E213600A0B8000291B080000E90C05B0CD4B0F1815 FAStT03IBMfcp
uUid : fe1e9f03-005b-3191-a3ee-4834944fcdeb
type : CLUSDISK
cldisk2
state : UP
uDid : 3E213600A0B8000114632000009554C8E0B010F1815 FAStT03IBMfcp
uUid : 428e30e8-657d-8053-d70e-c2f4b75999e2
type : CLUSDISK
caa_private0
state : UP
uDid :
uUid : 03e41dc1-3b8d-c422-3426-f1f61c567cda
type : REPDISK

Listing the network statistics: -s flag


Example 8-21 shows overall statistics about cluster heartbeating and the gossip protocol
used for node communication.

Example 8-21 Listing the network statistics


seoul:/ # lscluster -s
Cluster Statistics:

Cluster Network Statistics:

pkts seen:194312 pkts passed:66305



IP pkts:126210 UDP pkts:127723
gossip pkts sent:22050 gossip pkts recv:64076
cluster address pkts:0 CP pkts:127497
bad transmits:0 bad posts:0
short pkts:0 multicast pkts:127768
cluster wide errors:0 bad pkts:0
dup pkts:3680 pkt fragments:0
fragments queued:0 fragments freed:0
requests dropped:0 pkts routed:0
pkts pulled:0 no memory:0
rxmit requests recv:21 requests found:21
requests missed:0 ooo pkts:2
requests reset sent:0 reset recv:0
requests lnk reset send :0 reset lnk recv:0
rxmit requests sent:5
alive pkts sent:0 alive pkts recv:0
ahafs pkts sent:17 ahafs pkts recv:7
nodedown pkts sent:0 nodedown pkts recv:0
socket pkts sent:733 socket pkts recv:414
cwide pkts sent:230 cwide pkts recv:230
socket pkts no space:0 pkts recv notforhere:0
stale pkts recv:0 other cluster pkts:0
storage pkts sent:1 storage pkts recv:1
out-of-range pkts recv:0

8.3 Collecting information after a cluster is running


Up to this point, all the examples in this chapter collected information about a non-running
PowerHA 7.1 cluster. This section explains how to obtain valuable information from a
configured and running cluster.

WebSMIT: WebSMIT is no longer a supported tool.

8.3.1 AIX commands and log files


AIX 7.1, which is used in the korea cluster, provides a set of tools that can be used to collect
relevant information about the cluster, cluster services, and cluster device status. This section
shows examples of that type of information.

Disk configuration
All the volume groups controlled by a resource group are shown as concurrent on both
nodes, as shown in Example 8-22.

Example 8-22 Listing disks


seoul:/ # clcmd lspv
-------------------------------
NODE seoul
-------------------------------
hdisk0 00c0f6a088a155eb rootvg active
caa_private0 00c0f6a077839da7 caavg_private active
cldisk2 00c0f6a0107734ea pokvg concurrent



cldisk1 00c0f6a010773532 pokvg concurrent

-------------------------------
NODE busan
-------------------------------
hdisk0 00c0f6a089390270 rootvg active
caa_private0 00c0f6a077839da7 caavg_private active
cldisk2 00c0f6a0107734ea pokvg concurrent
cldisk1 00c0f6a010773532 pokvg concurrent

Multicast information
When compared with the multicast information collected when the cluster is not configured,
the netstat command shows that the 228.168.101.43 address is present in the table. See
Example 8-23.

Example 8-23 Multicast information


seoul:/ # netstat -a -I en0
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 a2.4e.50.54.31.3 82472 0 53528 0 0
01:00:5e:28:65:2b
01:00:5e:7f:ff:fd
01:00:5e:00:00:01
en0 1500 192.168.100 seoul-b1 82472 0 53528 0 0
228.168.101.43
239.255.255.253
224.0.0.1
en0 1500 10.168.100 seoul 82472 0 53528 0 0
228.168.101.43
239.255.255.253
224.0.0.1

seoul:/ # netstat -a -I en2


Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en2 1500 link#3 a2.4e.50.54.31.7 44673 0 22119 0 0
01:00:5e:7f:ff:fd
01:00:5e:28:65:2b
01:00:5e:00:00:01
en2 1500 192.168.200 seoul-b2 44673 0 22119 0 0
239.255.255.253
228.168.101.43
224.0.0.1
en2 1500 10.168.100 poksap-db 44673 0 22119 0 0
239.255.255.253
228.168.101.43
224.0.0.1



Status of the cluster
When the PowerHA cluster is running, its status changes from ST_INIT to ST_STABLE as
shown in Example 8-24.

Example 8-24 PowerHA cluster status


seoul:/ # lssrc -ls clstrmgrES
Current state: ST_STABLE
sccsid = "$Header: @(#) 61haes_r710_integration/13
43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710
2010-08-19T1
0:34:17-05:00$"
i_local_nodeid 1, i_local_siteid -1, my_handle 2
ml_idx[1]=0 ml_idx[2]=1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 12 # Note: Version 12 represents PowerHA SystemMirror 7.1
local node vrmf is 7101
cluster fix level is "1"
The following timer(s) are currently active:
Current DNP values
DNP Values for NodeId - 1 NodeName - busan
PgSpFree = 1308144 PvPctBusy = 0 PctTotalTimeIdle = 98.105654
DNP Values for NodeId - 2 NodeName - seoul
PgSpFree = 1307899 PvPctBusy = 0 PctTotalTimeIdle = 96.912367

Group Services information


Previous versions of PowerHA use the grpsvcs subsystem. PowerHA 7.1 uses the cthags
subsystem. The output of the lssrc -ls cthags command contains information similar to
what the lssrc -ls grpsvcs command used to present. Example 8-25 shows this output.

Example 8-25 Output of the lssrc -ls cthags command


seoul:/ # lssrc -ls cthags
Subsystem Group PID Status
cthags cthags 6095048 active
5 locally-connected clients. Their PIDs:
6160578(IBM.ConfigRMd) 1966256(rmcd) 3604708(IBM.StorageRMd) 7078046(clstrmgr)
14680286(gsclvmd)
HA Group Services domain information:
Domain established by node 1
Number of groups known locally: 8
Number of Number of local
Group name providers providers/subscribers
rmc_peers 2 1 0
s00O3RA00009G0000015CDBQGFL 2 1 0
IBM.ConfigRM 2 1 0
IBM.StorageRM.v1 2 1 0
CLRESMGRD_1108531106 2 1 0
CLRESMGRDNPD_1108531106 2 1 0
CLSTRMGR_1108531106 2 1 0
d00O3RA00009G0000015CDBQGFL 2 1 0

Critical clients will be terminated if unresponsive



Network configuration and routing table
The service IP address is added to an interface on the node where the resource group is
started. The routing table also keeps the service IP address. The multicast address is not
displayed in the routing table. See Example 8-26.

Example 8-26 Network configuration and routing table


seoul:/ # clcmd ifconfig -a
-------------------------------
NODE seoul
-------------------------------
en0:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.101.143 netmask 0xffffff00 broadcast 192.168.103.255
inet 10.168.101.43 netmask 0xffffff00 broadcast 10.168.103.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
en2:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.201.143 netmask 0xffffff00 broadcast 192.168.203.255
inet 10.168.101.143 netmask 0xffffff00 broadcast 10.168.103.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
lo0:
flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LAR
GESEND,CHAIN>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
-------------------------------
NODE busan
-------------------------------
en0:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.101.144 netmask 0xffffff00 broadcast 192.168.103.255
inet 10.168.101.44 netmask 0xffffff00 broadcast 10.168.103.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
en2:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.201.144 netmask 0xffffff00 broadcast 192.168.203.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
lo0:
flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LAR
GESEND,CHAIN>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1

seoul:/ # clcmd netstat -rn


-------------------------------
NODE seoul
-------------------------------
Routing tables
Destination Gateway Flags Refs Use If Exp Groups



Route tree for Protocol Family 2 (Internet):
default 192.168.100.60 UG 1 4187 en0 - -
10.168.100.0 10.168.101.43 UHSb 0 0 en0 - - =>
10.168.100.0 10.168.101.143 UHSb 0 0 en2 - - =>
10.168.100/22 10.168.101.43 U 7 56800 en0 - - =>
10.168.100/22 10.168.101.143 U 3 1770 en2 - -
10.168.101.43 127.0.0.1 UGHS 10 33041 lo0 - -
10.168.101.143 127.0.0.1 UGHS 1 72 lo0 - -
10.168.103.255 10.168.101.43 UHSb 0 0 en0 - - =>
10.168.103.255 10.168.101.143 UHSb 0 0 en2 - -
127/8 127.0.0.1 U 15 16316 lo0 - -
192.168.100.0 192.168.101.143 UHSb 0 0 en0 - - =>
192.168.100/22 192.168.101.143 U 2 1201 en0 - -
192.168.101.143 127.0.0.1 UGHS 0 18 lo0 - -
192.168.103.255 192.168.101.143 UHSb 0 43 en0 - -
192.168.200.0 192.168.201.143 UHSb 0 0 en2 - - =>
192.168.200/22 192.168.201.143 U 0 2 en2 - -
192.168.201.143 127.0.0.1 UGHS 0 4 lo0 - -
192.168.203.255 192.168.201.143 UHSb 0 0 en2 - -

Route tree for Protocol Family 24 (Internet v6):


::1%1 ::1%1 UH 2 4180 lo0 - -

-------------------------------
NODE busan
-------------------------------
Routing tables
Destination Gateway Flags Refs Use If Exp Groups

Route tree for Protocol Family 2 (Internet):


default 192.168.100.60 UG 1 2012 en0 - -
10.168.100.0 10.168.101.44 UHSb 0 0 en0 - - =>
10.168.100/22 10.168.101.44 U 23 54052 en0 - -
10.168.101.44 127.0.0.1 UGHS 10 5706 lo0 - -
10.168.103.255 10.168.101.44 UHSb 0 0 en0 - -
127/8 127.0.0.1 U 19 3803 lo0 - -
192.168.100.0 192.168.101.144 UHSb 0 0 en0 - - =>
192.168.100/22 192.168.101.144 U 3 1953 en0 - -
192.168.101.144 127.0.0.1 UGHS 0 14 lo0 - -
192.168.103.255 192.168.101.144 UHSb 2 27 en0 - -
192.168.200.0 192.168.201.144 UHSb 0 0 en2 - - =>
192.168.200/22 192.168.201.144 U 0 2 en2 - -
192.168.201.144 127.0.0.1 UGHS 0 4 lo0 - -
192.168.203.255 192.168.201.144 UHSb 0 0 en2 - -

Route tree for Protocol Family 24 (Internet v6):


::1%1 ::1%1 UH 6 876 lo0 - -

Using tcpdump, iptrace, and mping utilities to monitor multicast traffic


With the introduction of the multicast address and the gossip protocol, the cluster
administrator can use tools to monitor Ethernet heartbeating. The following sections explain
how to use the native AIX tcpdump, iptrace, and mping tools for this type of monitoring.



The tcpdump utility
You can dump all the traffic between the seoul node and the multicast address
228.168.101.43 by using the tcpdump utility. Observe that the UDP packets originate in the
base or boot addresses of the interfaces, not in the persistent or service IP labels.
Example 8-27 shows how to list the available interfaces and then capture traffic for the en2
interface.

Example 8-27 Multicast packet monitoring for the seoul node using the tcpdump utility
seoul:/ # tcpdump -D
1.en0
2.en2
3.lo0

seoul:/ # tcpdump -t -i2 -v ip and host 228.168.101.43


tcpdump: listening on en0, link-type 1, capture size 96 bytes
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
seoul-b2.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
seoul-b2.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450

The same information is captured on the busan node as shown in Example 8-28.

Example 8-28 Multicast packet monitoring for the busan node using the tcpdump utility
busan:/tmp # tcpdump -D
1.en0
2.en2
3.lo0

busan:/ # tcpdump -t -i2 -v ip and host 228.168.101.43


IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
busan-b2.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450



IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
busan-b2.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450

You can also see the multicast traffic for all the PowerHA 7.1 clusters in your LAN segment
by using the following command:
seoul:/ # tcpdump -n -vvv port drmsfsd

The iptrace utility


The iptrace utility provides more detailed packet tracing information than the tcpdump
utility. Both the en0 (MAC address A24E50543103) and en2 (MAC address
A24E50543107) interfaces are generating packets toward the cluster multicast address
228.168.101.43 as shown in Example 8-29.

Example 8-29 The iptrace utility for monitoring multicast packets


seoul:/tmp # iptrace -a -s 228.168.101.43 -b korea_cluster.log; sleep 30
[10289364]

seoul:/tmp # kill -9 10289364

seoul:/tmp # /usr/sbin/ipreport korea_cluster.log | more


IPTRACE version: 2.0
====( 1492 bytes transmitted on interface en0 )==== 12:49:17.384871427
ETHERNET packet : [ a2:4e:50:54:31:03 -> 01:00:5e:28:65:2b ] type 800 (IP)
IP header breakdown:
< SRC = 192.168.101.143 > (seoul-b1)
< DST = 228.168.101.43 >
ip_v=4, ip_hl=20, ip_tos=0, ip_len=1478, ip_id=0, ip_off=0
ip_ttl=32, ip_sum=251c, ip_p = 17 (UDP)
UDP header breakdown:
<source port=4098(drmsfsd), <destination port=4098(drmsfsd) >
[ udp length = 1458 | udp checksum = 0 ]
00000000 00000009 100234c8 00000030 00000000 |......4....0....|
00000010 1be40fb0 c19311df 920ca24e 50543103 |...........NPT1.|
********
00000030 ffffffff ffffffff ffffffff ffffffff |................|
00000040 00001575 00000000 00000000 00000000 |...u............|
00000050 00000000 00000003 00000000 00000000 |................|
00000060 00000000 00000000 00020001 00020fb0 |................|
00000070 c19311df 1be40fb0 c19311df 920ca24e |...............N|
00000080 50543103 0000147d 00000000 4f8858be |PT1....}....O.X.|
00000090 c0dd11df 930aa24e 50543103 00000000 |.......NPT1.....|
000000a0 00000000 00000000 00000000 00000000 |................|
********
000005a0 00000000 00000000 0000 |.......... |

====( 1492 bytes transmitted on interface en0 )==== 12:49:17.388085181


ETHERNET packet : [ a2:4e:50:54:31:03 -> 01:00:5e:28:65:2b ] type 800 (IP)
IP header breakdown:
< SRC = 192.168.101.143 > (seoul-b1)
< DST = 228.168.101.43 >
ip_v=4, ip_hl=20, ip_tos=0, ip_len=1478, ip_id=0, ip_off=0
ip_ttl=32, ip_sum=251c, ip_p = 17 (UDP)



UDP header breakdown:
<source port=4098(drmsfsd), <destination port=4098(drmsfsd) >
[ udp length = 1458 | udp checksum = 0 ]
00000000 00000004 10021002 00000070 00000000 |...........p....|
00000010 1be40fb0 c19311df 920ca24e 50543103 |...........NPT1.|
********
00000030 ffffffff ffffffff ffffffff ffffffff |................|
00000040 00001575 00000000 00000000 00000000 |...u............|
00000050 f1000815 b002b8a0 00000000 00000000 |................|
00000060 00000000 00000000 0002ffff 00010000 |................|
00000070 00000000 00000000 00000000 00000000 |................|
00000080 00000000 00000d7a 00000000 00000000 |.......z........|
00000090 00000000 00000000 00000000 00000000 |................|
000000a0 00000000 00020000 00000000 00000000 |................|
000000b0 00000000 00000000 00000000 00001575 |...............u|
000000c0 00000001 4f8858be c0dd11df 930aa24e |....O.X........N|
000000d0 50543103 00000001 00000000 00000000 |PT1.............|
000000e0 00000000 00000000 00000000 00000000 |................|
********
000005a0 00000000 00000000 0000 |.......... |

====( 1492 bytes transmitted on interface en2 )==== 12:49:17.394219029


ETHERNET packet : [ a2:4e:50:54:31:07 -> 01:00:5e:28:65:2b ] type 800 (IP)
IP header breakdown:
< SRC = 192.168.201.143 > (seoul-b2)
< DST = 228.168.101.43 >
ip_v=4, ip_hl=20, ip_tos=0, ip_len=1478, ip_id=0, ip_off=0
ip_ttl=32, ip_sum=c11b, ip_p = 17 (UDP)
UDP header breakdown:
<source port=4098(drmsfsd), <destination port=4098(drmsfsd) >
[ udp length = 1458 | udp checksum = 0 ]
00000000 00000009 100234c8 00000030 00000000 |......4....0....|
00000010 a01f47fe d08911df 95b5a24e 50543103 |..G........NPT1.|
********
00000030 ffffffff ffffffff ffffffff ffffffff |................|
00000040 00000fab 00000000 00000000 00000000 |................|
00000050 00000000 00000003 00000000 00000000 |................|
00000060 00000000 00000000 00020001 000247fe |..............G.|
00000070 d08911df a01f47fe d08911df 95b5a24e |......G........N|
00000080 50543103 000014b4 00000000 4f8858be |PT1.........O.X.|
00000090 c0dd11df 930aa24e 50543103 00000000 |.......NPT1.....|
000000a0 00000000 00000000 00000000 00000000 |................|
********
000005a0 00000000 00000000 0000 |.......... |
.
.
.

Tip: You can observe the multicast address in the last line of the output of the
lscluster -c CAA command.



The mping utility
You can also use the mping utility to test the multicast connectivity. One node acts as a
sender of packets, and the other node acts as a receiver. You run the command on both
nodes at the same time, as shown in Example 8-30.

Example 8-30 Using the mping utility to test multicast connectivity


seoul:/ # mping -v -s -a 228.168.101.43
mping version 1.0
Localhost is seoul, 10.168.101.43
mpinging 228.168.101.43/4098 with ttl=32:
32 bytes from 10.168.101.44: seqno=1 ttl=32 time=0.260 ms
32 bytes from 10.168.101.44: seqno=1 ttl=32 time=0.326 ms
32 bytes from 10.168.101.44: seqno=1 ttl=32 time=0.344 ms
32 bytes from 10.168.101.44: seqno=1 ttl=32 time=0.361 ms
32 bytes from 10.168.101.44: seqno=2 ttl=32 time=0.235 ms
32 bytes from 10.168.101.44: seqno=2 ttl=32 time=0.261 ms
32 bytes from 10.168.101.44: seqno=2 ttl=32 time=0.299 ms
32 bytes from 10.168.101.44: seqno=2 ttl=32 time=0.317 ms
32 bytes from 10.168.101.44: seqno=3 ttl=32 time=0.216 ms
32 bytes from 10.168.101.44: seqno=3 ttl=32 time=0.262 ms
32 bytes from 10.168.101.44: seqno=3 ttl=32 time=0.282 ms
32 bytes from 10.168.101.44: seqno=3 ttl=32 time=0.300 ms

busan:/ # mping -v -r -a 228.168.101.43


mping version 1.0
Localhost is busan, 10.168.101.44
Listening on 228.168.101.43/4098:
Replying to mping from 10.168.101.43 bytes=32 seqno=1 ttl=32
Replying to mping from 10.168.101.43 bytes=32 seqno=1 ttl=32
Discarding receiver packet
Discarding receiver packet
Replying to mping from 10.168.101.43 bytes=32 seqno=2 ttl=32
Replying to mping from 10.168.101.43 bytes=32 seqno=2 ttl=32
Discarding receiver packet
Discarding receiver packet
Replying to mping from 10.168.101.43 bytes=32 seqno=3 ttl=32
Replying to mping from 10.168.101.43 bytes=32 seqno=3 ttl=32
Discarding receiver packet
Discarding receiver packet

8.3.2 CAA commands and log files


This section explains the commands specifically for gathering CAA-related information and
the associated log files.

Cluster information
The CAA comes with a set of command-line tools, as explained in “Cluster information using
the lscluster command” on page 209. These tools can be used to monitor the status and
statistics of a running cluster. For more information about CAA and its functionalities, see
Chapter 2, “Features of PowerHA SystemMirror 7.1” on page 23.



Cluster repository disk, CAA, and solidDB
This section provides additional information about the cluster repository disk, CAA, and
solidDB.

UUID
The UUID of the caa_private0 disk is stored as a cluster0 device attribute as shown in
Example 8-31.

Example 8-31 The cluster0 device attributes


seoul:/ # lsattr -El cluster0
clvdisk 03e41dc1-3b8d-c422-3426-f1f61c567cda Cluster repository disk identifier True
node_uuid 4f8858be-c0dd-11df-930a-a24e50543103 OS image identifier True

Example 8-32 also shows the UUID.

Example 8-32 UUID


caa_private0
state : UP
uDid :
uUid : 03e41dc1-3b8d-c422-3426-f1f61c567cda
type : REPDISK

The repository disk contains logical volumes for the bootstrap and solidDB file systems as
shown in Example 8-33.

Example 8-33 Repository logical volumes


seoul:/ # lsvg -l caavg_private
caavg_private:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
caalv_private1 boot 1 1 1 closed/syncd N/A
caalv_private2 boot 1 1 1 closed/syncd N/A
caalv_private3 boot 4 4 1 open/syncd N/A
fslv00 jfs2 4 4 1 closed/syncd /clrepos_private1
fslv01 jfs2 4 4 1 open/syncd /clrepos_private2
powerha_crlv boot 1 1 1 closed/syncd N/A

Querying the bootstrap repository


Example 8-34 shows the bootstrap repository.

Example 8-34 Querying the bootstrap repository


seoul:/ # /usr/lib/cluster/clras dumprepos
HEADER
CLUSRECID: 0xa9c2d4c2
Name: korea
UUID: a01f47fe-d089-11df-95b5-a24e50543103
SHID: 0x0
Data size: 1536
Checksum: 0xc197
Num zones: 0
Dbpass: a0305b84_d089_11df_95b5_a24e50543103
Multicast: 228.168.101.43



DISKS
name devno uuid udid
cldisk1 1 fe1e9f03-005b-3191-a3ee-4834944fcdeb 3E213600A0B8000291B080000E90C05B0CD4B0F1815
FAStT03IBMfcp
cldisk2 2 428e30e8-657d-8053-d70e-c2f4b75999e2 3E213600A0B8000114632000009554C8E0B010F1815
FAStT03IBMfcp

NODES
numcl numz uuid shid name
0 0 4f8858be-c0dd-11df-930a-a24e50543103 2 seoul
0 0 e356646e-c0dd-11df-b51d-a24e57e18a03 1 busan

ZONES
none

The solidDB status


You can use the command shown in Example 8-35 to check which node currently hosts the
active solidDB database.

Example 8-35 The solidDB status


seoul:/ # clcmd /opt/cluster/solidDB/bin/solcon -x pwdfile:/etc/cluster/dbpass -e "hsb state"
"tcp 2188" caa
-------------------------------
NODE seoul
-------------------------------
IBM solidDB Remote Control - Version 6.5.0.0 Build 0010
(c) Solid Information Technology Ltd. 1993, 2009
SECONDARY ACTIVE
-------------------------------
NODE busan
-------------------------------
IBM solidDB Remote Control - Version 6.5.0.0 Build 0010
(c) Solid Information Technology Ltd. 1993, 2009
PRIMARY ACTIVE

Tip: The solidDB database is not necessarily active in the same node where the PowerHA
resource group is active. You can see this difference when comparing Example 8-35 with
the output of the clRGinfo command:
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE seoul
OFFLINE busan

In this case, the solidDB database has the primary database active in the busan node, and
the PowerHA resource group is currently settled in the seoul node.



Another way to check which node has solidDB active is to use the lssrc command.
Example 8-36 shows that solidDB is active in the seoul node. Observe the line that says
“Group Leader.”

Example 8-36 Using the lssrc command to check where solidDB is active
seoul:/ # lssrc -ls IBM.StorageRM
Subsystem : IBM.StorageRM
PID : 7077950
Cluster Name : korea
Node Number : 2
Daemon start time : 10/05/10 10:06:57

PeerNodes: 2
QuorumNodes: 2

Group IBM.StorageRM.v1:
ConfigVersion: 0x24cab3184
Providers: 2
QuorumMembers: 2
Group Leader: seoul, 0xdc82faf0908920dc, 2

Information from malloc about memory use:


Total Space : 0x00be0280 (12452480)
Allocated Space: 0x007ec198 (8307096)
Unused Space : 0x003ed210 (4117008)
Freeable Space : 0x00000000 (0)

Information about trace levels:


_SEU Errors=255 Info=0 API=0 Buffer=0 SvcTkn=0 CtxTkn=0
_SEL Errors=255 Info=0 API=0 Buffer=0 Perf=0
_SEI Error=0 API=0 Mapping=0 Milestone=0 Diag=0
_SEA Errors=255 Info=0 API=0 Buffer=0 SVCTKN=0 CTXTKN=0
_MCA Errors=255 Info=0 API=0 Callbacks=0 Responses=0 RspPtrs=0
Protocol=0 APItoProto=0 PrototoRsp=0 CommPath=0 Thread=0 ThreadCtrl=0
RawProtocol=0 Signatures=0
_RCA RMAC_SESSION=0 RMAC_COMMANDGROUP=0 RMAC_REQUEST=0 RMAC_RESPONSE=0
RMAC_CALLBACK=0
_CAA Errors=255 Info=0 Debug=0 AUA_Blobs=0 AHAFS_Events=0
_GSA Errors=255 Info=2 GSCL=0 Debug=0
_SRA API=0 Errors=255 Wherever=0
_RMA Errors=255 Info=0 API=0 Thread=0 Method=0 Object=0
Protocol=0 Work=0 CommPath=0
_SKD Errors=255 Info=0 Debug=0
_SDK Errors=255 Info=0 Exceptions=0
_RMF Errors=255 Info=2 Debug=0
_STG Errors=255 Info=1 Event=1 Debug=0
/var/ct/2W7qV~q8aHtvMreavGL343/log/mc/IBM.StorageRM/trace -> spooling not enabled



Using the solidDB SQL interface
You can also retrieve some information shown by the lscluster command by using the
solidDB SQL interface as shown in Example 8-37 and Example 8-38 on page 229.

Example 8-37 The solidDB SQL interface (view from left side of code)
seoul:/ # /opt/cluster/solidDB/bin/solsql -x pwdfile:/etc/cluster/dbpass "tcp 2188" caa

IBM solidDB SQL Editor (teletype) - Version: 6.5.0.0 Build 0010


(c) Solid Information Technology Ltd. 1993, 2009
Connected to 'tcp 2188'.
Execute SQL statements terminated by a semicolon.
Exit by giving command: exit;
list schemas;
RESULT
------
Catalog: CAA
SCHEMAS:
--------
CAA
35193956_C193_11DF_A3EA_A24E50543103
36FC3B56_C193_11DF_A29A_A24E50543103
1 rows fetched.
list tables;
RESULT
------
Catalog: CAA
Schema: CAA
TABLES:
-------
CLUSTERS
NODES
REPOSNAMESPACE
REPOSSTORES
SHAREDDISKS
INTERFACES
INTERFACE_ATTRS
PARENT_CHILD
ENTITIES
1 rows fetched.
select * from clusters;
CLUSTER_ID CLUSTER_NAME ETYPE ESUBTYPE GLOB_ID UUID
---------- ------------ ----- -------- ------- ----
1 SIRCOL_UNKNOWN 4294967296 32 4294967297 00000000-0000-0000-0000-000000000000
2 korea 4294967296 32 4294967296 a01f47fe-d089-11df-95b5-a24e50543103
2 rows fetched.
select * from nodes;

NODES_ID NODE_NAME ETYPE ESUBTYPE GLOB_ID UUID


-------- --------- ----- -------- ------- ----
1 busan 8589934592 0 8589934593 e356646e-c0dd-11df-b51d-a24e57e18a03
2 seoul 8589934592 0 8589934594 4f8858be-c0dd-11df-930a-a24e50543103

2 rows fetched.
select * from SHAREDDISKS;



SHARED_DISK_ID DISK_NAME ETYPE GLOB_ID UUID
-------------- --------- ----- ------- ----
1 cldisk2 34359738368 34359738370 428e30e8-657d-8053-d70e-c2f4b75999e2
2 cldisk1 34359738368 34359738369 fe1e9f03-005b-3191-a3ee-4834944fcdeb
2 rows fetched.

Example 8-38 Using the solidDB SQL interface (view from right side starting at CLUSTER_ID row)
VERIFIED_STATUS ESTATE VERSION_OPERATING VERSION_CAPABLE MULTICAST
--------------- ------ ----------------- --------------- ---------
NULL 1 1 1 0
NULL 1 1 1 0

VERIFIED_STATUS PARENT_CLUSTER_ID ESTATE VERSION_OPERATING VERSION_CAPABLE


--------------- ----------------- ------ ----------------- ---------------
NULL 2 1 1 1
NULL 2 1 1 1

VERIFIED_STATUS PARENT_CLUSTER_ID ESTATE VERSION_OPERATING VERSION_CAPABLE


--------------- ----------------- ------ ----------------- ---------------
NULL 2 1 1 1
NULL 2 1 1 1

SIRCOL: SIRCOL stands for Storage Interconnected Resource Collection.

The /var/adm/ras/syslog.caa log file


The mkcluster, chcluster, and rmcluster commands (and their underlying APIs) use the
syslogd daemon for error logging. The cld and clconfd daemons and the clusterconf
command also use the syslogd facility for error logging. For that purpose, when the PowerHA
7.1 file sets are installed, the following line is added to the /etc/syslog.conf file:
*.info /var/adm/ras/syslog.caa rotate size 1m files 10

This file keeps all the logs about CAA activity, including the error output from the commands.
Example 8-39 shows an error caught in the /var/adm/ras/syslog.caa file during the cluster
definition. The chosen repository disk had already been part of a repository in the past and
was not cleaned up.

Example 8-39 Output of the /var/adm/ras/syslog.caa file


Sep 16 08:58:14 seoul user:err|error syslog: validate_device: Specified device,
hdisk1, is a repository.
Sep 16 08:58:14 seoul user:warn|warning syslog: To force cleanup of this disk, use
rmcluster -r hdisk1

# It also keeps track of all PowerHA SystemMirror events. Example:


Sep 16 09:40:40 seoul user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED:
acquire_service_addr 0
Sep 16 09:40:42 seoul user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED:
rg_move seoul 1 ACQUIRE 0



Sep 16 09:40:42 seoul user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED:
rg_move_acquire seoul 1 0
Sep 16 09:40:42 seoul user:notice PowerHA SystemMirror for AIX: EVENT START:
rg_move_complete seoul 1
Sep 16 09:40:42 seoul user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED:
rg_move_complete seoul 1 0
Sep 16 09:40:44 seoul user:notice PowerHA SystemMirror for AIX: EVENT START:
node_up_complete seoul
Sep 16 09:40:44 seoul user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED:
node_up_complete seoul 0

Tip: To capture debug information, you can replace *.info with *.debug in the
/etc/syslog.conf file, followed by a refresh of the syslogd daemon. Because the output in
debug mode is verbose, redirect the syslogd output to a file system other than /, /var, or
/tmp.
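A minimal sketch of that change follows. The /caalogs file system is an assumption and must
already exist with enough free space, and the destination file must exist before the syslogd
daemon is refreshed:

seoul:/ # touch /caalogs/syslog.caa.debug
seoul:/ # vi /etc/syslog.conf
          (change the *.info line to: *.debug /caalogs/syslog.caa.debug rotate size 1m files 10)
seoul:/ # refresh -s syslogd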

The solidDB log files


The solidDB daemons keep log files on file systems over the repository disk in every node
inside the solidDB directory as shown in Example 8-40.

Example 8-40 The solidDB log files and directories


seoul:/ # lsvg -l caavg_private
caavg_private:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
caalv_private1 boot 1 1 1 closed/syncd N/A
caalv_private2 boot 1 1 1 closed/syncd N/A
caalv_private3 boot 4 4 1 open/syncd N/A
fslv00 jfs2 4 4 1 closed/syncd
/clrepos_private1
fslv01 jfs2 4 4 1 open/syncd
/clrepos_private2
powerha_crlv boot 1 1 1 closed/syncd N/A

seoul:/ # ls -lrt /clrepos_private2


total 8
drwxr-xr-x 2 root system 256 Sep 16 09:05 lost+found
drwxr-xr-x 4 bin bin 4096 Sep 17 14:32 solidDB

seoul:/ # ls -lrt /clrepos_private2/solidDB


total 18608
-r-xr-xr-x 1 root system 650 Feb 20 2010 solid.lic
-r-xr-xr-x 1 root system 5246 Jun 6 18:54 caa.sql
-r-xr-xr-x 1 root system 5975 Aug 7 15:53 solid.ini
d--x------ 2 root system 256 Aug 7 23:10 .sec
-r-x------ 1 root system 322 Sep 17 12:06 solidhac.ini
drwxr-xr-x 2 bin bin 256 Sep 17 12:06 logs
-rw------- 1 root system 8257536 Sep 17 12:06 solid.db
-rw-r--r-- 1 root system 18611 Sep 17 12:06 hacmsg.out
-rw------- 1 root system 1054403 Sep 17 14:32 solmsg.bak
-rw------- 1 root system 166011 Sep 17 15:03 solmsg.out



seoul:/ # ls -lrt /clrepos_private2/solidDB/logs
total 32
-rw------- 1 root system 16384 Sep 17 12:07 sol00002.log

Explanation of file names:


 The solid daemon generates the solmsg.out log file.
 The solidhac daemon generates the hacmsg.out log file.
 The solid.db file is the database itself, and the logs directory contains the database
transaction logs.
 The solid.ini files are the configuration files for the solid daemons; the solidhac.ini
files are the configuration files for the solidhac daemons.
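To follow the solidDB activity on a node, you can watch the message file in the file system that
is mounted on that node (/clrepos_private2 in Example 8-40); for example:

seoul:/ # tail -f /clrepos_private2/solidDB/solmsg.out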

Collecting CAA debug information for IBM support


The CAA component is now included in the snap command. The snap -e and clsnap
commands collect all the necessary information for IBM support. The snap command gathers
the following files from each node, compressing them into a .pax file:
LOG
bootstrap_repository
clrepos1_solidDB.tar
dbpass
lscluster_clusters
lscluster_network_interfaces
lscluster_network_statistics
lscluster_nodes
lscluster_storage_interfaces
lscluster_zones
solid_lssrc
solid_lssrc_S
solid_select_sys_tables
solid_select_tables
syslog_caa
system_proc_version
system_uname
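To gather this data on demand, you can run the snap command yourself. By default, the
collected material is placed under the /tmp/ibmsupt directory; the exact file names vary by
release, so treat the following invocation only as a sketch:

seoul:/ # snap -r        # optional: remove the output of any previous snap run
seoul:/ # snap -e        # collect the cluster data under /tmp/ibmsupt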

8.3.3 PowerHA 7.1 cluster monitoring tools


PowerHA 7.1 comes with many commands and utilities that an administrator can use to
monitor the cluster. This section explains those tools that are most commonly used.

Using the clstat utility


The clstat utility is the most traditional and most widely used interactive tool for observing
the cluster status. Before using the clstat utility, you must convert the Simple Network
Management Protocol (SNMP) from version 3 to version 1, if this has not been done already.
Example 8-41 shows the steps and sample output.

Example 8-41 Converting SNMP from V3 to V1


seoul:/ # stopsrc -s snmpd
0513-044 The snmpd Subsystem was requested to stop.

seoul:/ # ls -ld /usr/sbin/snmpd


lrwxrwxrwx 1 root system 9 Sep 15 22:17 /usr/sbin/snmpd ->
snmpdv3ne

seoul:/ # /usr/sbin/snmpv3_ssw -1
Stop daemon: snmpmibd
In /etc/rc.tcpip file, comment out the line that contains: snmpmibd
In /etc/rc.tcpip file, remove the comment from the line that contains: dpid2
Make the symbolic link from /usr/sbin/snmpd to /usr/sbin/snmpdv1
Make the symbolic link from /usr/sbin/clsnmp to /usr/sbin/clsnmpne



Start daemon: dpid2

seoul:/ # ls -ld /usr/sbin/snmpd


lrwxrwxrwx 1 root system 17 Sep 20 09:49 /usr/sbin/snmpd ->
/usr/sbin/snmpdv1

seoul:/ # startsrc -s snmpd


0513-059 The snmpd Subsystem has been started. Subsystem PID is
8126570.

The clstat utility in interactive mode


With the new -i flag, you can now select the cluster ID from a list of available ones as shown
in Example 8-42.

Example 8-42 The clstat command in interactive mode


sydney:/ # clstat -i
clstat - HACMP Cluster Status Monitor
-------------------------------------

Number of clusters active: 1

ID Name State

1108531106 korea UP

Select an option:
# - the Cluster ID q- quit
1108531106

clstat - HACMP Cluster Status Monitor


-------------------------------------
Cluster: korea (1108531106)
Tue Oct 5 11:01:17 2010
State: UP Nodes: 2
SubState: STABLE

Node: busan State: UP


Interface: busan-b1 (0) Address: 192.168.101.144
State: UP
Interface: busan-b2 (0) Address: 192.168.201.144
State: UP

Node: seoul State: UP


Interface: seoul-b1 (0) Address: 192.168.101.143
State: UP
Interface: seoul-b2 (0) Address: 192.168.201.143
State: UP
Interface: poksap-db (0) Address: 10.168.101.143
State: UP
Resource Group: db2pok_ResourceGroup State: On line



The clstat utility with the -o flag
You can use the clstat utility with the -o flag as shown in Example 8-43. This flag instructs
the utility to run once and then exit. It is useful for scripts and cron jobs.

Example 8-43 The clstat utility with the option to run only once
sydney:/ # clstat -o

clstat - HACMP Cluster Status Monitor


-------------------------------------

Cluster: au_cl (1128255334)


Mon Sep 20 10:26:10 2010
State: UP Nodes: 2
SubState: STABLE

Node: perth State: UP


Interface: perth (0) Address: 192.168.101.136
State: UP
Interface: perthb2 (0) Address: 192.168.201.136
State: UP
Interface: perths (0) Address: 10.168.201.136
State: UP
Resource Group: perthrg State: On line

Node: sydney State: UP


Interface: sydney (0) Address: 192.168.101.135
State: UP
Interface: sydneyb2 (0) Address: 192.168.201.135
State: UP
Interface: sydneys (0) Address: 10.168.201.135
State: UP
Resource Group: sydneyrg State: On line
sydney:/ #
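Because the -o flag makes the clstat utility run a single pass and exit, it is well suited to
periodic status collection. The following crontab entry is a hypothetical example; the log file
name is an assumption, and the path assumes a default PowerHA installation:

0,15,30,45 * * * * /usr/es/sbin/cluster/clstat -o >> /var/hacmp/log/clstat.history 2>&1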

Tip: The sfwcom and dpcom interfaces that are shown by the lscluster -i command are
not shown in the output of the clstat utility. The PowerHA 7.1 cluster is unaware of the CAA
cluster that is present at the AIX level.

Using the cldump utility


Another traditional way to observe the cluster status is to use the cldump utility, which also
relies on the SNMP infrastructure as shown in Example 8-44.

Example 8-44 cldump command


seoul:/ # cldump
Obtaining information via SNMP from Node: seoul...

_____________________________________________________________________________
Cluster Name: korea
Cluster State: UP
Cluster Substate: STABLE
_____________________________________________________________________________

Node Name: busan State: UP



Network Name: net_ether_01 State: UP

Address: 192.168.101.144 Label: busan-b1 State: UP


Address: 192.168.201.144 Label: busan-b2 State: UP

Node Name: seoul State: UP

Network Name: net_ether_01 State: UP

Address: 10.168.101.143 Label: poksap-db State: UP


Address: 192.168.101.143 Label: seoul-b1 State: UP
Address: 192.168.201.143 Label: seoul-b2 State: UP

Cluster Name: korea

Resource Group Name: db2pok_ResourceGroup


Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Site Policy: ignore
Node Group State
---------------------------- ---------------
seoul ONLINE
busan OFFLINE

Tools in the /usr/es/sbin/cluster/utilities/ directory


The administrator of a running PowerHA 7.1 cluster can use several tools that are provided
with the cluster.es.server.utils file set. These tools are kept in the
/usr/es/sbin/cluster/utilities/ directory. Examples of the tools are provided in the
following sections.

Listing the PowerHA SystemMirror cluster interfaces


Example 8-45 shows the list of interfaces in the cluster using the cllsif command.

Example 8-45 Listing cluster interfaces using the cllsif command


seoul:/ # /usr/es/sbin/cluster/utilities/cllsif
Adapter Type Network Net Type Attribute Node IP
Address Hardware Address Interface Name Global Name Netmask
Alias for HB Prefix Length

busan-b2 boot net_ether_01 ether public busan


192.168.201.144 en2 255.255.255.0 24
busan-b1 boot net_ether_01 ether public busan
192.168.101.144 en0 255.255.255.0 24
poksap-db service net_ether_01 ether public busan
10.168.101.143 255.255.255.0 24
seoul-b1 boot net_ether_01 ether public seoul
192.168.101.143 en0 255.255.255.0 24
seoul-b2 boot net_ether_01 ether public seoul
192.168.201.143 en2 255.255.255.0 24
poksap-db service net_ether_01 ether public seoul
10.168.101.143 255.255.255.0 24



Listing the whole cluster topology information
Example 8-46 shows the cluster topology information that is generated by using the cllscf
command.

Example 8-46 Cluster topology listing by using the cllscf command


seoul:/ # /usr/es/sbin/cluster/utilities/cllscf
Cluster Name: korea
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
There were 1 networks defined: net_ether_01
There are 2 nodes in this cluster

NODE busan:
This node has 1 service IP label(s):

Service IP Label poksap-db:


IP address: 10.168.101.143
Hardware Address:
Network: net_ether_01
Attribute: public
Aliased Address?: Enable

Service IP Label poksap-db has 2 communication interfaces.


(Alternate Service) Communication Interface 1: busan-b2
IP Address: 192.168.201.144
Network: net_ether_01
Attribute: public

Alias address for heartbeat:


(Alternate Service) Communication Interface 2: busan-b1
IP Address: 192.168.101.144
Network: net_ether_01
Attribute: public

Alias address for heartbeat:


Service IP Label poksap-db has no communication interfaces for recovery.

This node has 1 persistent IP label(s):

Persistent IP Label busan:


IP address: 10.168.101.44
Network: net_ether_01

NODE seoul:
This node has 1 service IP label(s):

Service IP Label poksap-db:


IP address: 10.168.101.143
Hardware Address:
Network: net_ether_01
Attribute: public
Aliased Address?: Enable



Service IP Label poksap-db has 2 communication interfaces.
(Alternate Service) Communication Interface 1: seoul-b1
IP Address: 192.168.101.143
Network: net_ether_01
Attribute: public

Alias address for heartbeat:


(Alternate Service) Communication Interface 2: seoul-b2
IP Address: 192.168.201.143
Network: net_ether_01
Attribute: public

Alias address for heartbeat:


Service IP Label poksap-db has no communication interfaces for recovery.

This node has 1 persistent IP label(s):

Persistent IP Label seoul:


IP address: 10.168.101.43
Network: net_ether_01

Breakdown of network connections:

Connections to network net_ether_01


Node busan is connected to network net_ether_01 by these interfaces:
busan-b2
busan-b1
poksap-db
busan

Node seoul is connected to network net_ether_01 by these interfaces:


seoul-b1
seoul-b2
poksap-db
seoul

Tip: The cltopinfo -m command is used to show the heartbeat rings in the previous
versions of PowerHA. Because this concept no longer applies, the output of the cltopinfo
-m command is empty in PowerHA 7.1.

The PowerHA 7.1 cluster administrator should explore the utilities in the
/usr/es/sbin/cluster/utilities/ directory on a test system first. Most of the utilities are
informational tools only. Remember to never run unknown commands on production
systems.
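As a starting point, the following read-only commands give a quick overview of the topology,
the resource groups, and the configured application servers; they only display information and
do not change the configuration:

seoul:/ # /usr/es/sbin/cluster/utilities/cltopinfo
seoul:/ # /usr/es/sbin/cluster/utilities/clRGinfo
seoul:/ # /usr/es/sbin/cluster/utilities/cllsserv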

8.3.4 PowerHA ODM classes


Example 8-47 on page 237 provides a comprehensive list of PowerHA Object Data Manager
(ODM) files. Never edit these files directly, unless you are directed by IBM support. However,
you can use the odmget command to grab cluster configuration information directly from these
files as explained in this section.



Example 8-47 PowerHA ODM files
seoul:/etc/es/objrepos # ls HACMP*
HACMPadapter HACMPpprcconsistgrp
HACMPcluster HACMPras
HACMPcommadapter HACMPresource
HACMPcommlink HACMPresourcetype
HACMPcsserver HACMPrg_loc_dependency
HACMPcustom HACMPrgdependency
HACMPdaemons HACMPrresmethods
HACMPdisksubsys HACMPrules
HACMPdisktype HACMPsa
HACMPercmf HACMPsa_metadata
HACMPercmfglobals HACMPsdisksubsys
HACMPevent HACMPserver
HACMPeventmgr HACMPsircol
HACMPfcfile HACMPsite
HACMPfcmodtime HACMPsiteinfo
HACMPfilecollection HACMPsna
HACMPgpfs HACMPsp2
HACMPgroup HACMPspprc
HACMPlogs HACMPsr
HACMPmonitor HACMPsvc
HACMPnetwork HACMPsvcpprc
HACMPnim HACMPsvcrelationship
HACMPnode HACMPtape
HACMPnpp HACMPtc
HACMPoemfilesystem HACMPtimer
HACMPoemfsmethods HACMPtimersvc
HACMPoemvgmethods HACMPtopsvcs
HACMPoemvolumegroup HACMPude
HACMPpager HACMPudres_def
HACMPpairtasks HACMPudresource
HACMPpathtasks HACMPx25
HACMPport HACMPxd_mirror_group
HACMPpprc

Use the odmget command followed by the name of the ODM class (the corresponding file in the /etc/es/objrepos directory). Example 8-48 shows how to retrieve information about the cluster.

Example 8-48 Using the odmget command to retrieve cluster information


seoul:/ # ls -ld /etc/es/objrepos/HACMPcluster
-rw-r--r-- 1 root hacmp 4096 Sep 17 12:29
/etc/es/objrepos/HACMPcluster

seoul:/ # odmget HACMPcluster

HACMPcluster:
id = 1108531106
name = "korea"
nodename = "seoul"
sec_level = "Standard"
sec_level_msg = ""
sec_encryption = ""
sec_persistent = ""



last_node_ids = ""
highest_node_id = 0
last_network_ids = ""
highest_network_id = 0
last_site_ids = ""
highest_site_id = 0
handle = 2
cluster_version = 12
reserved1 = 0
reserved2 = 0
wlm_subdir = ""
settling_time = 0
rg_distribution_policy = "node"
noautoverification = 0
clvernodename = ""
clverhour = 0
clverstartupoptions = 0

Tip: In previous versions of PowerHA, the HACMPtopsvcs ODM class kept the current instance number for a node. In PowerHA 7.1, this class always has an instance number of 1 (instanceNum = 1, as shown in the following example) because Topology Services is no longer used. This number never changes.
seoul:/ # odmget HACMPtopsvcs
HACMPtopsvcs:
hbInterval = 1
fibrillateCount = 4
runFixedPri = 1
fixedPriLevel = 38
tsLogLength = 5000
gsLogLength = 5000

instanceNum = 1

You can use the HACMPnode ODM class to discover which version of PowerHA is installed as
shown in Example 8-49.

Example 8-49 Using the odmget command to retrieve the PowerHA version
seoul:/ # odmget HACMPnode | grep version | sort -u
version = 12

The following version numbers correspond to these HACMP/PowerHA releases:
2: HACMP 4.3.1
3: HACMP 4.4
4: HACMP 4.4.1
5: HACMP 4.5
6: HACMP 5.1
7: HACMP 5.2
8: HACMP 5.3
9: HACMP 5.4
10: PowerHA 5.5
11: PowerHA 6.1
12: PowerHA 7.1
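
If you check this mapping often, a small shell sketch (illustrative only; it assumes the odmget output format shown in Example 8-49 and the table above) can translate the number into a release name:

# Illustrative sketch: print the release name for the installed PowerHA version.
# If the nodes report mixed versions, the default branch is taken.
ver=$(odmget HACMPnode | awk '$1 == "version" {print $3}' | sort -u)
case "$ver" in
  10) echo "PowerHA 5.5" ;;
  11) echo "PowerHA 6.1" ;;
  12) echo "PowerHA 7.1" ;;
   *) echo "See the table above for version number: $ver" ;;
esac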



Querying the HACMPnode ODM class is useful during cluster synchronization after a migration,
when PowerHA issues warning messages about mixed versions among the nodes.

Because the HACMPtopsvcs ODM class can no longer be used to determine whether the configuration must be synchronized across the nodes, you can query the HACMPcluster ODM class instead. This class keeps a numeric attribute called handle. Each node has a different value for this attribute, ranging from 1 to 32. You can retrieve the handle values by using the odmget or clhandle commands as shown in Example 8-50.

Example 8-50 Viewing the cluster handles


seoul:/ # clcmd odmget HACMPcluster
-------------------------------
NODE seoul
-------------------------------

HACMPcluster:
id = 1108531106
name = "korea"
nodename = "seoul"
sec_level = "Standard"
sec_level_msg = ""
sec_encryption = ""
sec_persistent = ""
last_node_ids = ""
highest_node_id = 0
last_network_ids = ""
highest_network_id = 0
last_site_ids = ""
highest_site_id = 0
handle = 2
cluster_version = 12
reserved1 = 0
reserved2 = 0
wlm_subdir = ""
settling_time = 0
rg_distribution_policy = "node"
noautoverification = 0
clvernodename = ""
clverhour = 0
clverstartupoptions = 0

-------------------------------
NODE busan
-------------------------------

HACMPcluster:
id = 1108531106
name = "korea"
nodename = "busan"
sec_level = "Standard"
sec_level_msg = ""
sec_encryption = ""
sec_persistent = ""
last_node_ids = ""
highest_node_id = 0



last_network_ids = ""
highest_network_id = 0
last_site_ids = ""
highest_site_id = 0
handle = 1
cluster_version = 12
reserved1 = 0
reserved2 = 0
wlm_subdir = ""
settling_time = 0
rg_distribution_policy = "node"
noautoverification = 0
clvernodename = ""
clverhour = 0
clverstartupoptions = 0

seoul:/ # clcmd clhandle


-------------------------------
NODE seoul
-------------------------------
2 seoul

-------------------------------
NODE busan
-------------------------------
1 busan

When you perform a cluster configuration change on any node, the handle attribute of that node is set to 0 until the cluster is synchronized.

Suppose that you want to add a new resource group to the korea cluster and that you make the change from the seoul node. After you make the modification, and before you synchronize the cluster, the handle attribute in the HACMPcluster ODM class on the seoul node has a value of 0, as shown in Example 8-51.

Example 8-51 Handle values after a change, before synchronization


seoul:/ # clcmd odmget HACMPcluster | egrep "NODE|handle"
NODE seoul
handle = 0
NODE busan
handle = 1

seoul:/ # clcmd clhandle


-------------------------------
NODE seoul
-------------------------------
0 seoul
-------------------------------
NODE busan
-------------------------------
1 busan



After you synchronize the cluster, the handle goes back to its original value of 2 as shown in
Example 8-52.

Example 8-52 Original handle values after synchronization


seoul:/ # smitty sysmirror → Custom Cluster Configuration → Verify and Synchronize Cluster Configuration (Advanced)

seoul:/ # clcmd odmget HACMPcluster | egrep "NODE|handle"


NODE seoul
handle = 2
NODE busan
handle = 1

seoul:/ # clcmd clhandle


-------------------------------
NODE seoul
-------------------------------
2 seoul
-------------------------------
NODE busan
-------------------------------
1 busan

If more than one node has a handle value of 0, you or another administrator might have made changes from different nodes. In that case, you must decide from which node to start the synchronization. As a result, the cluster modifications that were made on the other nodes are lost.
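
A quick, read-only check (a minimal sketch that assumes the clhandle output format shown in Example 8-50) lists any node that currently reports a handle of 0:

clcmd clhandle | awk '$1 == "0" {print "Unsynchronized changes on node: " $2}'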

8.3.5 PowerHA clmgr utility


The clmgr utility provides a new command-line interface to PowerHA that improves consistency, usability, and serviceability. The tool is packaged in the cluster.es.server.utils file set, as shown in Example 8-53.

Example 8-53 The clmgr utility file set


seoul:/ # whence clmgr
/usr/es/sbin/cluster/utilities/clmgr

seoul:/ # lslpp -w /usr/es/sbin/cluster/utilities/clmgr


File Fileset Type
----------------------------------------------------------------------------
/usr/es/sbin/cluster/utilities/clmgr
cluster.es.server.utils Hardlink

The clmgr command logs its activity to the /var/hacmp/log/clutils.log file.

The clmgr command supports the actions listed in 5.2.1, “The clmgr action commands” on page 104.

For monitoring purposes, you can use the query and view actions. For a list of the object classes that are available for each action, see 5.2.2, “The clmgr object classes” on page 105.



Example using the query action
Example 8-54 shows the query action on the PowerHA cluster using the clmgr command.

Example 8-54 Query action on the PowerHA cluster using the clmgr command
seoul:/ # clmgr query cluster
CLUSTER_NAME="korea"
CLUSTER_ID="1108531106"
STATE="STABLE"
VERSION="7.1.0.1"
VERSION_NUMBER="12"
EDITION="STANDARD"
CLUSTER_IP=""
REPOSITORY="caa_private0"
SHARED_DISKS="cldisk2,cldisk1"
UNSYNCED_CHANGES="false"
SECURITY="Standard"
FC_SYNC_INTERVAL="10"
RG_SETTLING_TIME="0"
RG_DIST_POLICY="node"
MAX_EVENT_TIME="180"
MAX_RG_PROCESSING_TIME="180"
SITE_POLICY_FAILURE_ACTION="fallover"
SITE_POLICY_NOTIFY_METHOD=""
DAILY_VERIFICATION="Enabled"
VERIFICATION_NODE="Default"
VERIFICATION_HOUR="0"
VERIFICATION_DEBUGGING="Enabled"
LEVEL=""
ALGORITHM=""
GRACE_PERIOD=""
REFRESH=""
MECHANISM=""
CERTIFICATE=""
PRIVATE_KEY=""

seoul:/ # clmgr query interface


busan-b2
busan-b1
poksap-db
seoul-b1
seoul-b2

seoul:/ # clmgr query node


busan
seoul

seoul:/ # clmgr query network


net_ether_01

seoul:/ # clmgr query resource_group


db2pok_ResourceGroup



seoul:/ # clmgr query volume_group
caavg_private
pokvg

Tip: Another way to check the PowerHA version is to query the SNMP subsystem as
follows:
seoul:/ # snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs
clstrmgrVersion

clstrmgrVersion.1 = "7.1.0.1"
clstrmgrVersion.2 = "7.1.0.1"

Example using the view action


Example 8-55 shows the view action on the PowerHA cluster using the clmgr command.

Example 8-55 Using the view action on the PowerHA cluster using clmgr
seoul:/ # clmgr view report cluster
Cluster: korea
Cluster services: active
State of cluster: up
Substate: stable

#############
APPLICATIONS
#############
Cluster korea provides the following applications: db2pok_ApplicationServer
Application: db2pok_ApplicationServer
db2pok_ApplicationServer is started by
/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok
db2pok_ApplicationServer is stopped by
/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok
Application monitors for db2pok_ApplicationServer:
db2pok_SQLMonitor
db2pok_ProcessMonitor
Monitor name: db2pok_SQLMonitor
Type: custom
Monitor method: user
Monitor interval: 120 seconds
Hung monitor signal: 9
Stabilization interval: 240 seconds
Retry count: 3 tries
Restart interval: 1440 seconds
Failure action: fallover
Cleanup method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok
Restart method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok
Monitor name: db2pok_ProcessMonitor
Type: process
Process monitored: db2sysc
Process owner: db2pok
Instance count: 1
Stabilization interval: 240 seconds
Retry count: 3 tries
Restart interval: 1440 seconds



Failure action: fallover
Cleanup method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok
Restart method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok
This application is part of resource group 'db2pok_ResourceGroup'.
Resource group policies:
Startup: on home node only
Fallover: to next priority node in the list
Fallback: never
State of db2pok_ApplicationServer: online
Nodes configured to provide db2pok_ApplicationServer: seoul {up}
busan {up}
Node currently providing db2pok_ApplicationServer: seoul {up}
The node that will provide db2pok_ApplicationServer if seoul fails
is: busan
Resources associated with db2pok_ApplicationServer:
Service Labels
poksap-db(10.168.101.143) {online}
Interfaces configured to provide poksap-db:
seoul-b1 {up}
with IP address: 192.168.101.143
on interface: en0
on node: seoul {up}
on network: net_ether_01 {up}
seoul-b2 {up}
with IP address: 192.168.201.143
on interface: en2
on node: seoul {up}
on network: net_ether_01 {up}
busan-b2 {up}
with IP address: 192.168.201.144
on interface: en2
on node: busan {up}
on network: net_ether_01 {up}
busan-b1 {up}
with IP address: 192.168.101.144
on interface: en0
on node: busan {up}
on network: net_ether_01 {up}
Shared Volume Groups:
pokvg

#############
TOPOLOGY
#############
korea consists of the following nodes: busan seoul
busan
Network interfaces:
busan-b2 {up}
with IP address: 192.168.201.144
on interface: en2
on network: net_ether_01 {up}
busan-b1 {up}
with IP address: 192.168.101.144
on interface: en0
on network: net_ether_01 {up}



seoul
Network interfaces:
seoul-b1 {up}
with IP address: 192.168.101.143
on interface: en0
on network: net_ether_01 {up}
seoul-b2 {up}
with IP address: 192.168.201.143
on interface: en2
on network: net_ether_01 {up}

seoul:/ # clmgr view report topology


Cluster Name: korea
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: caa_private0
Cluster IP Address:

NODE busan:
Network net_ether_01
poksap-db 10.168.101.143
busan-b2 192.168.201.144
busan-b1 192.168.101.144
NODE seoul:
Network net_ether_01
poksap-db 10.168.101.143
seoul-b1 192.168.101.143
seoul-b2 192.168.201.143

Network        Attribute  Alias    Monitor method      Node   Adapter(s)

net_ether_01   public     Enable   Default monitoring  busan  busan-b2 busan-b1 poksap-db
                                                       seoul  seoul-b1 seoul-b2 poksap-db

Adapter     Type     Network       Net Type  Attribute  Node   IP Address       Hardware Address  Interface Name  Global Name  Netmask        Alias for HB  Prefix Length

busan-b2    boot     net_ether_01  ether     public     busan  192.168.201.144                    en2                          255.255.255.0                22
busan-b1    boot     net_ether_01  ether     public     busan  192.168.101.144                    en0                          255.255.255.0                22
poksap-db   service  net_ether_01  ether     public     busan  10.168.101.143                                                  255.255.255.0                22
seoul-b1    boot     net_ether_01  ether     public     seoul  192.168.101.143                    en0                          255.255.255.0                22
seoul-b2    boot     net_ether_01  ether     public     seoul  192.168.201.143                    en2                          255.255.255.0                22
poksap-db   service  net_ether_01  ether     public     seoul  10.168.101.143                                                  255.255.255.0                22

You can also use the clmgr command to see the list of PowerHA SystemMirror log files as
shown in Example 8-56.

Example 8-56 Viewing the PowerHA cluster log files using the clmgr command
seoul:/ # clmgr view log
Available Logs:

autoverify.log
cl2siteconfig_assist.log
cl_testtool.log
clavan.log
clcomd.log
clcomddiag.log
clconfigassist.log
clinfo.log
clstrmgr.debug
clstrmgr.debug.long
cluster.log
cluster.mmddyyyy
clutils.log
clverify.log
cspoc.log
cspoc.log.long
cspoc.log.remote
dhcpsa.log
dnssa.log
domino_server.log
emuhacmp.out
hacmp.out
ihssa.log
migration.log
sa.log
sax.log

Tip: The output verbosity level can be set by using the -l option, as in the following syntax:
clmgr -l {low|med|high|max} action object
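For instance, a more verbose cluster query (a hedged example that simply follows the syntax above; the additional output is not reproduced here) looks like this:

clmgr -l high query cluster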

8.3.6 IBM Systems Director web interface


This section explains how to discover and monitor a cluster by using the IBM Systems
Director 6.1 web interface. For the steps to install IBM Systems Director, the IBM PowerHA
SystemMirror plug-in, and IBM System Director Common Agent, see Chapter 11, “Installing
IBM Systems Director and the PowerHA SystemMirror plug-in” on page 325.



Login page for IBM Systems Director
When you point a web browser to the IBM Systems Director IP address on port 8422, you are presented with a login page. The root user and password are used to log on, as shown in Figure 8-1 on page 247.

Root user: Avoid using the root user. The logon is exclusive: the second person who logs on with the root user ID logs off the first person, and so on. For a production environment, create an AIX user ID for each person who must connect to the IBM Systems Director web interface. Each user ID must belong to the smadmin group so that everyone can connect to the IBM Systems Director web interface simultaneously. For more information, see the “Users and user groups in IBM Systems Director” topic in the IBM Systems Director V6.1.x Information Center at:
http://publib.boulder.ibm.com/infocenter/director/v6r1x/index.jsp?topic=/director.security_6.1/fqm0_c_user_accounts.html

smadmin (Administrator group): Members of the smadmin group are authorized for all
operations.

Figure 8-1 IBM Systems Director 6.1 login page



Welcome page for IBM Systems Director
On the welcome page, the administrator must first discover the systems with PowerHA that are to be administered. Figure 8-2 shows the link, underlined in red.

Figure 8-2 IBM Systems Director 6.1 welcome page



Discovery Manager
In the Discovery Manager panel, the administrator must click the System discovery link as
shown in Figure 8-3.

Figure 8-3 IBM Systems Director 6.1 Discovery Manager



Selecting the systems and agents to discover
In the System Discovery panel, complete the following actions:
1. For Select a discovery option, select the Range of IPv4 addresses.
2. Enter the starting and ending IP addresses. In Figure 8-4, only the two IP addresses for
the seoul and busan nodes are used.
3. For Select the resource type to discover, leave the default of All.
4. Click the Discover now button. The discovery takes less than 1 minute in this case
because the IP range is limited to two machines.

Figure 8-4 Selecting the systems to discover



IBM Systems Director availability menu
In the left navigation bar, expand Availability and click the PowerHA SystemMirror link as
shown in Figure 8-5.

Figure 8-5 IBM Systems Director 6.1 availability menu



Initial panel of the PowerHA SystemMirror plug-in
In the Health Summary list, you can see that two systems have an OK status with one
resource group also having an OK status. Click the Manage Clusters link as shown in
Figure 8-6.

Figure 8-6 PowerHA SystemMirror plug-in initial menu



PowerHA available clusters
On the Cluster and Resource Group Management panel (Figure 8-7), the PowerHA plug-in
for IBM Systems Director shows the available clusters. This information was retrieved during the discovery process. Two clusters are shown: korea and ro_cl. In the korea cluster, the two
nodes, seoul and busan, are visible and indicate a healthy status. The General tab on the
right shows more relevant information about the selected cluster.

Figure 8-7 PowerHA SystemMirror plug-in: Available clusters

Cluster menu
You can right-click all the objects to access options. Figure 8-8 shows an example of the
options for the korea cluster.

Figure 8-8 Menu options when right-clicking a cluster in the PowerHA SystemMirror plug-in



PowerHA SystemMirror plug-in: Resource Groups tab
The Resource Groups tab (Figure 8-9) shows the available resource groups in the cluster.

Figure 8-9 PowerHA SystemMirror plug-in: Resource Groups tab



Resource Groups menu
You can right-click the resource groups to access options such as those shown in
Figure 8-10.

Figure 8-10 Options available when right-clicking a resource group in PowerHA SystemMirror plug-in

PowerHA SystemMirror plug-in: Cluster tab


The Cluster tab has several tabs on the right that you can use to retrieve information about
the cluster. These tabs include the Resource Groups tab, Network tab, Storage tab, and
Additional Properties tab as shown in the following sections.

Resource Group tab


Figure 8-11 shows the Resource Groups tab and the information that is presented.

Figure 8-11 PowerHA SystemMirror plug-in: Resource Groups tab



Network tab
Figure 8-12 shows the Networks tab and the information that is displayed.

Figure 8-12 PowerHA SystemMirror plug-in: Networks tab

Storage tab
Figure 8-13 shows the Storage tab and the information that is presented.

Figure 8-13 PowerHA SystemMirror plug-in: Storage tab



Additional Properties tab
Figure 8-14 shows the Additional Properties tab and the information that is presented.

Figure 8-14 PowerHA SystemMirror plug-in Additional Properties tab

8.3.7 IBM Systems Director CLI (smcli interface)


The web interface for IBM Systems Director is powerful and allows a systems management console to be opened from anywhere. However, it is often desirable to perform certain functions against the management server from a command line.

Whether you are scripting a task to run on many systems or automating a process, the CLI can be useful in a management environment such as IBM Systems Director.

Tip: To run the commands, the smcli interface requires you to be an IBM Systems Director
superuser.

Example 8-57 runs the smcli command on the IBM Systems Director server (host name mexico) to see the available PowerHA options.

Example 8-57 Available options for PowerHA in IBM Systems Director CLI
mexico:/ # /opt/ibm/director/bin/smcli lsbundle | grep sysmirror
sysmirror/help
sysmirror/lsac
sysmirror/lsam
sysmirror/lsappctl
sysmirror/lsappmon
sysmirror/lscl
sysmirror/lscluster
sysmirror/lsdependency
sysmirror/lsdp
sysmirror/lsfc
sysmirror/lsfilecollection
sysmirror/lsif
sysmirror/lsinterface
sysmirror/lslg
sysmirror/lslog
sysmirror/lsmd
sysmirror/lsmethod



sysmirror/lsnd
.
.
.

All the commands listed in Example 8-57 can be run through the smcli command. Example 8-58 shows several of the commands that you can use; a small scripting sketch that combines them follows the example.

Example 8-58 Using #smcli to retrieve PowerHA information


# Lists the clusters that can be managed by the IBM Systems Director:
mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lscluster
korea (1108531106)

# Lists the service labels of a cluster:


mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lssi -c korea
poksap-db

# Lists all interfaces defined in a cluster:


mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lsif -c korea
busan-b1
busan-b2
seoul-b1
seoul-b2

# Lists resource groups of a cluster:


mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lsrg -c korea
db2pok_ResourceGroup

# Lists networks:
mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lsnw -c korea
net_ether_01

# Lists application servers of a cluster:


mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lsac -c korea
db2pok_ApplicationServer
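
Because these listing commands write plain text to standard output, they can be combined in simple scripts. The following sketch is illustrative only (the loop and messages are not part of the product); it relies on the output format shown above and uses only the sysmirror/lscluster and sysmirror/lsrg bundles to print the resource groups of every managed cluster:

#!/usr/bin/ksh
# Illustrative sketch: list the resource groups of every cluster that the
# IBM Systems Director server manages. The lscluster output format is
# assumed to be "<name> (<id>)", as shown in Example 8-58.
SMCLI=/opt/ibm/director/bin/smcli
$SMCLI sysmirror/lscluster | while read cluster rest
do
    echo "Resource groups in cluster $cluster:"
    $SMCLI sysmirror/lsrg -c "$cluster"
done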



Chapter 9. Testing the PowerHA 7.1 cluster


This chapter takes you through several simulations for testing a PowerHA 7.1 cluster and then
explains the cluster behavior and log files. This chapter includes the following topics:
 Testing the SAN-based heartbeat channel
 Testing the repository disk heartbeat channel
 Simulation of a network failure
 Testing the rootvg system event
 Simulation of a crash in the node with an active resource group
 Simulations of CPU starvation
 Simulation of a Group Services failure
 Testing a Start After resource group dependency
 Testing dynamic node priority



9.1 Testing the SAN-based heartbeat channel
This section explains how to check the redundant heartbeat through the storage area network
(SAN)-based channel if the network communication between nodes is lost. The procedure is
based on the test cluster shown in Figure 9-1. In this environment, the PowerHA cluster is
synchronized, and the CAA cluster is running.

Figure 9-1 Testing the SAN-based heartbeat

Example 9-1 shows the working state of the CAA cluster.

Example 9-1 Initial error-free CAA status


sydney:/ # lscluster -i
Network/Storage Interface Query

Cluster Name: au_cl


Cluster uuid: d77ac57e-cc1b-11df-92a4-00145ec5bf9a
Number of nodes reporting = 2
Number of nodes expected = 2
Node sydney
Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9a
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 5



Probe interval for interface = 120 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9b
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 5
Probe interval for interface = 120 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 0
Mean Deviation in network rrt across interface = 0
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node perth
Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d9
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3



Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d8
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 0
Mean Deviation in network rrt across interface = 0
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED



You can check the connectivity between nodes by using the socksimple command. This
command provides a ping-type interface to send and receive packets over the cluster
communications channels. Example 9-2 shows the usage output of running the socksimple
command.

Example 9-2 socksimple usage


sydney:/ # socksimple
Usage: socksimple -r|-s [-v] [-a address] [-p port] [-t ttl]

-r|-s Receiver or sender. Required argument,


mutually exclusive
-a address Cluster address to listen/send on,
overrides the default. (must be < 16 characters long)
-p port port to listen/send on,
overrides the default of 12.
-p ttl Time-To-Live to send,
overrides the default of 1.
-v Verbose mode

You can obtain the cluster address for the -a option of the socksimple command from the
lscluster -c command output (Example 9-3).

Example 9-3 Node IDs of the CAA cluster


sydney:/ # lscluster -c
Cluster query for cluster aucl returns:
Cluster uuid: 98f28ffa-cfde-11df-9a82-00145ec5bf9a
Number of nodes in cluster = 2
Cluster id for node perth is 1
Primary IP address for node perth is 192.168.101.136
Cluster id for node sydney is 2
Primary IP address for node sydney is 192.168.101.135
Number of disks in cluster = 0
Multicast address for cluster is 228.168.101.135

To test the SAN-based heartbeat channel, follow these steps:


1. Check the cluster communication with all the network interfaces up (Example 9-4).

Example 9-4 The socksimple test with the network channel up


sydney:/ # socksimple -s -a 1
socksimple version 1.2
socksimpleing 1/12 with ttl=1:

1275 bytes from cluster host id = 1: seqno=1275 ttl=1 time=0.415 ms


1276 bytes from cluster host id = 1: seqno=1276 ttl=1 time=0.381 ms
1277 bytes from cluster host id = 1: seqno=1277 ttl=1 time=0.347 ms

--- socksimple statistics ---


3 packets transmitted, 3 packets received
round-trip min/avg/max = 0.347/0.381/0.415 ms

perth:/ # socksimple -r -a 1
socksimple version 1.2
Listening on 1/12:



Replying to socksimple from cluster node id = 2 bytes=1275 seqno=1275 ttl=1
Replying to socksimple from cluster node id = 2 bytes=1276 seqno=1276 ttl=1
Replying to socksimple from cluster node id = 2 bytes=1277 seqno=1277 ttl=1
perth:/ #

2. Disconnect the network interfaces by pulling the cables on one node to simulate an Ethernet network failure. Example 9-5 shows the interface status.

Example 9-5 Ethernet ports down


perth:/ # entstat -d ent1 | grep -i link
Link Status: UNKNOWN

perth:/ # entstat -d ent2 | grep -i link


Link Status: UNKNOWN

3. Check the cluster communication by using the socksimple command as shown in


Example 9-6.

Example 9-6 The socksimple test with Ethernet ports down


sydney:/ # socksimple -s -a 1
socksimple version 1.2
socksimpleing 1/12 with ttl=1:

1275 bytes from cluster host id = 1: seqno=1275 ttl=1 time=1.075 ms


1275 bytes from cluster host id = 1: seqno=1275 ttl=1 time=50.513 ms
1275 bytes from cluster host id = 1: seqno=1275 ttl=1 time=150.663 ms
1276 bytes from cluster host id = 1: seqno=1276 ttl=1 time=0.897 ms
1276 bytes from cluster host id = 1: seqno=1276 ttl=1 time=50.623 ms
1276 bytes from cluster host id = 1: seqno=1276 ttl=1 time=150.791 ms

--- socksimple statistics ---


2 packets transmitted, 6 packets received
round-trip min/avg/max = 0.897/67.427/150.791 ms

perth:/ # socksimple -r -a 1
socksimple version 1.2
Listening on 1/12:

Replying to socksimple from cluster node id = 2 bytes=1275 seqno=1275 ttl=1


Replying to socksimple from cluster node id = 2 bytes=1276 seqno=1276 ttl=1
perth:/

4. Check the status of the cluster interfaces by using the lscluster -i command.
Example 9-7 shows the status for both disconnected ports on the perth node. In this
example, the status has changed from UP to DOWN SOURCE HARDWARE RECEIVE SOURCE
HARDWARE TRANSMIT.

Example 9-7 CAA cluster status with Ethernet ports down


sydney:/ # lscluster -i
Network/Storage Interface Query

Cluster Name: aucl


Cluster uuid: 98f28ffa-cfde-11df-9a82-00145ec5bf9a



Number of nodes reporting = 2
Number of nodes expected = 2
Node sydney
Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9a
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255
netmask 255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9b
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255
netmask 255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500



Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node perth
Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d9
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE
TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255
netmask 255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d8
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE
TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255
netmask 255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305



Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED

5. Reconnect the Ethernet cables and check the port status as shown in Example 9-8.

Example 9-8 Ethernet ports reconnected


perth:/ # entstat -d ent1 | grep -i link
Link Status: Up

perth:/ # entstat -d ent2|grep -i link


Link Status: Up

6. Check whether the cluster status has recovered. Example 9-9 shows that both Ethernet ports on the perth node are now in the UP state.

Example 9-9 CAA cluster status recovered


sydney:/ # lscluster -i
Network/Storage Interface Query

Cluster Name: aucl


Cluster uuid: 98f28ffa-cfde-11df-9a82-00145ec5bf9a
Number of nodes reporting = 2
Number of nodes expected = 2
Node sydney
Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9a
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255
netmask 255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9b
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3



Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255
netmask 255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node perth
Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d9
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255
netmask 255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d8



Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255
netmask 255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED

9.2 Testing the repository disk heartbeat channel


This section explains how to test the repository disk heartbeat channel.

9.2.1 Background
When the entire PowerHA SystemMirror IP network fails and the SAN-based heartbeat network (sfwcom) either does not exist or has also failed, CAA uses the heartbeat-over-repository-disk (dpcom) feature.

The example in the next section describes dpcom heartbeating in a two-node cluster after all
IP interfaces have failed.
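
While running this kind of test, it can be convenient to watch the point-of-contact states change in near real time. A simple loop such as the following (an illustrative sketch; the five-second interval is arbitrary) refreshes the lscluster -m output until you interrupt it with Ctrl-C:

while true
do
    clear
    lscluster -m      # shows each node and its point-of-contact interface states
    sleep 5
done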



9.2.2 Testing environment
A two-node cluster is configured with the following topology:
 en0 is not included in the PowerHA cluster, but it is monitored by CAA.
 en3 through en5 are included in the PowerHA cluster and monitored by CAA.
 No SAN-based communication channel (sfwcom) is available.

Initially, both nodes are online and running cluster services, all IP interfaces are online, and
the service IP address has an alias on the en3 interface.

This test scenario consists of unplugging the cable of one interface at a time, starting with en3, then en4, en5, and finally en0. As each cable is unplugged, the service IP address correctly moves to the next available interface on the same node. Each failed interface is marked as DOWN SOURCE
HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT as shown in Example 9-10. After the cables for
the en3 through en5 interfaces are unplugged, a local network failure event occurs, leading to
a selective failover of the resource group to the remote node. However, because the en0
interface is still up, CAA continues to heartbeat over the en0 interface.

Example 9-10 Output of the lscluster -i command


[hacmp27:HAES7101/AIX61-06 /]
# lscluster -i
Network/Storage Interface Query
Cluster Name: ha71sp1_aixsp2
Cluster uuid: c37f7324-daff-11df-903e-0011257e4998
Number of nodes reporting = 2
Number of nodes expected = 2
Node hacmp27
Node uuid = 66b66b10-d16e-11df-aa3f-0011257e4998
Number of interfaces discovered = 5
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.7e.49.98
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 9.3.44.27 broadcast 9.3.44.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 en3
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b5
Smoothed rrt across interface = 8
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 110 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.1.27 broadcast 10.1.1.255 netmask 255.255.255.0



Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 3 en4
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b6
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.2.27 broadcast 10.1.2.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 4 en5
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b7
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.3.27 broadcast 10.1.3.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 5 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 118
Mean Deviation in network rrt across interface = 81
Probe interval for interface = 1990 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node hacmp28
Node uuid = 15e86116-d173-11df-8bdf-0011257e4340
Number of interfaces discovered = 5
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.7e.43.40
Smoothed rrt across interface = 8
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 110 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 9.3.44.28 broadcast 9.3.44.255 netmask 255.255.255.0



Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 en3
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.d
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 2
IPV4 ADDRESS: 10.1.1.28 broadcast 10.1.1.255 netmask 255.255.255.0
IPV4 ADDRESS: 192.168.1.27 broadcast 192.168.1.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 3 en4
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.e
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.2.28 broadcast 10.1.2.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 4 en5
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.f
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.3.28 broadcast 10.1.3.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 5 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 1037
Mean Deviation in network rrt across interface = 1020
Probe interval for interface = 20570 ms



ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED

Example 9-11 shows the output of the lscluster -m command.

Example 9-11 Output of the lscluster -m command


[hacmp27:HAES7101/AIX61-06 /]
# lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2

Node name: hacmp27


Cluster shorthand id for node: 1
uuid for node: 66b66b10-d16e-11df-aa3f-0011257e4998
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998

Number of points_of_contact for node: 0


Point-of-contact interface & contact state
n/a

------------------------------

Node name: hacmp28


Cluster shorthand id for node: 2
uuid for node: 15e86116-d173-11df-8bdf-0011257e4340
State of node: UP
Smoothed rtt to node: 8
Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998

Number of points_of_contact for node: 4


Point-of-contact interface & contact state
en0 UP
en5 DOWN
en4 DOWN
en3 DOWN



After the en0 cable is unplugged, CAA proceeds to heartbeat over the repository disk
(dpcom). This action is indicated by the node status REACHABLE THROUGH REPOS DISK ONLY in the lscluster -m command output (Example 9-12).

Example 9-12 Output of the lscluster -m command


[hacmp27:HAES7101/AIX61-06 /]
# lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2

Node name: hacmp27


Cluster shorthand id for node: 1
uuid for node: 66b66b10-d16e-11df-aa3f-0011257e4998
State of node: UP NODE_LOCAL REACHABLE THROUGH REPOS DISK ONLY
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998

Number of points_of_contact for node: 0


Point-of-contact interface & contact state
n/a

------------------------------

Node name: hacmp28


Cluster shorthand id for node: 2
uuid for node: 15e86116-d173-11df-8bdf-0011257e4340
State of node: UP REACHABLE THROUGH REPOS DISK ONLY
Smoothed rtt to node: 143
Mean Deviation in network rtt to node: 107
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998

Number of points_of_contact for node: 5


Point-of-contact interface & contact state
dpcom UP
en0 DOWN
en5 DOWN
en4 DOWN
en3 DOWN

[hacmp28:HAES7101/AIX61-06 /]
# lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2

Node name: hacmp27


Cluster shorthand id for node: 1
uuid for node: 66b66b10-d16e-11df-aa3f-0011257e4998



State of node: UP REACHABLE THROUGH REPOS DISK ONLY
Smoothed rtt to node: 17
Mean Deviation in network rtt to node: 5
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998

Number of points_of_contact for node: 5


Point-of-contact interface & contact state
dpcom UP
en4 DOWN
en5 DOWN
en3 DOWN
en0 DOWN

------------------------------

Node name: hacmp28


Cluster shorthand id for node: 2
uuid for node: 15e86116-d173-11df-8bdf-0011257e4340
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998

Number of points_of_contact for node: 0


Point-of-contact interface & contact state
n/a

Example 9-13 shows the output of the lscluster -i command with the dpcom status
changing from UP RESTRICTED AIX_CONTROLLED to UP AIX_CONTROLLED.

Example 9-13 Output of the lscluster -i command showing the dpcom status
[hacmp27:HAES7101/AIX61-06 /]
# lscluster -i
Network/Storage Interface Query

Cluster Name: ha71sp1_aixsp2


Cluster uuid: c37f7324-daff-11df-903e-0011257e4998
Number of nodes reporting = 2
Number of nodes expected = 2
Node hacmp27
Node uuid = 66b66b10-d16e-11df-aa3f-0011257e4998
Number of interfaces discovered = 5
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.7e.49.98
Smoothed rrt across interface = 8
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 110 ms



ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 9.3.44.27 broadcast 9.3.44.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 en3
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b5
Smoothed rrt across interface = 8
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 110 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.1.27 broadcast 10.1.1.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 3 en4
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b6
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.2.27 broadcast 10.1.2.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 4 en5
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b7
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.3.27 broadcast 10.1.3.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 5 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 23
Mean Deviation in network rrt across interface = 11



Probe interval for interface = 340 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP AIX_CONTROLLED
Node hacmp28
Node uuid = 15e86116-d173-11df-8bdf-0011257e4340
Number of interfaces discovered = 5
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.7e.43.40
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 9.3.44.28 broadcast 9.3.44.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 en3
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.d
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 2
IPV4 ADDRESS: 10.1.1.28 broadcast 10.1.1.255 netmask 255.255.255.0
IPV4 ADDRESS: 192.168.1.27 broadcast 192.168.1.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 3 en4
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.e
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.2.28 broadcast 10.1.2.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 4 en5
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.f
Smoothed rrt across interface = 7



Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.3.28 broadcast 10.1.3.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 5 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 10
Mean Deviation in network rrt across interface = 7
Probe interval for interface = 170 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP AIX_CONTROLLED

After an interface cable is reconnected, for example the en0 cable, CAA stops heartbeating
over the repository disk and resumes heartbeating over the IP interface.

Example 9-14 shows the output of the lscluster -m command after the en0 cable is
reconnected. The dpcom status changes from UP to DOWN RESTRICTED, and the en0 interface
status changes from DOWN to UP.

Example 9-14 Output of the lscluster -m command after en0 is reconnected


[hacmp27:HAES/AIX61-06 /]
# lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2

Node name: hacmp27


Cluster shorthand id for node: 1
uuid for node: 66b66b10-d16e-11df-aa3f-0011257e4998
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998

Number of points_of_contact for node: 0


Point-of-contact interface & contact state
n/a

------------------------------

Node name: hacmp28


Cluster shorthand id for node: 2
uuid for node: 15e86116-d173-11df-8bdf-0011257e4340
State of node: UP
Smoothed rtt to node: 7

Mean Deviation in network rtt to node: 4
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998

Number of points_of_contact for node: 5


Point-of-contact interface & contact state
dpcom DOWN RESTRICTED
en0 UP
en5 DOWN
en4 DOWN
en3 DOWN

Example 9-15 shows the output of the lscluster -i command. The en0 interface is now
marked as UP, and the dpcom returns to UP RESTRICTED AIX_CONTROLLED.

Example 9-15 Output of the lscluster -i command


[hacmp27:HAES/AIX61-06 /]
# lscluster -i
Network/Storage Interface Query

Cluster Name: ha71sp1_aixsp2


Cluster uuid: c37f7324-daff-11df-903e-0011257e4998
Number of nodes reporting = 2
Number of nodes expected = 2
Node hacmp27
Node uuid = 66b66b10-d16e-11df-aa3f-0011257e4998
Number of interfaces discovered = 5
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.7e.49.98
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 9.3.44.27 broadcast 9.3.44.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 en3
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b5
Smoothed rrt across interface = 8
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 110 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.1.27 broadcast 10.1.1.255 netmask 255.255.255.0

Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 3 en4
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b6
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.2.27 broadcast 10.1.2.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 4 en5
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b7
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.3.27 broadcast 10.1.3.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 5 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 120
Mean Deviation in network rrt across interface = 105
Probe interval for interface = 2250 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node hacmp28
Node uuid = 15e86116-d173-11df-8bdf-0011257e4340
Number of interfaces discovered = 5
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.7e.43.40
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 9.3.44.28 broadcast 9.3.44.255 netmask 255.255.255.0

Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 en3
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.d
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 2
IPV4 ADDRESS: 10.1.1.28 broadcast 10.1.1.255 netmask 255.255.255.0
IPV4 ADDRESS: 192.168.1.27 broadcast 192.168.1.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 3 en4
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.e
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.2.28 broadcast 10.1.2.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 4 en5
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.f
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.3.28 broadcast 10.1.3.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 5 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms

ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED

9.3 Simulation of a network failure


This section explains the simulation of a network failure.

9.3.1 Background
In PowerHA 7.1, the heartbeat method has changed. Heartbeating between the nodes is now
performed by AIX. The newly introduced CAA takes over the role of heartbeating and event
management.

This simulation tests a network down scenario and looks at the PowerHA log files and the
CAA monitoring output. The test scenario uses a two-node cluster, and one network interface
is brought down on one of the nodes by using the ifconfig command.

This cluster has one IP heartbeat path and two non-IP heartbeat paths. One of the non-IP
paths is a SAN-based heartbeat channel (sfwcom). The other is heartbeating over the
repository disk (dpcom). Although IP connectivity is lost when the ifconfig command is used,
PowerHA SystemMirror uses CAA to heartbeat over the two other channels. This behavior is
similar to the rs232 or diskhb heartbeat networks in previous versions of PowerHA.
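
The command sequence used in this simulation can be summarized in the following minimal
sketch. The interface name (en0) and the log file path are taken from this test environment;
adjust them for your own cluster before trying the same test.

ifconfig en0 down                               # simulate the IP network failure
lscluster -i | egrep "Interface|Node"           # en0 goes DOWN; sfwcom and dpcom stay UP
clRGinfo                                        # confirm where the resource group is online
grep network_down /var/hacmp/adm/cluster.log    # check the PowerHA network_down events
ifconfig en0 up                                 # restore the interface after the test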

9.3.2 Testing environment


Before starting the network failover test, you check the status of the cluster. The resource
group myrg is on the riyad node as shown in Figure 9-2.

riyad:/ # netstat -i
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 a2.4e.5f.b4.5.2 74918 0 50121 0 0
en0 1500 192.168.100 riyad 74918 0 50121 0 0
en0 1500 10.168.200 saudisvc 74918 0 50121 0 0
lo0 16896 link#1 3937 0 3937 0 0
lo0 16896 127 loopback 3937 0 3937 0 0
lo0 16896 loopback 3937 0 3937 0 0

riyad:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
myrg ONLINE riyad
OFFLINE jeddah
Figure 9-2 Status of the riyad node

The output of the lscluster -i command (Figure 9-3) shows that every adapter has the UP
state.

riyad:/ # lscluster -i |egrep "Interface|Node"


Network/Storage Interface Query
Node riyad
Node uuid = 2f1590d0-cc02-11df-bf20-a24e5fb40502
Interface number 1 en0
Interface state UP
Interface number 2 sfwcom
Interface state UP
Interface number 3 dpcom
Interface state UP RESTRICTED AIX_CONTROLLED
Node jeddah
Node uuid = 39710df0-cc04-11df-929f-a24e5f0d9e02
Interface number 1 en0
Interface state UP
Interface number 2 sfwcom
Interface state UP
Interface number 3 dpcom
Interface state UP RESTRICTED AIX_CONTROLLED
Figure 9-3 Output of the lscluster -i command

9.3.3 Testing a network failure


Now, the ifconfig en0 down command is issued on the riyad node. The lscluster
command shows en0 in a DOWN state and the resource group of the cluster moves to the next
available node in the chain as shown in Figure 9-4.

riyad:/ # lscluster -i |egrep "Interface|Node"


Network/Storage Interface Query
Node riyad
Node uuid = 2f1590d0-cc02-11df-bf20-a24e5fb40502
Interface number 1 en0
Interface state DOWN SOURCE SOFTWARE
Interface number 2 sfwcom
Interface state UP
Interface number 3 dpcom
Interface state UP RESTRICTED AIX_CONTROLLED
Node jeddah
Node uuid = 39710df0-cc04-11df-929f-a24e5f0d9e02
Interface number 1 en0
Interface state UP
Interface number 2 sfwcom
Interface state UP
Interface number 3 dpcom
Interface state UP RESTRICTED AIX_CONTROLLED
Figure 9-4 The lscluster -i command after a network failure

The clRGinfo command shows that the myrg resource group moved to the jeddah node
(Figure 9-5).

riyad:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
myrg OFFLINE riyad
ONLINE jeddah
Figure 9-5 clRGinfo output during the network failure

You can also check the network down event in the /var/hacmp/adm/cluster.log file
(Figure 9-6).

Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT START:
network_down riyad net_ether_01
Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT
COMPLETED: network_down riyad net_ether_01 0
Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT START:
network_down_complete riyad net_ether_01
Oct 6 09:57:43 riyad user:notice PowerHA SystemMirror for AIX: EVENT
COMPLETED: network_down_complete riyad net_ether_01 0
Figure 9-6 Network down event from the cluster.log file

You can also observe this event by monitoring AHAFS events. Run the
/usr/sbin/rsct/bin/ahafs_mon_multi command as shown in Figure 9-7.

jeddah:/ # /usr/sbin/rsct/bin/ahafs_mon_multi
=== write String : CHANGED=YES;CLUSTER=YES
=== files being monitored:
fd file
3 /aha/cluster/nodeState.monFactory/nodeStateEvent.mon
4 /aha/cluster/nodeAddress.monFactory/nodeAddressEvent.mon
5 /aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon
6 /aha/cluster/nodeList.monFactory/nodeListEvent.mon
7 /aha/cpu/processMon.monFactory/usr/sbin/rsct/bin/hagsd.mon
==================================
Loop 1:
Event for
/aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon has
occurred.
BEGIN_EVENT_INFO
TIME_tvsec=1286376025
TIME_tvnsec=623294923
SEQUENCE_NUM=0
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=ADAPTER_DOWN
INTERFACE_NAME=en0
NODE_NUMBER=2
NODE_ID=0x2F1590D0CC0211DFBF20A24E5FB40502
CLUSTER_ID=0x93D8689AD0F211DFA49CA24E5F0D9E02
END_EVPROD_INFO
END_EVENT_INFO
==================================
Figure 9-7 Event monitoring from AHAFS

The caa_event utility also reports the network failure event. You can see the
CAA event by running the /usr/sbin/rsct/bin/caa_event -a command (Figure 9-8).

# /usr/sbin/rsct/bin/caa_event -a
EVENT: adapter liveness:
event_type(0)
node_number(2)
node_id(0)
sequence_number(0)
reason_number(0)
p_interface_name(en0)
EVENT: adapter liveness:
event_type(1)
node_number(2)
node_id(0)
sequence_number(1)
reason_number(0)
p_interface_name(en0)
Figure 9-8 Network failure in CAA event monitoring

This test scenario shows that the non-IP based heartbeat channels keep working. Compared
to previous versions, heartbeating is now performed by CAA.

9.4 Testing the rootvg system event


This scenario tests the event monitoring capability of PowerHA 7.1 by using the new rootvg
system event. Because events are now monitored at the kernel level with CAA, the loss of
access to the rootvg volume group can be detected.

9.4.1 The rootvg system event


As discussed previously, event monitoring is now performed at the kernel level. The
/usr/lib/drivers/phakernmgr kernel extension, which is loaded by the clevmgrdES
subsystem, monitors for the loss of rootvg and can initiate a node restart operation if it is
enabled to do so, as shown in Figure 9-9.
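
One way to confirm that this kernel-level monitoring is in place is sketched below. The
subsystem and kernel extension names are the ones given in this section; the exact output
depends on your installation.

lssrc -s clevmgrdES          # subsystem that loads the kernel extension
genkex | grep phakernmgr     # confirm that the phakernmgr kernel extension is loaded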

PowerHA 7.1 has a new system event that is enabled by default. This event allows monitoring
of the loss of the rootvg volume group while the cluster node is up and running. Previous
versions of PowerHA/HACMP were unable to monitor this type of loss and could not perform
a failover action when access to rootvg was lost, for example, when the SAN disk that hosts
rootvg for the cluster node fails.

The new option is available under the SMIT menu path smitty sysmirror → Custom Cluster
Configuration → Events → System Events. Figure 9-9 shows that the rootvg system event
is defined and enabled by default in PowerHA 7.1.

Change/Show Event Response

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Event Name ROOTVG +
* Response Log event and reboot +
* Active Yes +
Figure 9-9 The rootvg system event

The default event properties instruct the system to log an event and restart when a loss of
rootvg occurs. This exact scenario is tested in the next section to demonstrate this concept.

9.4.2 Testing the loss of the rootvg volume group


We simulate this test with a two-node cluster. The rootvg volume group is hosted on SAN
storage through a Virtual I/O Server (VIOS). The test removes access to the rootvg file
systems while the cluster node is still up and running. The test can be done in several ways,
from pulling the physical SAN connection to making the storage unavailable to the operating
system. In this case, the VSCSI device is made unavailable on the VIOS.

This scenario entails a two-node cluster with one resource group. The cluster is running on
two nodes: sydney and perth. The rootvg volume group is hosted by the VIOS on a VSCSI
disk.

Cluster node status and mapping
First, you check the VIOS for the client mapping. You can identify the client partition number
by running the uname -L command on the cluster node. In this case, the client partition is 7.
Next you run the lsmap -all command on the VIOS, as shown in Figure 9-10, and look up
the client partition. Only one LUN is mapped through VIOS to the cluster node, because the
shared storage is attached by using Fibre Channel (FC) adapters.

lsmap -all

SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost5 U9117.MMA.101F170-V1-C26 0x00000007

VTD vtscsi13
Status Available
LUN 0x8100000000000000
Backing device lp5_rootvg
Physloc
Figure 9-10 VIOS output of the lsmap command showing the rootvg resource
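
The client partition ID in the lsmap output is matched against the partition number of the
cluster node, which the uname -L command prints together with the partition name:

uname -L                     # prints the LPAR (client partition) number and name of this node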

Check the node to ensure that you have the right disk as shown in Figure 9-11.

sydney:/ # lscfg -l hdisk0


hdisk0 U9117.MMA.101F170-V7-C5-T1-L8100000000000000 Virtual SCSI
Disk Drive

sydney:/ # lspv
hdisk0 00c1f170ff638163 rootvg active
caa_private0 00c0f6a0febff5d4 caavg_private active
hdisk2 00c1f170674f3d6b dbvg
hdisk3 00c1f1706751bc0d appvg
Figure 9-11 PowerHA node showing the mapping of hdisk0 to the VIOS

After the mapping is identified, review the cluster status to ensure that the resource group
is online as shown in Figure 9-12.

sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg ONLINE sydney
OFFLINE perth

sydney:/ # lssrc -ls clstrmgrES | grep "Current state"


Current state: ST_STABLE
Figure 9-12 Sydney cluster status

Testing by taking the rootvg volume group offline
To perform the test, take the mapping offline on the VIOS by removing the virtual target
device definition. You do this test while the PowerHA node is up and running as shown in
Figure 9-13.

$ rmvdev -vtd vtscsi13


$ lsmap -vadapter vhost5
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost5 U9117.MMA.101F170-V1-C26 0x00000007

VTD NO VIRTUAL TARGET DEVICE FOUND


Figure 9-13 VIOS: Taking the rootvg LUN offline

You have now removed the virtual target device (VTD) mapping that maps the rootvg LUN to
the client partition, which, in this case, is the PowerHA node called sydney. This operation is
performed while the node is up and running and hosting the resource group, and it
demonstrates what happens to the node when rootvg access is lost.

When the node is checked, it has halted and failed the resource group over to the standby
node perth (Figure 9-14). This behavior is new and expected in this situation. It is a result of
the system event that monitors access to rootvg from the kernel. Checking perth confirms that
the failover happened.

perth:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg OFFLINE sydney
ONLINE perth
Figure 9-14 Node status from the standby node showing that the node failed over

9.4.3 Loss of rootvg: What PowerHA logs
To show that this event is recognized and that the correct action was taken, check the system
error report shown in Figure 9-15.

LABEL: KERNEL_PANIC
IDENTIFIER: 225E3B63

Date/Time: Wed Oct 6 14:07:54 2010


Sequence Number: 2801
Machine Id: 00C1F1704C00
Node Id: sydney
Class: S
Type: TEMP
WPAR: Global
Resource Name: PANIC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
ASSERT STRING

PANIC STRING
System Halt because of rootvg failure
Figure 9-15 System error report showing a rootvg failure
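
The entry in Figure 9-15 comes from the AIX error log. A minimal way to list and expand it is
sketched below; the identifier (225E3B63) and the label (KERNEL_PANIC) are the ones
reported in this example.

errpt -J KERNEL_PANIC        # summary of entries with this error label
errpt -a -j 225E3B63         # detailed report for the identifier shown above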

9.5 Simulation of a crash in the node with an active resource group

This section presents a scenario that simulates a node crash while the resource group is
active. The scenario consists of a hot-standby cluster configuration with participating nodes
seoul and busan, with only one Ethernet network and two Ethernet interfaces in each node.
The halt -q command is issued on the seoul node, which is hosting the resource group.

The result is that the resource group moved to the standby node as expected. Example 9-16
shows the relevant output that is written to the busan:/var/hacmp/adm/cluster.log file.

Example 9-16 Output of the resource move to the standby node


Sep 29 16:30:22 busan user:warn|warning cld[11599982]: Shutting down all services.
Sep 29 16:30:23 busan user:warn|warning cld[11599982]: Unmounting file systems.
Sep 29 16:30:28 busan daemon:err|error ConfigRM[10879056]: (Recorded using libct_ffdc.a cv 2):::Error ID:
:::Reference ID::::Template ID: a098bf90:::Details File: :::Location:
RSCT,PeerDomain.C,1.99.1.519,17853:::CONFIGRM_PENDINGQUORUM_ER The operational quorum state of the active
peer domain has changed to PENDING_QUORUM. This state usually indicates that exactly half of the nodes
that are defined in the peer domain are online. In this state cluster resources cannot be recovered
although none will be stopped explicitly.
Sep 29 16:30:28 busan local0:crit clstrmgrES[5701662]: Wed Sep 29 16:30:28 Removing 2 from ml_idx
Sep 29 16:30:28 busan user:notice PowerHA SystemMirror for AIX: EVENT START: node_down seoul

Sep 29 16:30:28 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: node_down seoul 0
Sep 29 16:30:31 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_release busan 1
Sep 29 16:30:31 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move busan 1 RELEASE
Sep 29 16:30:32 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move busan 1 RELEASE 0
Sep 29 16:30:32 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_release busan 1 0
Sep 29 16:30:32 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_fence busan 1
Sep 29 16:30:33 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_fence busan 1 0
Sep 29 16:30:35 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_fence busan 1
Sep 29 16:30:35 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_fence busan 1 0
Sep 29 16:30:35 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_acquire busan 1
Sep 29 16:30:35 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move busan 1 ACQUIRE
Sep 29 16:30:36 busan user:notice PowerHA SystemMirror for AIX: EVENT START: acquire_takeover_addr
Sep 29 16:30:38 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: acquire_takeover_addr 0
Sep 29 16:30:45 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move busan 1 ACQUIRE 0
Sep 29 16:30:45 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_acquire busan 1 0
Sep 29 16:30:45 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_complete busan 1
Sep 29 16:30:46 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_complete busan 1 0

The cld messages are related to solidDB. The cld subsystem determines whether the
local node must become the primary or the secondary solidDB server in a failover. Before the
crash, solidDB was active on the seoul node as follows:
seoul:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: seoul, 0xdc82faf0908920dc, 2

As expected, after the crash, solidDB is active on the remaining busan node as follows:
busan:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: busan, 0x564bc620973c9bdc, 1

With the seoul node absent, its interfaces are in the STALE status as shown in
Example 9-17.

Example 9-17 The lscluster -i command to check the status of the cluster
busan:/ # lscluster -i
Network/Storage Interface Query

Cluster Name: korea


Cluster uuid: a01f47fe-d089-11df-95b5-a24e50543103
Number of nodes reporting = 2
Number of nodes expected = 2
Node busan
Node uuid = e356646e-c0dd-11df-b51d-a24e57e18a03
Number of interfaces discovered = 3
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = a2.4e.57.e1.8a.3
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state UP
Number of regular addresses configured on interface = 2
IPV4 ADDRESS: 192.168.101.144 broadcast 192.168.103.255 netmask
255.255.255.0

IPV4 ADDRESS: 10.168.101.44 broadcast 10.168.103.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.43 broadcast 0.0.0.0 netmask
0.0.0.0
Interface number 2 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Interface number 3 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = a2.4e.57.e1.8a.7
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state UP
Number of regular addresses configured on interface = 2
IPV4 ADDRESS: 192.168.201.144 broadcast 192.168.203.255 netmask
255.255.255.0
IPV4 ADDRESS: 10.168.101.143 broadcast 10.168.103.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.43 broadcast 0.0.0.0 netmask
0.0.0.0
Node seoul
Node uuid = 4f8858be-c0dd-11df-930a-a24e50543103
Number of interfaces discovered = 3
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = a2.4e.50.54.31.3
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state STALE
Number of regular addresses configured on interface = 2
IPV4 ADDRESS: 192.168.101.143 broadcast 192.168.103.255 netmask
255.255.255.0
IPV4 ADDRESS: 10.168.101.43 broadcast 10.168.103.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.43 broadcast 0.0.0.0 netmask
0.0.0.0
Interface number 2 en2

ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = a2.4e.50.54.31.7
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state STALE
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.143 broadcast 192.168.203.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.43 broadcast 0.0.0.0 netmask
0.0.0.0
Interface number 3 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state STALE

Results: The results were the same when issuing the halt command instead of the halt
-q command.

9.6 Simulations of CPU starvation


In previous versions of PowerHA, CPU starvation could activate the deadman switch, leading
the starved node to a halt and a consequent move of its resource groups. In PowerHA 7.1,
the deadman switch no longer exists, and its functionality is implemented at the kernel
interrupt level. This test shows how the absence of the deadman switch influences
cluster behavior.
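
Both scenarios drive the load with a CPU stress tool. The actual tool that was used is not
shown here, so the following is only a hypothetical sketch of such a load generator: it spawns
60 CPU-bound busy loops, lets them run for 60 seconds, and then kills them.

#!/bin/ksh
# Hypothetical CPU stress generator (not the actual tool used in this test).
pids=""
i=0
while [ $i -lt 60 ]; do
    while :; do :; done &        # one CPU-bound busy loop per background job
    pids="$pids $!"
    i=$(( i + 1 ))
done
sleep 60                         # keep the run queue loaded for 60 seconds
kill $pids 2>/dev/null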

Scenario 1
This scenario shows the use of a stress tool on the CPU of one node with more than 50
processes in the run queue and a duration of 60 seconds.

Overview
This scenario consists of a hot-standby cluster configuration with participating nodes seoul
and busan with only one Ethernet network. Each node has two Ethernet interfaces. The
resource group is hosted on seoul, and solidDB is active on the busan node. A tool is run to
stress the seoul CPU with more than 50 processes in the run queue for a duration of 60
seconds as shown in Example 9-18 on page 293.

Example 9-18 Scenario testing the use of a stress tool on one node
seoul:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: busan, 0x564bc620973c9bdc, 1

seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE seoul
OFFLINE busan

Beneath the lparstat output header, you see the CPU and memory configuration for each
node:
Seoul: Power 6, type=Shared, mode=Uncapped, smt=On, lcpu=2, mem=3584MB, ent=0.50
Busan: Power 6, type=Shared, mode=Uncapped, smt=On, lcpu=2, mem=3584MB, ent=0.50

Results
Before the test, the seoul node is running at an average of 3% of its entitled capacity. The
run queue averages three processes as shown in Example 9-19.

Example 9-19 The vmstat result of the seoul node


seoul:/ # vmstat 2
System configuration: lcpu=2 mem=3584MB ent=0.50
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------------------
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
2 0 424045 10674 0 0 0 0 0 0 92 1508 359 1 2 97 0 0.02 3.4
3 0 424045 10674 0 0 0 0 0 0 84 1001 346 1 1 97 0 0.02 3.1
3 0 424044 10675 0 0 0 0 0 0 88 1003 354 1 1 97 0 0.02 3.1
3 0 424045 10674 0 0 0 0 0 0 91 1507 352 1 2 97 0 0.02 3.5
3 0 424047 10672 0 0 0 0 0 0 89 1057 370 1 2 97 0 0.02 3.3
3 0 424064 10655 0 0 0 0 0 0 94 1106 379 1 2 97 0 0.02 3.6

During the test, the consumed entitled capacity rose to 200%, and the run queue rose to an
average of 50 processes as shown in Example 9-20.

Example 9-20 Checking the node status after running the stress test
seoul:/ # vmstat 2
System configuration: lcpu=2 mem=3584MB ent=0.50
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------------------
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
52 0 405058 167390 0 0 0 0 0 0 108 988 397 42 8 50 0 0.25 50.6
41 0 405200 167248 0 0 0 0 0 0 78 140 245 99 0 0 0 0.79 158.1
49 0 405277 167167 0 0 0 0 0 0 71 206 249 99 0 0 0 1.00 199.9
50 0 405584 166860 0 0 0 0 0 0 73 33 241 99 0 0 0 1.00 199.9
48 0 405950 166491 0 0 0 0 0 0 71 297 244 99 0 0 0 1.00 199.8

As expected, the CPU starvation did not trigger a resource group move from the seoul node
to the busan node. The /var/adm/ras/syslog.caa log file reported messages about solidDB
daemons being unable to communicate, but the leader node continued to be the busan node
as shown in Example 9-21 on page 294.

Example 9-21 Status of the nodes after triggering a CPU starvation scenario
seoul:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: busan, 0x564bc620973c9bdc, 1

seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE seoul
OFFLINE busan

Scenario 2
This scenario shows the use of a stress tool on the CPU of two nodes with more than 50
processes in the run queue and a duration of 60 seconds.

Overview
This scenario consists of a hot-standby cluster configuration with participating nodes seoul
and busan with only one Ethernet network. Each node has two Ethernet interfaces. Both the
resource group and solidDB are active on the busan node. A tool is run to stress the CPU of
both nodes with more than 50 processes in the run queue for a duration of 60 seconds as
shown in Example 9-22.

Example 9-22 Scenario testing the use of a stress tool on both nodes
seoul:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: busan, 0x564bc620973c9bdc, 1

seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc OFFLINE seoul
ONLINE busan

Results
Before the test, both nodes have a low run queue and low entitled capacity consumption as
shown in Example 9-23.

Example 9-23 Results of the stress test in scenario two


seoul:/ # vmstat 2
System configuration: lcpu=2 mem=3584MB ent=0.50
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------------------
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
1 0 389401 181315 0 0 0 0 0 0 95 1651 302 2 2 97 0 0.02 3.5
1 0 389405 181311 0 0 0 0 0 0 91 960 316 1 2 97 0 0.02 3.3
1 0 389406 181310 0 0 0 0 0 0 88 953 299 1 1 97 0 0.02 3.1
1 0 389408 181308 0 0 0 0 0 0 97 1461 301 1 2 97 0 0.02 3.5
1 0 389411 181305 0 0 0 0 0 0 109 967 326 1 3 96 0 0.02 4.7

busan:/ # vmstat 2
System configuration: lcpu=2 mem=3584MB ent=0.50
kthr memory page faults cpu

----- ----------- ------------------------ ------------ -----------------------
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
1 0 450395 349994 0 0 0 0 0 0 77 670 363 1 2 97 0 0.02 3.4
1 0 450395 349994 0 0 0 0 0 0 80 477 359 1 1 98 0 0.02 3.1
1 0 450395 349994 0 0 0 0 0 0 80 554 369 1 1 97 0 0.02 3.4
1 0 450395 349994 0 0 0 0 0 0 73 479 368 1 1 98 0 0.02 3.1
1 0 450395 349994 0 0 0 0 0 0 81 468 339 1 1 98 0 0.01 2.9

During the test, the seoul node kept an average of 50 processes in the run queue and
consumed 200% of its entitled capacity as shown in Example 9-24.

Example 9-24 Seoul node vmstat results during the test


seoul:/ # vmstat 2
System configuration: lcpu=2 mem=3584MB ent=0.50
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------------------
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
43 0 371178 199534 0 0 0 0 0 0 74 312 251 99 0 0 0 1.00 199.8
52 0 371178 199534 0 0 0 0 0 0 73 19 247 99 0 0 0 1.00 200.0
52 0 371176 199534 0 0 0 0 0 0 75 108 249 99 0 0 0 1.00 199.9
47 0 371075 199635 0 0 0 0 0 0 74 33 257 99 0 0 0 1.00 200.1

The busan node did not respond to the vmstat command during the test. When the CPU
stress finished, it produced just one line of output showing a run queue of 119 processes
(Example 9-25).

Example 9-25 Busan node showing only one line of output


busan:/ # vmstat 2
System configuration: lcpu=2 mem=3584MB ent=0.50
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------------------
119 0 450463 349911 0 0 0 0 0 0 56 19 234 99 0 0 0 0.50 99.6

Neither the resource group nor the solidDB database moved from the busan node as shown
in Example 9-26.

Example 9-26 Status of the busan node


seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
db2pok_Resourc OFFLINE seoul
ONLINE busan

seoul:/ # lssrc -ls IBM.StorageRM | grep Leader


Group Leader: busan, 0x564bc620973c9bdc, 1

Conclusion
The conclusion of this test is that occasional peaks of performance degradation do not
cause resource group moves or unnecessary outages.

9.7 Simulation of a Group Services failure
This scenario consists of a hot-standby cluster configuration with participating nodes seoul
and busan with only one Ethernet network. Each node has two Ethernet interfaces. The
cthags process is killed on the seoul node, which is hosting the resource group.

As a result, the seoul node halts as expected, and the resource group is acquired by the
remaining node as shown in Example 9-27.

Example 9-27 Resource group movement


seoul:/ # lssrc -ls cthags
Subsystem Group PID Status
cthags cthags 5963978 active
5 locally-connected clients. Their PIDs:
6095070(IBM.ConfigRMd) 6357196(rmcd) 5963828(IBM.StorageRMd) 7471354(clstrmgr)
12910678(gsclvmd)
HA Group Services domain information:
Domain established by node 2
Number of groups known locally: 8
Number of Number of local
Group name providers providers/subscribers
rmc_peers 2 1 0
s00O3RA00009G0000015CDBQGFL 2 1 0
IBM.ConfigRM 2 1 0
IBM.StorageRM.v1 2 1 0
CLRESMGRD_1108531106 2 1 0
CLRESMGRDNPD_1108531106 2 1 0
CLSTRMGR_1108531106 2 1 0
d00O3RA00009G0000015CDBQGFL 2 1 0
Critical clients will be terminated if unresponsive

seoul:/ # ps -ef | grep cthagsd | grep -v grep


root 5963978 3866784 4 17:02:33 - 0:00 /usr/sbin/rsct/bin/hagsd cthags

seoul:/ # kill -9 5963978

The seoul:/var/adm/ras/syslog.caa log file recorded the following messages before the crash.
You can observe that the seoul node was halted within 1 second as shown in Example 9-28.

Example 9-28 Message in the syslog.caa file in the seoul node


Sep 29 17:02:33 seoul daemon:err|error RMCdaemon[6357196]: (Recorded using libct_ffdc.a cv 2):::Error ID:
6XqlQl0dZucA/POE1DK4e.1...................:::Reference ID: :::Template ID: b1731da3:::Details File:
:::Location: RSCT,rmcd_gsi.c,1.50,10
48 :::RMCD_2610_101_ER Internal error. Error data 1 00000001 Error data 2 00000000 Error data 3 dispatch_gs
Sep 29 17:02:33 seoul local0:crit clstrmgrES[7471354]: Wed Sep 29 17:02:33 announcementCb: Called,
state=ST_STABLE, provider token 1
Sep 29 17:02:33 seoul local0:crit clstrmgrES[7471354]: Wed Sep 29 17:02:33 announcementCb: GsToken 3,
AdapterToken 4, rm_GsToken 1
Sep 29 17:02:33 seoul local0:crit clstrmgrES[7471354]: Wed Sep 29 17:02:33 announcementCb: GRPSVCS
announcment code=512; exiting
Sep 29 17:02:33 seoul local0:crit clstrmgrES[7471354]: Wed Sep 29 17:02:33 CHECK FOR FAILURE OF RSCT
SUBSYSTEMS (cthags)
Sep 29 17:02:33 seoul daemon:err|error ConfigRM[6095070]: (Recorded using libct_ffdc.a cv 2):::Error ID:
:::Reference ID: :::Template ID: 362b0a5f:::Details File: :::Location:
RSCT,PeerDomain.C,1.99.1.519,21079:::CONFIGRM_EXIT_GS_ER The peer domain configuration manager daemon

(IBM.ConfigRMd) is exiting due to the Group Services subsystem terminating. The configuration manager
daemon will restart automatically, synchronize the nodes configuration with the domain and rejoin the
domain if possible.
Sep 29 17:02:34 seoul daemon:notice StorageRM[5963828]: (Recorded using libct_ffdc.a cv 2):::Error ID:
:::Reference ID: :::Template ID: a8576c0d:::Details File: :::Location: RSCT,StorageRMDaemon.C,1.56,323
:::STORAGERM_STOPPED_ST IBM.StorageRM daemon has been stopped.
Sep 29 17:02:34 seoul daemon:notice ConfigRM[6095070]: (Recorded using libct_ffdc.a cv 2):::Error ID:
:::Reference ID: :::Template ID: de84c4db:::Details File: :::Location: RSCT,IBM.ConfigRMd.C,1.55,346
:::CONFIGRM_STARTED_STIBM.ConfigRM daemon has started.
Sep 29 17:02:34 seoul daemon:notice snmpd[3342454]: NOTICE: lost peer (SMUX ::1+51812+5)
Sep 29 17:02:34 seoul daemon:notice RMCdaemon[15663146]: (Recorded using libct_ffdc.a cv 2):::Error ID:
6eKora0eZucA/Xuo/D
K4e.1...................:::Reference ID: :::Template ID: a6df45aa:::Details File: :::Location:
RSCT,rmcd.c,1.75,225:::RMCD_INFO_0_ST The daemon is started.
Sep 29 17:02:34 seoul user:notice PowerHA SystemMirror for AIX: clexit.rc : Unexpected termination of
clstrmgrES
Sep 29 17:02:34 seoul user:notice PowerHA SystemMirror for AIX: clexit.rc : Halting system immediately!!!

9.8 Testing a Start After resource group dependency


This test uses the example that was configured in 5.1.6, “Configuring Start After and Stop
After resource group dependencies” on page 96. Figure 9-16 shows a summary of the
configuration. The dependency configuration of the Start After resource group is tested to see
whether it works as expected.

Figure 9-16 Start After dependency between the apprg and dbrg resource groups

9.8.1 Testing the standard configuration of a Start After resource group dependency
Example 9-29 shows the state of a resource group pair after a normal startup of the cluster on
both nodes.

Example 9-29 clRGinfo for a Start After resource group pair


sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg ONLINE sydney
OFFLINE perth
apprg ONLINE perth
OFFLINE sydney

With both resource groups online, the source (dependent) apprg resource group can be
brought offline and then online again. Alternatively, it can be gracefully moved to another
node without any influence on the target dbrg resource group. The target resource group can
also be brought offline. However, to bring the source resource group online, the target
resource group must first be brought online manually (if it is offline).

If you start the cluster only on the home node of the source resource group, the apprg
resource group in this case, the cluster waits until the dbrg resource group is brought online
as shown in Example 9-30. The startup policy is Online On Home Node Only for both resource
groups.

Example 9-30 Offline because the target is offline from clRGinfo


sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg OFFLINE sydney
OFFLINE perth
apprg OFFLINE due to target offlin perth
OFFLINE sydney

9.8.2 Testing application startup with Startup Monitoring configured


For this test, both resource groups are started on the same node. This way, their application
scripts log messages to the same file so that you can see the detailed sequence of their start
and completion times. The home node is temporarily modified to sydney for both resource
groups. Then the cluster is started only on the sydney node with both resource groups.

Example 9-31 shows the start, stop, and monitoring scripts. Note the syslog configuration that
was made to log the messages through the local7 facility in the
/var/hacmp/log/StartAfter_cluster.log file.

Example 9-31 Dummy start, stop, and monitoring scripts


sydney:/HA71 # ls
app_mon.sh app_stop.sh db_start.sh
app_start.sh db_mon.sh db_stop.sh

sydney:/HA71 # cat db_start.sh


#!/bin/ksh
fp="local7.info"
file="`expr "//$0" : '.*/\([^/]*\)'`"
# cleanup
if [ -f /dbmp/db.lck ]; then rm /dbmp/db.lck; fi

logger -t"$file" -p$fp "Starting up DB... "


sleep 50
echo "DB started at:\n\t`date`">/dbmp/db.lck
logger -t"$file" -p$fp "DB is running!"
exit 0

sydney:/HA71 # cat db_stop.sh


#!/bin/ksh
fp="local7.info"
file="`expr "//$0" : '.*/\([^/]*\)'`"

logger -t"$file" -p$fp "Shutting down DB... "


sleep 20
# cleanup
if [ -f /dbmp/db.lck ]; then rm /dbmp/db.lck; fi
logger -t"$file" -p$fp "DB stopped!"
exit 0

sydney:/HA71 # cat db_mon.sh


#!/bin/ksh
fp="local7.info"
file="`expr "//$0" : '.*/\([^/]*\)'`"

if [ -f /dbmp/db.lck ]; then
logger -t"$file" -p$fp "DB is running!"
exit 0
fi
logger -t"$file" -p$fp "DB is NOT running!"
exit 1

sydney:/HA71 # cat app_start.sh


#!/bin/ksh
fp="local7.info"
file="`expr "//$0" : '.*/\([^/]*\)'`"
# cleanup
if [ -f /appmp/app.lck ]; then rm /appmp/app.lck; fi

logger -t"$file" -p$fp "Starting up APP... "


sleep 10

echo "APP started at:\n\t`date`">/appmp/app.lck
logger -t"$file" -p$fp "APP is running!"
exit 0

sydney:/HA71 # cat app_stop.sh


#!/bin/ksh
fp="local7.info"
file="`expr "//$0" : '.*/\([^/]*\)'`"

logger -t"$file" -p$fp "Shutting down APP... "


sleep 2
# cleanup
if [ -f /appmp/app.lck ]; then rm /appmp/app.lck; fi
logger -t"$file" -p$fp "APP stopped!"
exit 0

sydney:/HA71 # cat app_mon.sh


#!/bin/ksh
fp="local7.info"
file="`expr "//$0" : '.*/\([^/]*\)'`"

if [ -f /appmp/app.lck ]; then
logger -t"$file" -p$fp "APP is running!"
exit 0
fi
logger -t"$file" -p$fp "APP is NOT running!"
exit 1

sydney:/HA71 # grep local7 /etc/syslog.conf


local7.info /var/hacmp/log/StartAfter_cluster.log rotate size 256k files 4

Without Startup Monitoring, the APP startup script is launched before the DB startup script
returns, as shown in Example 9-32.

Example 9-32 Startup sequence without Startup monitoring mode


...
Oct 12 07:53:26 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 07:53:27 sydney local7:info db_start.sh: Starting up DB...
Oct 12 07:53:36 sydney local7:info app_mon.sh: APP is NOT running!
Oct 12 07:53:37 sydney local7:info app_start.sh: Starting up APP...
Oct 12 07:53:47 sydney local7:info app_start.sh: APP is running!
Oct 12 07:53:53 sydney local7:info app_mon.sh: APP is running!
Oct 12 07:54:17 sydney local7:info db_start.sh: DB is running!
Oct 12 07:54:23 sydney local7:info app_mon.sh: APP is running!
...

With Startup Monitoring, the APP startup script is launched only after the DB startup script
returns, as expected and as shown in Example 9-33.

Example 9-33 Startup sequence with Startup Monitoring


...
Oct 12 08:02:38 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:02:39 sydney local7:info db_start.sh: Starting up DB...
Oct 12 08:02:39 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:02:45 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:02:51 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:02:57 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:03:03 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:03:09 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:03:15 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:03:21 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:03:27 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:03:29 sydney local7:info db_start.sh: DB is running!
Oct 12 08:03:33 sydney local7:info db_mon.sh: DB is running!
Oct 12 08:03:49 sydney local7:info app_mon.sh: APP is NOT running!
Oct 12 08:03:50 sydney local7:info app_start.sh: Starting up APP...
Oct 12 08:04:00 sydney local7:info app_start.sh: APP is running!
...

Example 9-34 shows the state changes of the resource groups during this startup.

Example 9-34 Resource group state during startup


sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg OFFLINE sydney
OFFLINE perth
apprg OFFLINE sydney
OFFLINE perth

sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg ACQUIRING sydney
OFFLINE perth
apprg TEMPORARY ERROR sydney
OFFLINE perth

sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg ONLINE sydney
OFFLINE perth

apprg ACQUIRING sydney


OFFLINE perth

sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
dbrg ONLINE sydney
OFFLINE perth
apprg ONLINE sydney
OFFLINE perth

9.9 Testing dynamic node priority


This test uses the algeria, brazil, and usa nodes and one resource group in the cluster as
shown in Figure 9-17. The resource group is configured to fail over based on a script return
value. The DNP.sh script returns a different value on each node. For details about configuring
the dynamic node priority (DNP), see 5.1.8, “Configuring the dynamic node priority (adaptive
failover)” on page 102.

Figure 9-17 Dynamic node priority test environment

Table 9-1 provides the cluster details.

Table 9-1 Cluster details


Field Value

Resource name algeria_rg

Participating nodes algeria, brazil, usa

Dynamic node priority policy cl_lowest_nonzero_udscript_rc

Field Value

DNP script path /usr/IBM/HTTPServer/bin/DNP.sh

DNP script timeout value 20

The default node priority is algeria first, then brazil, and then usa. The usa node returns the
lowest value from DNP.sh. When a resource group failover is triggered, the algeria_rg
resource group is moved to the usa node because its return value is the lowest, as shown in
Example 9-35.

Example 9-35 Expected return value for each nodes


usa:/ # clcmd cat /usr/IBM/HTTPServer/bin/DNP.sh
-------------------------------
NODE usa
-------------------------------
exit 100

-------------------------------
NODE brazil
-------------------------------
exit 105

-------------------------------
NODE algeria
-------------------------------
exit 103

When the resource group fails over, algeria_rg moves from the algeria node to the usa
node, which has the lowest return value in DNP.sh as shown in Figure 9-18.

# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
algeria_rg ONLINE algeria
OFFLINE brazil
OFFLINE usa

# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
algeria_rg OFFLINE algeria
OFFLINE brazil
ONLINE usa
Figure 9-18 clRGinfo of before and after takeover

Then the DNP.sh script is modified so that brazil returns the lowest value of the two standby
nodes as shown in Example 9-36.

Example 9-36 Changing the DNP.sh file


usa:/ # clcmd cat /usr/IBM/HTTPServer/bin/DNP.sh

-------------------------------
NODE usa
-------------------------------
exit 100

-------------------------------
NODE brazil
-------------------------------
exit 101

-------------------------------
NODE algeria
-------------------------------
exit 103

Upon the next resource group failover, the resource group moves to brazil because it now has
the lowest return value among the remaining candidate nodes, as shown in Figure 9-19.

# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
algeria_rg OFFLINE algeria
OFFLINE brazil
ONLINE usa

# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
algeria_rg OFFLINE algeria
ONLINE brazil
OFFLINE usa
Figure 9-19 Resource group moving

To simplify the test scenario, DNP.sh is defined to simply return a fixed value. In a real
situation, you can replace this sample DNP.sh file with any customized script; node failover is
then based on the return value of your own script.
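
For example, a hypothetical DNP script might return the current run-queue length so that the
least loaded node wins under the cl_lowest_nonzero_udscript_rc policy. This sketch is only an
illustration; it assumes that a zero return value should be avoided (as the policy name
suggests) and that the script completes well within the configured 20-second timeout.

#!/bin/ksh
# Hypothetical DNP script: return the run-queue length reported by vmstat.
rq=$(vmstat 1 2 | tail -1 | awk '{print $1}')
[ "$rq" -lt 1 ] && rq=1          # avoid returning zero
exit $rq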


Chapter 10. Troubleshooting PowerHA 7.1


This chapter shares the experiences of the writers of this IBM Redbooks publication and the
lessons learned in all the phases of implementing PowerHA 7.1 to help you troubleshoot your
migration, installation, configuration, and Cluster Aware AIX (CAA).

This chapter includes the following topics:


 Locating the log files
 Troubleshooting the migration
 Troubleshooting the installation and configuration
 Troubleshooting problems with CAA



10.1 Locating the log files
This section explains where you can find the various log files in your PowerHA cluster to
assist in managing problems with CAA and PowerHA.

10.1.1 CAA log files


You can check the CAA clutils log file and the syslog file for error messages as explained in
the following sections.

The clutils file


If you experience a problem with an operation, such as creating a cluster in CAA, check the
/var/hacmp/log/clutils.log log file.
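
For example, a quick scan of that log for recent error messages might look like this:

grep -i error /var/hacmp/log/clutils.log | tail -20    # show the most recent error entries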

The syslog facility


The CAA service uses the syslog facility to log errors and debugging information. All CAA
messages are written to the /var/adm/ras/syslog.caa file.

For verbose logging information, you must enable debug mode by editing the
/etc/syslog.conf configuration file and adding the following line as shown in Figure 10-1:
*.debug /tmp/syslog.out rotate size 10m files 10

local0.crit /dev/console
local0.info /var/hacmp/adm/cluster.log
user.notice /var/hacmp/adm/cluster.log
daemon.notice /var/hacmp/adm/cluster.log
*.info /var/adm/ras/syslog.caa rotate size 1m files 10
*.debug /tmp/syslog.out rotate size 10m files 10
Figure 10-1 Extract from the /etc/syslog.conf file

After you make this change, verify that a syslog.out file exists in the /tmp directory. If it does
not, create one by entering the touch /tmp/syslog.out command. After you create the file,
refresh the syslog daemon by issuing the refresh -s syslogd command.

When debug mode is enabled, you capture detailed debugging information in the
/tmp/syslog.out file. This information can assist you in troubleshooting problems with
commands, such as the mkcluster command during cluster migration.
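
As a minimal sketch, the steps described above can be combined as follows (the file names
and sizes are the ones shown in Figure 10-1):

echo "*.debug /tmp/syslog.out rotate size 10m files 10" >> /etc/syslog.conf
touch /tmp/syslog.out            # make sure the output file exists
refresh -s syslogd               # re-read /etc/syslog.conf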

10.1.2 PowerHA log files


The following PowerHA log files are most commonly used:
/var/hacmp/adm/cluster.log
One of the main sources of information for the administrator. This file
tracks time-stamped messages of all PowerHA events, scripts, and
daemons.
/var/hacmp/log/hacmp.out
Along with cluster.log file, this file is the most important source of
information. Recent PowerHA releases are sending more details to
this log file, including summaries of events and the location of
resource groups.

/var/log/clcomd/clcomd.log
Includes information about communication that is exchanged among
all the cluster nodes.

Increasing the verbose logging level


You can increase the verbose logging level in PowerHA by setting and exporting the
VERBOSE_LOGGING=high environment variable. When this variable is exported, you see more
detailed information in log files such as hacmp.out and clmigcheck.log.
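
For example, export the variable in the shell before running the cluster command that you
want to trace:

export VERBOSE_LOGGING=high      # subsequent cluster commands log in detail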

Listing the PowerHA log files by using the clmgr utility


A common way to list all PowerHA log files is to use the clmgr command-line utility. First, run
the clmgr view log command to see the list of available logs as shown in Example 10-1. Then
run the clmgr view log logname command, replacing logname with the log that you want to
analyze.

Example 10-1 Generating a list of PowerHA log files with the clmgr utility
seoul:/ # clmgr view log
ERROR: """" does not appear to exist!
Available Logs:
autoverify.log
cl2siteconfig_assist.log
cl_testtool.log
clavan.log
clcomd.log
clcomddiag.log
clconfigassist.log
clinfo.log
clstrmgr.debug
clstrmgr.debug.long
cluster.log
cluster.mmddyyyy
clutils.log
clverify.log
cspoc.log
cspoc.log.long
cspoc.log.remote
dhcpsa.log
dnssa.log
domino_server.log
emuhacmp.out
hacmp.out
ihssa.log
migration.log
sa.log
sax.log

seoul:/ # clmgr view log cspoc.log | more


Warning: no options were provided for log "cspoc.log".
Defaulting to the last 500 lines.
09/21/10 10:23:09 seoul: success: clresactive -v datavg
09/21/10 10:23:10 seoul: success: /usr/es/sbin/cluster/cspoc/clshowfs2 datavg

09/21/10 10:23:29 [========== C_SPOC COMMAND LINE ==========]



09/21/10 10:23:29 /usr/es/sbin/cluster/sbin/cl_chfs -cspoc -nseoul,busan -FM -a
size=+896 -A no /database/logdir
09/21/10 10:23:29 busan: success: clresactive -v datavg
09/21/10 10:23:29 seoul: success: clresactive -v datavg
09/21/10 10:23:30 seoul: success: eval LC_ALL=C lspv
09/21/10 10:23:35 seoul: success: chfs -A no -a size="+1835008" /database/logdir
09/21/10 10:23:36 seoul: success: odmget -q 'attribute = label and value =
/database/logdir' CuAt
09/21/10 10:23:37 busan: success: eval varyonvg -n -c -A datavg ;
imfs -lx lvdata09
; imfs -l lvdata09;
varyonvg -n -c -P datavg.

10.2 Troubleshooting the migration


This section offers a collection of problems and solutions that you might encounter during
migration testing. The information is based on the experience of the authors of this Redbooks
publication.

10.2.1 The clmigcheck script


The clmigcheck script writes all activity to the /tmp/clmigcheck.log file (Figure 10-2).
Therefore, you must first look in this file for an error message if you run into any problems with
the clmigcheck utility.

mk_cluster: ERROR: Problems encountered creating the cluster in AIX.


Use the syslog facility to see output from the mkcluster command.

Error termination on: Wed Sep 22 15:47:43 EDT 2010


Figure 10-2 Output from the clmigcheck.log file

10.2.2 The ‘Cluster still stuck in migration’ condition


When migration is completed, the update of the Object Data Manager (ODM) entries might not
take place until the node_up event is run on the last node of the cluster. If you have this
problem, start cluster services on that node to see whether this action completes the
migration protocol and updates the version numbers correctly. For PowerHA 7.1, the version
number must be 12 in the HACMPcluster class. You can verify this number by running the
odmget command as shown in Example 7-51. If the version number is less than 12, you are
still stuck in migration and must call IBM support.
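
For a quick check, you can query the HACMPcluster class directly; this is a sketch, and a case-insensitive grep is used because the exact attribute name can vary between releases:

# Display the version information that is stored in the HACMPcluster ODM class
odmget HACMPcluster | grep -i version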

10.2.3 Existing non-IP networks


The following section provides details about problems with existing non-IP networks that are
not removed. It describes a possible workaround to remove disk heartbeat networks if they
were not deleted as part of the migration process.



After the migration, the output of the cltopinfo command might still show the disk heartbeat
network as shown in Example 10-2.

Example 10-2 The cltopinfo command with the disk heartbeat still being displayed
berlin:/ # cltopinfo
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: caa_private0
Cluster IP Address:
There are 2 node(s) and 3 network(s) defined
NODE berlin:
Network net_diskhb_01
berlin_hdisk1_01 /dev/hdisk1
Network net_ether_01
berlin 192.168.101.141
Network net_ether_010
alleman 10.168.101.142
german 10.168.101.141
berlinb1 192.168.200.141
berlinb2 192.168.220.141
NODE munich:
Network net_diskhb_01
munich_hdisk1_01 /dev/hdisk1
Network net_ether_01
munich 192.168.101.142
Network net_ether_010
alleman 10.168.101.142
german 10.168.101.141
munichb1 192.168.200.142
munichb2 192.168.220.142

Resource Group http_rg


Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node In The List
Fallback Policy Never Fallback
Participating Nodes munich berlin
Service IP Label alleman

Resource Group nfs_rg


Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node In The List
Fallback Policy Fallback To Higher Priority Node In The List
Participating Nodes berlin munich
Service IP Label german



To remove the disk heartbeat network, follow these steps:
1. Stop PowerHA on all cluster nodes. You must perform this action because the removal
does not work in a running cluster. Figure 10-3 shows the error message that is received
when trying to remove the network in an active cluster.

COMMAND STATUS

Command: failed stdout: yes stderr: no

Before command completion, additional instructions may appear below.

cldare: Migration from PowerHA SystemMirror to PowerHA SystemMirror/ES


detected.
A DARE event cannot be run until the migration has completed.

F1=Help F2=Refresh F3=Cancel F6=Command


F8=Image F9=Shell F10=Exit /=Find
n=Find Next
Figure 10-3 Cluster synchronization error message



2. Remove the network:
a. Follow the path smitty sysmirror → Cluster Nodes and Networks → Manage
Networks and Network Interfaces → Networks → Remove a Network.
b. On the SMIT panel, similar to the one shown in Figure 10-4, select the disk heartbeat
network that you want to remove.
You might have to repeat these steps if you have more than one disk heartbeat network.

Networks

Move cursor to desired item and press Enter.

Add a Network
Change/Show a Network
Remove a Network

+--------------------------------------------------------------------------+
| Select a Network to Remove |
| |
| Move cursor to desired item and press Enter. |
| |
| net_diskhb_01 |
| net_ether_01 (192.168.100.0/22) |
| net_ether_010 (10.168.101.0/24 192.168.200.0/24 192.168.220.0/24) |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 10-4 Removing the disk heartbeat network

3. Synchronize your cluster by selecting the path: smitty sysmirror → Custom Cluster
Configuration → Verify and Synchronize Cluster Configuration (Advanced).
4. Verify that the network has been deleted by using the cltopinfo command as shown in Example 10-3.

Example 10-3 Output of the cltopinfo command after removing the disk heartbeat network
berlin:/ # cltopinfo
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: caa_private0
Cluster IP Address:
There are 2 node(s) and 2 network(s) defined
NODE berlin:
Network net_ether_01
berlin 192.168.101.141
Network net_ether_010
german 10.168.101.141
alleman 10.168.101.142
berlinb1 192.168.200.141



berlinb2 192.168.220.141
NODE munich:
Network net_ether_01
munich 192.168.101.142
Network net_ether_010
german 10.168.101.141
alleman 10.168.101.142
munichb1 192.168.200.142
munichb2 192.168.220.142

Resource Group http_rg


Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node In The List
Fallback Policy Never Fallback
Participating Nodes munich berlin
Service IP Label alleman

Resource Group nfs_rg


Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node In The List
Fallback Policy Fallback To Higher Priority Node In The List
Participating Nodes berlin munich
Service IP Label german
berlin:/ #

5. Start PowerHA on all your cluster nodes by running the smitty cl_start command.

10.3 Troubleshooting the installation and configuration


This section explains how you can recover from various installation and configuration
problems on CAA and PowerHA.

10.3.1 The clstat and cldump utilities and the SNMP


After you install and configure PowerHA 7.1 on AIX 7.1, the clstat and cldump utilities might
not work. If you experience this problem, convert SNMP from version 3 to version 1.
Example 10-4 shows all the steps to correct this problem.

Example 10-4 The clstat utility not working under SNMP V3

seoul:/ # clstat -a
Failed retrieving cluster information.
There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.
Refer to the HACMP Administration Guide for more information.

seoul:/ # stopsrc -s snmpd


0513-044 The snmpd Subsystem was requested to stop.

seoul:/ # ls -ld /usr/sbin/snmpd



lrwxrwxrwx 1 root system 9 Sep 15 22:17 /usr/sbin/snmpd ->
snmpdv3ne

seoul:/ # /usr/sbin/snmpv3_ssw -1
Stop daemon: snmpmibd
In /etc/rc.tcpip file, comment out the line that contains: snmpmibd
In /etc/rc.tcpip file, remove the comment from the line that contains: dpid2
Make the symbolic link from /usr/sbin/snmpd to /usr/sbin/snmpdv1
Make the symbolic link from /usr/sbin/clsnmp to /usr/sbin/clsnmpne
Start daemon: dpid2

seoul:/ # ls -ld /usr/sbin/snmpd


lrwxrwxrwx 1 root system 17 Sep 20 09:49 /usr/sbin/snmpd ->
/usr/sbin/snmpdv1

seoul:/ # startsrc -s snmpd


0513-059 The snmpd Subsystem has been started. Subsystem PID is 8126570.

10.3.2 The /var/log/clcomd/clcomd.log file and the security keys


You might find that you cannot start the clcomd daemon and its log file has messages
indicating problems with the security keys as shown in Example 10-5.

Example 10-5 The clcomd daemon indicating problems with the security keys
2010-09-23T00:02:07.983104: WARNING: Cannot read the key
/etc/security/cluster/key_md5_des
2010-09-23T00:02:07.985975: WARNING: Cannot read the key
/etc/security/cluster/key_md5_3des
2010-09-23T00:02:07.986082: WARNING: Cannot read the key
/etc/security/cluster/key_md5_aes

This problem means that the /etc/cluster/rhosts file is not populated correctly. On all
cluster nodes, edit this file so that it contains the IP addresses that are used as the
communication paths during cluster definition, before the first synchronization. If you use
the host name as the persistent address and the communication path, also add the persistent
addresses to the /etc/cluster/rhosts file. Finally, issue the startsrc -s clcomd command.
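
The following sketch shows one way to populate the file and restart the daemon. The two addresses are the base addresses of the example cluster used elsewhere in this chapter and must be replaced with your own communication path entries:

# Populate /etc/cluster/rhosts on every node (addresses are examples only)
cat > /etc/cluster/rhosts <<EOF
192.168.101.141
192.168.101.142
EOF
# Restart clcomd so that the change takes effect
stopsrc -s clcomd
startsrc -s clcomd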

10.3.3 The ECM volume group


When creating an ECM volume group by using the PowerHA C-SPOC menus, the
administrator might receive the error messages shown in Example 10-6, indicating that the
volume group cannot be created.

Example 10-6 Error messages when trying to create an ECM volume group using C-SPOC
seoul: 0516-1335 mkvg: This system does not support enhanced concurrent capable
seoul: volume groups.
seoul: 0516-862 mkvg: Unable to create volume group.
seoul: cl_rsh had exit code = 1, see cspoc.log and/or clcomd.log for more
information
cl_mkvg: An error occurred executing mkvg appvg on node seoul

In /var/hacmp/log/cspoc.log, the messages are:



09/14/10 17:41:40 [========== C_SPOC COMMAND LINE ==========]
09/14/10 17:41:40 /usr/es/sbin/cluster/sbin/cl_mkvg -f -n -B -cspoc -nseoul,busan
-rdatarg -y datavg -s32 -V100 -lfalse -
E 00c0f6a0107734ea 00c0f6a010773532 00c0f6a0fed38de6 00c0f6a0fed3d324
00c0f6a0fed3ef8f
09/14/10 17:41:40 busan: success: clresactive -v datavg
09/14/10 17:41:40 seoul: success: clresactive -v datavg
09/14/10 17:41:41 cl_mkvg: cl_mkvg: An error occurred executing mkvg datavg on
node seoul
09/14/10 17:41:41 seoul: FAILED: mkvg -f -n -B -y datavg -s 32 -V 100 -C cldisk4
cldisk3 cldisk1 cldisk2 cldisk5
09/14/10 17:41:41 seoul: 0516-1335 mkvg: This system does not support enhanced
concurrent capable
09/14/10 17:41:41 seoul: volume groups.
09/14/10 17:41:41 seoul: 0516-862 mkvg: Unable to create volume group.
09/14/10 17:41:41 seoul: RETURN_CODE=1
09/14/10 17:41:41 seoul: cl_rsh had exit code = 1, see cspoc.log and/or clcomd.log
for more information
09/14/10 17:41:42 seoul: success: cl_vg_fence_init datavg rw cldisk4 cldisk3
cldisk1 cldisk2 cldisk5

In this case, install the bos.clvm.enh file set, plus any fixes for this file set, so that the system
stays at a consistent version level.
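
For example, you can check for the file set and, if it is missing, install it from the AIX installation media; the installation device shown is only an example:

# Check whether the enhanced concurrent LVM file set is installed
lslpp -l bos.clvm.enh
# If it is missing, install it (the installation device is an example)
installp -agXd /dev/cd0 bos.clvm.enh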

10.3.4 Communication path


If your cluster node communication path is misconfigured, you might see an error message
similar to the one shown in Figure 10-5.

------------[ PowerHA SystemMirror Migration Check ]-------------

ERROR: Communications Path for node brazil must be set to hostname

Hit <Enter> to continue

ERROR:
Figure 10-5 clmigcheck error for communication path



If you see an error for communication path while running the clmigcheck program, verify that
the /etc/hosts file includes the communication path for the cluster. Also check the
communication path in the HACMPnode ODM class as shown in Figure 10-6.

algeria:/ # odmget HACMPnode | grep -p COMMUNICATION


HACMPnode:
name = "algeria"
object = "COMMUNICATION_PATH"
value = "algeria"
node_id = 1
node_handle = 1
version = 12

HACMPnode:
name = "brazil"
object = "COMMUNICATION_PATH"
value = "brazil"
node_id = 3
node_handle = 3
version = 12
Figure 10-6 Communication path definition at HACMPnode.odm

Because the clmigcheck program is a ksh script, certain shell profiles can cause a similar problem. If
the problem persists after you correct the /etc/hosts configuration file, try temporarily removing the
contents of the kshrc file because they might be affecting the behavior of the clmigcheck program.

If your /etc/cluster/rhosts file is not configured properly, you see an error message
similar to the one shown in Figure 10-7. The /etc/cluster/rhosts file must contain the fully
qualified domain name of each node in the cluster (that is, the output of the hostname
command). After changing the /etc/cluster/rhosts file, run the stopsrc and startsrc
commands on the clcomd subsystem.

brazil:/ # clmigcheck
lslpp: Fileset hageo* not installed.
rshexec: cannot connect to node algeria
ERROR: Internode communication failed,
check the clcomd.log file for more information.

brazil:/ # clrsh algeria date


connect: Connection refused
rshexec: cannot open socket
Figure 10-7 The clcomd error message

You can also check clcomd communication by using the clrsh command as shown in
Figure 10-8.

algeria:/ # clrsh algeria date


Mon Sep 27 11:14:12 EDT 2010
algeria:/ # clrsh brazil date
Mon Sep 27 11:14:15 EDT 2010
Figure 10-8 Checking the clcomd connection



10.4 Troubleshooting problems with CAA
This section discusses various problems that you might encounter during the configuration or
installation of CAA and provides recovery steps.

10.4.1 Previously used repository disk for CAA


When defining a PowerHA cluster, you must define a disk to use as the repository for the
CAA. If the specified disk was used previously as a repository by another cluster, upon
synchronizing the cluster, you receive a message in the /var/adm/ras/syslog.caa file (or
another file defined in /etc/syslog.conf). Example 10-7 shows the message that you
receive.

Example 10-7 CAA error message in the /var/adm/ras/syslog.caa file


Sep 16 08:58:14 seoul user:err|error syslog: validate_device: Specified device,
hdisk1, is a repository.
Sep 16 08:58:14 seoul user:warn|warning syslog: To force cleanup of this disk, use
rmcluster -r hdisk1

Example 10-8 shows the exact error message saved in the smit.log file.

Example 10-8 CAA errors in the smit.log file


ERROR: Problems encountered creating the cluster in AIX. Use the syslog facility
to see output from the mkcluster command.
ERROR: Creating the cluster in AIX failed. Check output for errors in local
cluster configuration, correct them, and try synchronization
again.

The message includes the solution as shown in Example 10-7. You run the rmcluster
command as shown in Example 10-9 to remove all CAA structures from the specified disk.

Example 10-9 Removing CAA structures from a disk


seoul:/ # rmcluster -r hdisk1
This operation will scrub hdisk1, removing any volume groups and clearing cluster
identifiers.
If another cluster is using this disk, that cluster will be destroyed.
Are you sure? (y/[n]) y
remove_cluster_repository: Couldn't get cluster repos lock.
remove_cluster_repository: Force continue.

After you issue the rmcluster command, the administrator can synchronize the cluster again.

Tip: After running the rmcluster command, verify that the caa_private0 disk has been
unconfigured and is not seen on other nodes. Run the lqueryvg -Atp command against
the repository disk to ensure that the volume group definition is removed from the disk. If
you encounter problems with the rmcluster command, see “Removal of the volume group
when the rmcluster command does not” on page 320 for information about how to
manually remove the volume group.
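
For example, the verification suggested in the tip might look like the following sketch; hdisk1 is the disk name used in the rmcluster example above:

# The repository device should no longer be configured on any node
lspv | grep -i caa
# The scrubbed disk should no longer return volume group descriptor information
lqueryvg -Atp hdisk1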



10.4.2 Repository disk replacement
The information to replace a repository disk is currently only available in the
/usr/es/sbin/cluster/README7.1.0.UPDATE file. However, the following information has been
provided to assist you in solving this problem:
1. If necessary, add a new disk and ensure that it is recognized by AIX. The maximum size
required is 10 GB. The disk must be zoned and masked to all cluster nodes.
2. Identify the current repository disk. You can use any of the following commands to obtain
this information:
lspv | grep caa_private
cltopinfo
lscluster -d
3. Stop cluster services on all nodes. Either bring resource groups offline or place them in an
unmanaged state.
4. Remove the CAA cluster by using the following command:
rmcluster -fn clustername
5. Verify that the AIX cluster is removed by running the following command in each node:
lscluster -m
6. If the CAA cluster is still present, run the following command in each node:
clusterconf -fu
7. Verify that the cluster repository is removed by using the lspv command. The repository
disk (see step 2) must not belong to any volume group.
8. Define a new repository disk by following the path: smitty sysmirror → Cluster Nodes
and Networks → Initial Cluster Setup (Typical) → Define Repository Disk and
Cluster IP Address.
9. Verify and synchronize the PowerHA cluster:
smitty cm_ver_and_sync
10. Verify that the AIX cluster is recreated by running the following command:
lscluster -m
11. Verify that the repository disk has changed by running the following command:
lspv | grep caa_private
12. Start cluster services on all nodes:
smitty cl_start
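
As a reference, steps 4 through 7 can be run as the following command sequence; the cluster name de_cluster is the example cluster used earlier in this chapter:

# Remove the CAA cluster definition
rmcluster -fn de_cluster
# Confirm on every node that the CAA cluster is gone
lscluster -m
# Run this only if lscluster -m still shows the cluster
clusterconf -fu
# The old repository disk must not belong to any volume group
lspv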

10.4.3 CAA cluster after the node restarts


In some cases, the CAA cluster disappears after a system reboot or halt. If you encounter this
situation, try the following solutions:
- Wait 10 minutes. If you have another node in your cluster, the clconfd daemon checks for nodes that need to join or sync up. It wakes up every 10 minutes.
- If the previous method does not work, run the clusterconf command manually. This solution works only if the system is aware of the repository disk location. You can check it by running the lsattr -El cluster0 command.



Check whether the clvdisk attribute contains the repository disk UUID. If it does not, you see
the clusterconf error message shown in Example 10-10.

Example 10-10 The clusterconf error message


riyad:/ # clusterconf -v
_find_and_load_repos(): No repository candidate found.
leave_sinc: Could not get cluster disk names from cache file
/etc/cluster/clrepos_cache: No such file or directory
leave_sinc: Could not find cluster disk names.

- Manually define the repository disk by using the following command:
  clusterconf -vr caa_private0
- If you know that the repository disk is available, and you know that your node is listed in the configuration on the repository disk, use the -s flag on the clusterconf command to do a search for it. This utility examines all locally visible hard disk drives to find the repository disk.
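
The checks described in the preceding list can be combined as shown in the following sketch; caa_private0 is the repository disk name used elsewhere in this chapter:

# Check whether the cluster0 pseudo-device already knows the repository disk
lsattr -El cluster0
# If the clvdisk attribute is empty, point clusterconf at the repository disk
clusterconf -vr caa_private0
# Alternatively, let clusterconf search all locally visible disks for it
clusterconf -s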

10.4.4 Creation of the CAA cluster


You might encounter an error message about creating the CAA cluster when the clmigcheck
utility is run. You might also see such a message when trying to install PowerHA for the first
time or when creating a CAA cluster configuration. Depending on whether you are doing a
migration or a new configuration, you either see a problem in the clmigcheck.log file or on the
verification of your cluster.

One of the error messages that you see is “ERROR: Problems encountered creating the
cluster in AIX.” This message indicates a problem with creating the CAA cluster. The
clmigcheck program calls the mkcluster command to create the CAA cluster, which is what
you must look for in the logs.

To proceed with the troubleshooting, enable syslog debugging as described in
“The syslog facility” on page 306.

Incorrect entries in the /etc/filesystems file


When the CAA cluster is created, the cluster creates a caavg_private volume group and the
associated file systems for CAA. This information is kept in the /var/adm/ras/syslog.caa log
file. Any problems that you face when running the mkcluster command are also logged in the
/var/hacmp/log/clutils.log file.

If you encounter a problem when creating your cluster, check these log files to ensure that the
volume group and file systems are created without any errors.



Figure 10-9 shows the contents of caavg_private volume group.

# lsvg -l caavg_private
caavg_private:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
caalv_private1 boot 1 1 1 closed/syncd N/A
caalv_private2 boot 1 1 1 closed/syncd N/A
caalv_private3 boot 4 4 1 open/syncd N/A
fslv00 jfs2 4 4 1 open/syncd /clrepos_private1
fslv01 jfs2 4 4 1 closed/syncd /clrepos_private2
powerha_crlv boot 1 1 1 closed/syncd N/A
Figure 10-9 Contents of the caavg_private volume group

Figure 10-10 shows a crfs failure while creating the CAA cluster. This problem was corrected
by removing incorrect entries in the /etc/filesystems file. Similar problems can occur if, for
example, a logical volume with a name that the CAA cluster needs already exists.

Sep 29 15:50:49 riyad user:info cluster[9437258]: stdout: caalv_private3


Sep 29 15:50:49 riyad user:info cluster[9437258]: stderr:
Sep 29 15:50:49 riyad user:info cluster[9437258]: cl_run_log_method:
'/usr/lib/cluster/clreposfs ' returned 1
Sep 29 15:50:49 riyad user:info cluster[9437258]: stdout:
Sep 29 15:50:49 riyad user:info cluster[9437258]: stderr: crfs:
/clrepos_private1 file system already exists '/usr/sbin/crfs -v jfs2 -m
/clrepos_private1 -g caavg_private -a options=dio -a logname=INLINE -a
size=256M' failed with rc=1
Sep 29 15:50:49 riyad user:err|error cluster[9437258]: cluster_repository_init:
create_clv failed
Sep 29 15:50:49 riyad user:info cluster[9437258]: cl_run_log_method:
'/usr/sbin/varyonvg -b -u caavg_private' returned 0
Figure 10-10 The syslog.caa entries after a failure during CAA creation

Tip: When you look at the syslog.caa file, focus on the AIX commands (such as mkvg, mklv,
and crfs) and their returned values. If you find non-zero return values, a problem exists.



10.4.5 Volume group name already in use
A volume group that is already in use can cause the error message discussed in 10.4.4,
“Creation of the CAA cluster” on page 318. When you encounter the error message, enable
syslog debugging. The /tmp/syslog.out log file has the entries shown in Figure 10-11.

Sep 23 11:46:09 chile user:info cluster[21037156]: cl_run_log_method:


'/usr/sbin/mkvg -f -y caavg_private -s 64 caa_private0' returned 1
Sep 23 11:46:09 chile user:info cluster[21037156]: stdout:
Sep 23 11:46:09 chile user:info cluster[21037156]: stderr: 0516-360
/usr/sbin/mkvg: The device name is already used; choose a different name.
Sep 23 11:46:09 chile user:err|error cluster[21037156]:
cluster_repository_init: create_cvg failed
Figure 10-11 Extract from the syslog.out file

You can see that the volume group creation failed because the name is already in use. This
problem can happen for several reasons. For example, it can occur if the disk was previously
used as the CAA repository or if the disk holds the volume group descriptor area (VGDA)
information of another volume group.

Disk previously used by CAA volume group or third party


If the disk was previously used by CAA or AIX, you can recover from this situation by running
the following command:
rmcluster -r hdiskx

For the full sequence of steps, see 10.4.1, “Previously used repository disk for CAA” on
page 316.

If you find that the rmcluster command has not removed your CAA definition from the disk,
use the steps in the following section, “Removal of the volume group when the rmcluster
command does not.”

Removal of the volume group when the rmcluster command does not
In this situation, you must use the Logical Volume Manager (LVM) commands, which you can
do in one of two ways. The easiest method is to import the volume group, vary on the volume
group, and then reduce it so that the VGDA is removed from the disk. If this method does not
work, use the dd command to overwrite special areas of the disk.

Tip: Make sure that the data contained on the disk is not needed because usage of the
following steps destroys the volume group data on the disk.

Removing the VGDA from the disk


This method involves importing the volume group from the disk and then reducing the disk out
of the volume group so that the VGDA information is removed without losing the PVID. If you
are able to import the volume group, activate it by using the varyonvg command:
# varyonvg vgname

If the activation fails, run the exportvg command to remove the volume group definition from
the ODM. Then try to import it with a different name as follows:
# exportvg vgname
# importvg -y new-vgname hdiskx



If you cannot activate the imported volume group, use the reducevg command as shown in
Figure 10-12.

reducevg -df test_vg caa_private0


0516-1246 rmlv: If caalv_private1 is the boot logical volume, please run 'chpv
-c <diskname>'
as root user to clear the boot record and avoid a potential boot
off an old boot image that may reside on the disk from which this
logical volume is moved/removed.
rmlv: Logical volume caalv_private1 is removed.
0516-1246 rmlv: If caalv_private2 is the boot logical volume, please run 'chpv
-c <diskname>'
as root user to clear the boot record and avoid a potential boot
off an old boot image that may reside on the disk from which this
logical volume is moved/removed.
rmlv: Logical volume caalv_private2 is removed.
Figure 10-12 The reducevg command

After you complete the forced reduction, check whether the disk no longer contains a volume
group by using the lqueryvg -Atp hdisk command.

Also verify whether any previous volume group definition is still being displayed on the other
nodes of your cluster by using the lspv command. If the lspv output shows the PVID with one
associated volume group, you can fix it by running the exportvg vgname command.

If you experience any problems with this procedure, try a force overwrite of the disk as described
in “Overwriting the disk.”

Overwriting the disk


This method involves writing data to the top of the disk to overwrite the VGDA information and
effectively cleaning the disk, leaving it ready for use by other volume groups.

Attention: Only attempt this method if the rmcluster and reducevg procedures fail and if
AIX still has access to the disk. You can check this access by running the lquerypv -h
/dev/hdisk command.

Enter the following command:


# dd if=/dev/zero of=/dev/hdiskx bs=4 count=1

This command zeros only the part of the disk that contains the repository offset. Therefore,
you do not lose the PVID information.

In some cases, this procedure is not sufficient to resolve the problem. If you need to
completely overwrite the disk, run the following procedure:

Attention: This procedure overwrites the entire disk structure including the PVID. You
must follow the steps as shown to change the PVID if required during migration.

# dd if=/dev/zero of=/dev/hdiskn bs=512 count=9


# chdev -l hdiskn -a pv=yes
# rmdev -dl hdiskn
# cfgmgr



On any other node in the cluster, you must also update the disk:
# rmdev -dl hdiskn
# cfgmgr

Run the lspv command to check that the PVID is the same on both nodes. To ensure that you
have the real PVID, query the disk as follows:
# lquerypv -h /dev/hdiskn

Look for the PVID, which is at offset 0x80 (the line beginning with 00000080) as shown in Figure 10-13.

chile:/ # lquerypv -h /dev/hdisk3


00000000 C9C2D4C1 00000000 00000000 00000000 |................|
00000010 00000000 00000000 00000000 00000000 |................|
00000020 00000000 00000000 00000000 00000000 |................|
00000030 00000000 00000000 00000000 00000000 |................|
00000040 00000000 00000000 00000000 00000000 |................|
00000050 00000000 00000000 00000000 00000000 |................|
00000060 00000000 00000000 00000000 00000000 |................|
00000070 00000000 00000000 00000000 00000000 |................|
00000080 000FE401 68921CEA 00000000 00000000 |....h...........|
Figure 10-13 PVID from the lquerypv command

The PVID should match the lspv output as shown in Figure 10-14.

chile:/ # lspv
hdisk1 000fe4114cf8d1ce None
hdisk2 000fe40163c54011 None
hdisk3 000fe40168921cea None
hdisk4 000fe4114cf8d3a1 None
hdisk5 000fe4114cf8d441 None
hdisk6 000fe4114cf8d4d5 None
hdisk7 000fe4114cf8d579 None
hdisk8 000fe4114cf8d608 ny_datavg
hdisk0 000fe40140a5516a rootvg active
Figure 10-14 The lspv output showing PVID

10.4.6 Changed PVID of the repository disk


Your repository disk PVID might have changed because of a dd on the whole disk or a change
in the logical unit number (LUN). If this change happened and you must complete the
migration, follow the guidance in this section to change it.



If you are in a migration that has not yet been completed, change the PVID section in the
/var/clmigcheck/clmigcheck.txt file (Figure 10-15). You must change this file on every node
in your cluster.

CLUSTER_TYPE:STANDARD
CLUSTER_REPOSITORY_DISK:000fe40120e16405
CLUSTER_MULTICAST:NULL
Figure 10-15 Changing the PVID in the clmigcheck.txt file

If this is post migration and PowerHA is installed, you must also modify the HACMPsircol ODM
class (Figure 10-16) on all nodes in the cluster.

HACMPsircol:
name = "newyork_sircol"
id = 0
uuid = "0"
repository = "000fe4114cf8d258"
ip_address = ""
nodelist = "serbia,scotland,chile,"
backup_repository1 = ""
backup_repository2 = ""
Figure 10-16 The HACMPsircol ODM class

To modify the HACMPsircol ODM class, enter the following commands:

# odmget HACMPsircol > HACMPsircol.add
# vi HACMPsircol.add

Change the repository = "000fe4114cf8d258" line to your new PVID and save the file. Then
replace the ODM entry:

# odmdelete -o HACMPsircol
# odmadd HACMPsircol.add

10.4.7 The ‘Cluster services are not active’ message


After migration of PowerHA, if you notice that CAA cluster services are not running, you see
the “Cluster services are not active” message when you run the lscluster command.
You also notice that the CAA repository disk is not varied on.

You might be able to recover by recreating the CAA cluster from the last CAA configuration
(HACMPsircol class in ODM) as explained in the following steps:
1. Clear the CAA repository disk as explained in “Previously used repository disk for CAA” on
page 316.
2. Perform a synchronization or verification of the cluster. Upon synchronizing the cluster, the
mkcluster command is run to recreate the CAA cluster. However, if the problem still
persists, contact IBM support.



Chapter 11. Installing IBM Systems Director and the PowerHA SystemMirror plug-in

This chapter explains how to install IBM Systems Director Version 6.2. It also explains how to
install the PowerHA SystemMirror plug-in for the IBM Systems Director, and the necessary
agents on the client machines to be managed by Systems Director. For detailed planning,
prerequisites, and instructions, see Implementing IBM Systems Director 6.1, SG24-7694.

This chapter includes the following topics:
- Installing IBM Systems Director Version 6.2
- Installing the SystemMirror plug-in
- Installing the clients



11.1 Installing IBM Systems Director Version 6.2
Before you configure the cluster using the SystemMirror plug-in, you must install and
configure the IBM Systems Director. You can install the IBM Systems Director Server on AIX,
Linux, or Windows operating system. For quick reference, this section provides the installation
steps on AIX. See the information in the following topics in the IBM Systems Director
Information Center for details about installation on other operating systems:
 The “IBM Systems Director V6.2.x” topic for general information
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ib
m.director.main.helps.doc/fqm0_main.html
 “Installing IBM Systems Director on the management server” topic for installation
information
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ib
m.director.install.helps.doc/fqm0_t_installing.html

The following section, “Hardware requirements”, explains the installation requirements of IBM
Systems Director v6.2 on AIX.

11.1.1 Hardware requirements


See the “Hardware requirements for running IBM Systems Director Server” topic in the IBM
Systems Director Information Center for details about the recommended hardware
requirements for installing IBM Systems Director:
http://publib.boulder.ibm.com/infocenter/director/v6r2x/topic/com.ibm.director.pla
n.helps.doc/fqm0_r_hardware_requirements_for_running_ibm_systems_director_server.h
tml

Table 11-1 lists the hardware requirements for IBM Systems Director Server running on AIX
for a small configuration that has less than 500 managed systems.

Table 11-1 Hardware requirements for IBM Systems Director Server on AIX

Resource                    Requirement
CPU                         Two processors, IBM POWER5™, POWER6, or POWER7™; or, for
                            partitioned systems:
                            - Entitlement = 1
                            - Uncapped Virtual processors = 4
                            - Weight = Default
Memory                      3 GB
Disk storage                4 GB
File system requirement     root (/) = 1.2 GB
(during installation)       /tmp = 2 GB
                            /opt = 4 GB



More information: The disk storage requirement for running IBM Systems Director Server
applies to the /opt file system. Therefore, a total of 4 GB is required in the /opt file system
while you install IBM Systems Director and at run time.

For more details about hardware requirements, see the “Recommended hardware
requirements for IBM Systems Director Server running on AIX” topic in the IBM Systems
Director Information Center at:
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ib
m.director.plan.helps.doc/fqm0_r_hardware_requirements_servers_running_aix.html

11.1.2 Installing IBM Systems Director on AIX


For the prerequisites and complete steps for installing IBM Systems Director, see the
following topics in the IBM Systems Director Information Center:
 “Preparing to install IBM Systems Director Server on AIX”
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/
com.ibm.director.install.helps.doc/fqm0_t_preparing_to_install_ibm_director_on_
aix.html
 “Installing IBM Systems Director Server on AIX,” which provides the complete installation
steps
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ib
m.director.install.helps.doc/fqm0_t_installing_ibm_director_server_on_aix.html

The following steps summarize the process for installing IBM Systems Director on AIX:
1. Increase the file size limit:
ulimit -f 4194302 (or to unlimited)
2. Increase the number of file descriptors:
ulimit -n 4000
3. Verify the file system (/, /tmp and /opt) size as mentioned in Table 11-1 on page 326:
df -g / /tmp /opt
4. Download IBM Systems Director from the IBM Systems Director Downloads page at:
http://www.ibm.com/systems/management/director/downloads/
5. Extract the content:
gzip -cd <package_name> | tar -xvf -
where <package_name> is the file name of the download package.
6. Install the content by using the script in the extracted package:
./dirinstall.server
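
As a reference, the preparation and installation steps can be run as the following sequence; the package file name is only an example and must be replaced with the name of the package that you downloaded:

ulimit -f unlimited                                  # step 1: file size limit
ulimit -n 4000                                       # step 2: file descriptors
df -g / /tmp /opt                                    # step 3: verify file system sizes
gzip -cd SysDir6_2_Server_AIX.tar.gz | tar -xvf -    # step 5: extract (example file name)
./dirinstall.server                                  # step 6: run the installer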

11.1.3 Configuring and activating IBM Systems Director
To configure and activate IBM Systems Director, follow these steps:
1. Configure IBM Systems Director by using the following script:
/opt/ibm/director/bin/configAgtMgr.sh

Agent password: The script prompts for an agent password, for which you can
consider using the host system root password or any other common password of your
choice. This password is used by IBM Systems Director for its internal communication
and does not have any external impact.

2. Start IBM Systems Director:


/opt/ibm/director/bin/smstart
3. Monitor the activation process as shown in Figure 11-1. This process might take 2-3
minutes.

/opt/ibm/director/bin/smstatus -r
Inactive
Starting
Active
Figure 11-1 Activation status for IBM Systems Director

Some subsystems are added as part of the installation process as follows:


Subsystem Group PID Status
platform_agent 2752614 active
cimsys 3080288 active
Some processes start automatically:
root 6553804 7995522 0 13:24:40 pts/0 0:00
/opt/ibm/director/jre/bin/java -Xverify:none -cp /opt/ibm/director/lwi/r
root 7340264 1 0 13:19:26 pts/2 3:14
/opt/ibm/director/jre/bin/java -Xms512m -Xmx2048m -Xdump:system:events=g
root 7471292 2949286 0 12:00:31 - 0:00
/opt/freeware/cimom/pegasus/bin/cimssys platform_agent
root 7536744 1 0 12:00:31 - 0:00
/opt/ibm/icc/cimom/bin/dirsnmpd
root 8061058 3604568 0 13:16:32 - 0:14
/var/opt/tivoli/ep/_jvm/jre/bin/java -Xmx384m -Xminf0.01 -Xmaxf0.4 -Dsun
4. Log in to IBM Systems Director by using the following address:
https://<hostname.domain.com or IP>:8422/ibm/console/logon.jsp
In this example, we use the following address:
https://indus74.in.ibm.com:8422/ibm/console/logon.jsp
5. On the welcome page (Figure 12-4 on page 335) that opens, log in using root credentials.

After completing the installation of IBM Systems Director, install the SystemMirror plug-in as
explained in the following section.



11.2 Installing the SystemMirror plug-in
The IBM Systems Director provides two sets of plug-ins:
- The SystemMirror server plug-in, to be installed in the IBM Systems Director Server.
- The SystemMirror agent plug-in, to be installed in the cluster nodes or the endpoints as discovered by IBM Systems Director.

11.2.1 Installing the SystemMirror server plug-in


You must install the SystemMirror server plug-in in the IBM Systems Director Server.
Table 11-2 on page 329 outlines the installation steps for the SystemMirror server plug-in
depending on your operating system. You can find this table and more information about the
installation in the SystemMirror installation steps chapter in “Configuring AIX Clusters for High
Availability Using PowerHA SystemMirror for Systems Director,” which you can download
from:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101774

Table 11-2 Installation steps for the SystemMirror server plug-in

Operating system: AIX and Linux
  Graphical installation:
    # chmod 700 IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin
    # IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin
  Textual installation:
    # chmod 700 IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin
    # IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin -i console
  Silent installation (first, edit the installer.properties file):
    # chmod 700 IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin
    # IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin -i silent

Operating system: Windows
  Graphical installation:
    IBMSystemsDirector-PowerHA_SystemMirror-Windows.exe
  Textual installation:
    IBMSystemsDirector-PowerHA_SystemMirror-Windows.exe -i console
  Silent installation (first, edit the installer.properties file):
    IBMSystemsDirector-PowerHA_SystemMirror-Windows.exe -i silent

export DISPLAY: The command export DISPLAY=<IP address of X Window System server>:1
is required to export the display to the system running the X Window System server when
you use the graphical installation.

Verifying the installation of the SystemMirror plug-in


The interface plug-in of the subagent is loaded when the IBM Systems Director Server starts.
To check the installation, run the following command, depending on your environment:
- AIX and Linux:
/opt/ibm/director/lwi/bin/lwiplugin.sh -status | grep mirror
- Windows:
C:/Program Files/IBM/Director/lwi/bin/lwiplugin.bat

Figure 11-2 shows the output of the plug-in status.

94:RESOLVED:com.ibm.director.power.ha.systemmirror.branding:7.1.0.1:com.ibm.director.power.ha.systemmirr
or.branding
95:ACTIVE:com.ibm.director.power.ha.systemmirror.common:7.1.0.1:com.ibm.director.power.ha.systemmirror.c
ommon
96:ACTIVE:com.ibm.director.power.ha.systemmirror.console:7.1.0.1:com.ibm.director.power.ha.systemmirror.
console
97:RESOLVED:com.ibm.director.power.ha.systemmirror.helps.doc:7.1.0.1:com.ibm.director.power.ha.systemmir
ror.helps.doc
98:INSTALLED:com.ibm.director.power.ha.systemmirror.server.fragment:7.1.0.0:com.ibm.director.power.ha.sy
stemmirror.server.fragment
99:ACTIVE:com.ibm.director.power.ha.systemmirror.server:7.1.0.1:com.ibm.director.power.ha.systemmirror.s
erver

Figure 11-2 Output of the plug-in status command

If the subagent interface plug-in shows the RESOLVED status instead of the ACTIVE status,
attempt to start the subagent. Enter the following command by using the lwiplugin.sh script
on AIX and Linux or the lwiplugin.bat script on Windows, specifying the plug-in number
(94 in the example in Figure 11-2):
- AIX and Linux:
/opt/ibm/director/agent/bin/lwiplugin.sh -start 94
- Windows:
C:/Program Files/IBM/Director/lwi/bin/lwiplugin.bat -start 94

If Systems Director was active during installation of the plug-in, you must stop it and restart it
as follows:
1. Stop the IBM Systems Director Server:
# /opt/ibm/director/bin/smstop
2. Start the IBM Systems Director Server:
# /opt/ibm/director/bin/smstart
3. Monitor the startup process:
# /opt/ibm/director/bin/smstatus -r
Inactive
Starting
Active *** (the "Active" status can take a long time)

11.2.2 Installing the SystemMirror agent plug-in in the cluster nodes


Install the cluster.es.director.agent file set by using SMIT. This file set is provided with the
base PowerHA SystemMirror installable images.

More information: See the SystemMirror agent installation section in Configuring AIX
Clusters for High Availability Using PowerHA SystemMirror for Systems Director paper at:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101774

See also PowerHA SystemMirror for IBM Systems Director, SC23-6763.



11.3 Installing the clients
You must perform the steps in the following sections on each node that is going to be
managed by the PowerHA SystemMirror plug-in for IBM Systems Director. This topic includes
the following sections:
- Installing the common agent
- Installing the PowerHA SystemMirror agent

11.3.1 Installing the common agent


Perform these steps on each node that is going to be managed by the IBM Systems Director
Server:
1. Extract the SysDir6_2_Common_Agent_AIX.jar file set:
# /usr/java5/bin/jar -xvf SysDir6_2_Common_Agent_AIX.jar
2. Give execution permission to the repository/dir6.2_common_agent_aix.sh file:
# chmod +x repository/dir6.2_common_agent_aix.sh
3. Execute the repository/dir6.2_common_agent_aix.sh file:
# ./repository/dir6.2_common_agent_aix.sh
Some subsystems are added as part of the installation process:
platform_agent 3211374 active
cimsys 2621604 active
Some processes start automatically:
root 421934 1 0 15:55:30 - 0:00
/opt/ibm/icc/cimom/bin/dirsnmpd
root 442376 1 0 15:55:40 - 0:00 /usr/bin/cimlistener
root 458910 1 0 15:55:31 - 0:00
/opt/freeware/cimom/pegasus/bin/CIM_diagd
root 516216 204950 0 15:55:29 - 0:00
/opt/freeware/cimom/pegasus/bin/cimssys platform_agent
root 524366 1 0 15:55:29 - 0:00 ./slp_srvreg -D
root 581780 1 0 15:55:37 - 0:04 [cimserve]
root 626740 204950 0 15:55:29 - 0:00
/opt/freeware/cimom/pegasus/bin/cimssys cimsys
root 630862 1 0 15:55:29 - 0:00
/opt/ibm/director/cimom/bin/tier1slp

11.3.2 Installing the PowerHA SystemMirror agent
To install the PowerHA SystemMirror agent on the nodes, follow these steps:
1. Install the cluster.es.director.agent.rte file set:
# smitty install_latest
2. Stop the common agent:
# stopsrc -s platform_agent
# stopsrc -s cimsys
3. Start the common agent:
# startsrc -s platform_agent

Tip: The cimsys subsystem starts along with the platform_agent subsystem.



Chapter 12. Creating and managing a cluster using IBM Systems Director

The SystemMirror plug-in provided by IBM Systems Director is used to configure and manage
the PowerHA cluster. This plug-in provides a state-of-the-art interface and a command-line
interface (CLI) for cluster configuration. It includes wizards to help you create and manage the
cluster and the resource groups. The plug-in also helps in seamless integration of Smart
Assists and third-party application support.

This chapter explains how to create and manage the PowerHA SystemMirror cluster with IBM
Systems Director.

This chapter includes the following topics:
- Creating a cluster with the SystemMirror plug-in wizard
- Creating a cluster with the SystemMirror plug-in CLI
- Performing cluster management
- Performing cluster management with the SystemMirror plug-in CLI
- Creating a resource group with the SystemMirror plug-in GUI wizard
- Resource group management using the SystemMirror plug-in wizard
- Managing a resource group with the SystemMirror plug-in CLI
- Verifying and synchronizing a configuration with the GUI
- Verifying and synchronizing with the CLI
- Performing cluster monitoring with the SystemMirror plug-in



12.1 Creating a cluster
You can create a cluster by using the wizard for the SystemMirror plug-in or by using the CLI
commands for the SystemMirror plug-in. This topic explains how to use both methods.

12.1.1 Creating a cluster with the SystemMirror plug-in wizard


To create the cluster by using the GUI wizard of the SystemMirror plug-in, follow these steps.
1. Go to your IBM Systems Director server.
2. On the login page (Figure 12-1), log in to IBM Systems Director with your user ID and
password.

Figure 12-1 Systems Director login console

3. In the IBM Systems Director console, in the left navigation pane, expand Availability and
select PowerHA SystemMirror (Figure 12-2).

Figure 12-2 Selecting the PowerHA SystemMirror link in IBM Systems Director



4. In the right pane, under Cluster Management, click Create Cluster (Figure 12-3).

Figure 12-3 The Create Cluster link under Cluster Management

5. Starting with the Create Cluster Wizard, follow the wizard panes to create the cluster.
a. In the Welcome pane (Figure 12-4), click Next.

Figure 12-4 Create Cluster Wizard

b. In the Name the cluster pane (Figure 12-5), in the Cluster name field, provide a name
for the cluster. Click Next.

Figure 12-5 Entering the cluster name

c. In the Choose nodes pane (Figure 12-6), select the host names of the nodes.

Figure 12-6 Selecting the cluster nodes



Common storage: The cluster nodes must have the common storage for the
repository disk. To verify the common storage, in the Choose nodes window, click
the Common storage button. The Common storage window (Figure 12-7) opens
showing the common disks.

Figure 12-7 Verifying common storage availability for the repository disk

d. In the Configure nodes pane (Figure 12-8), set the controlling node. The controlling
node in the cluster is considered to be the primary or home node. Click Next.

Figure 12-8 Setting the controlling node

e. In the Choose repositories pane (Figure 12-9), choose the storage disk that is shared
among all nodes in the cluster to use as the common storage repository. Click Next.

Figure 12-9 Selecting the repository disk

f. In the Configure security pane (Figure 12-10), specify the security details to secure
communication within the cluster.

Figure 12-10 Configuring the cluster security configuration



g. In the Summary pane (Figure 12-11), verify the configuration details.

Figure 12-11 Summary pane

6. Verify the cluster creation in the AIX cluster nodes by using either of the following
commands:
– The CAA command:
/usr/sbin/lscluster -m
– The PowerHA command:
/usr/es/sbin/cluster/utilities/cltopinfo

12.1.2 Creating a cluster with the SystemMirror plug-in CLI


IBM Systems Director provides a CLI to monitor and manage the system. This section
explains how to create a cluster by using the SystemMirror plug-in CLI.

Overview of the CLI
The CLI is executed by using a general-purpose smcli command. To list the available CLI
commands for managing the cluster, run the smcli lsbundle command as shown in
Figure 12-12.

# smcli lsbundle | grep sysmirror


sysmirror/help
sysmirror/lsac
sysmirror/lsam
sysmirror/lsappctl
sysmirror/lsappmon
sysmirror/lscl
sysmirror/lscluster
sysmirror/lsdependency
sysmirror/lsdp
sysmirror/lsfc
sysmirror/lsfilecollection
sysmirror/lsif
sysmirror/lsinterface
sysmirror/lslg
sysmirror/lslog
sysmirror/lsmd
sysmirror/lsmethod
.....
.....
Figure 12-12 CLI commands specific to SystemMirror

You can retrieve help information for the commands (Figure 12-12) as shown in Figure 12-13.

# smcli lscluster --help

smcli sysmirror/lscluster {-h|-?|--help} \


[-v|--verbose]
smcli sysmirror/lscluster [-v|--verbose] \
[<CLUSTER>[,<CLUSTER#2>,...]]

Command Alias: lscl


Figure 12-13 CLI help option

Creating a cluster with the CLI


Before you create a cluster, ensure that you have all the required details to create the cluster:
- Cluster nodes
- Persistent IP (if any)
- Repository disk
- Controlling node
- Security options (if any)

To verify the availability of the mkcluster command, you can use the smcli lsbundle
command in IBM Systems Director as shown in Figure 12-12.



To create a cluster, issue the smcli mkcluster command from the IBM Systems Director
Server as shown in Example 12-1.

Example 12-1 Creating a cluster with the smcli mkcluster CLI command
smcli mkcluster -i 224.0.0.0 \
-r hdisk3 \
–n nodeA.xy.ibm.com,nodeB.xy.ibm.com \
DB2_Cluster

You can use the -h option to list the available options for the command (Figure 12-14).

# smcli mkcluster -h
smcli sysmirror/mkcluster {-h|-?|--help} [-v|--verbose]
smcli sysmirror/mkcluster [{-i|--cluster_ip} <multicast_address>] \
[{-S|--fc_sync_interval} <##>] \
[{-s|--rg_settling_time} <##>] \
[{-e|--max_event_time} <##>] \
[{-R|--max_rg_processing_time} <##>] \
[{-c|--controlling_node} <node>] \
[{-d|--shared_disks} <DISK>[,<DISK#2>,...] ] \
{-r|--repository} <disk> \
{-n|--nodes} <NODE>[, <NODE#2>,...] \
[<cluster_name>]
Figure 12-14 The mkcluster -h command to list the available commands

To verify that the cluster has been created, you can use the smcli lscluster command.
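
For example, based on the lscluster syntax shown in Figure 12-13, the verification might look like the following sketch; DB2_Cluster is the example cluster name created above:

# List all clusters that are known to the SystemMirror plug-in
smcli lscluster
# Show verbose details for the newly created cluster
smcli lscluster -v DB2_Cluster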

Command help: For assistance with using the commands, you can use either of the
following help options:
smcli <command name> --help --verbose
smcli <command name> -h -v

12.2 Performing cluster management


You can perform cluster management by using the GUI wizard for the SystemMirror plug-in or
by using the CLI commands for the SystemMirror plug-in. This topic explains how to use both
methods.

12.2.1 Performing cluster management with the SystemMirror plug-in GUI wizard

IBM Systems Director provides GUI wizards to manage the network, storage, and snapshots
of a cluster. IBM Systems Director also provides functionalities to add nodes, view cluster
services status changes, review reports, and verify and synchronize operations. The following
sections guide you through these functionalities.

Accessing the Cluster Management Wizard
To access the Cluster Management Wizard, follow these steps:
1. In the IBM Systems Director console, expand Availability and select PowerHA
SystemMirror (Figure 12-3 on page 335).
2. In the right pane, under Cluster Management, click the Manage Clusters link
(Figure 12-15).

Figure 12-15 Manage cluster



Cluster management functionality
This section describes the cluster management functionality:
 Cluster Management window (Figure 12-16)
After clicking the Manage Clusters link in the IBM Systems Director console, you see the
Cluster Management pane. This pane contains a series of tabs to help you manage your
cluster.

Figure 12-16 Cluster Management pane

 Edit Advanced Properties button
Under the General tab, you can click the Edit Advanced Properties button to modify the
cluster properties. For example, you can change the controlling node as shown in
Figure 12-17.

Figure 12-17 Editing the advanced properties, such as the controlling node

 Add Network tab


Under the Networks tab, you can click the Add Network button to add a network as
shown in Figure 12-18.

Figure 12-18 Add Network function



 Storage management
On the Storage tab, you can perform disk management tasks such as converting the
hdisk into VPATH. From the View drop-down list, select Disks to modify the disk properties
as shown in Figure 12-19.

Figure 12-19 Cluster storage management

 Capture Snapshot
You can capture and manage snapshots through the Snapshots tab. To capture a new
snapshot, click the Create button on the Snapshots tab as shown in Figure 12-20.

Figure 12-20 Capture Snapshot function

 File collection and logs management
You can manage file collection and logs on the Additional Properties tab. From the View
drop-down list, select either File Collections or Log files as shown in Figure 12-21.

Figure 12-21 Additional Properties tab: File Collections and Log files options

 Creating a file collection


On the Additional Properties tab, when you select File Collections from the View
drop-down list and click the Create button, you can create a file collection as shown in
Figure 12-22.

Figure 12-22 Creating a file collection



 Collect log files button
On the Additional Properties tab, when you select Log files from the View drop-down list
and click the Collect log files button, you can collect log files as shown in Figure 12-23.

Figure 12-23 Collect log files

The Systems Director plug-in also provides a CLI to manage the cluster. The following section
explains the available CLI commands and how you can find help for each of these commands.

12.2.2 Performing cluster management with the SystemMirror plug-in CLI


The SystemMirror plug-in provides a CLI for most of the cluster management functions. For a
list of the available functions, use the following command:
smcli lsbundle | grep sysmirror

A few of the CLI commands are provided as follows for a quick reference:
 Snapshot creation
You can use the smcli mksnapshot command to create a snapshot. Figure 12-24 on
page 348 shows the command for obtaining detailed help about this command.

mkss: mkss is the alias for the mksnapshot command.

# smcli mkss -h -v

smcli sysmirror/mksnapshot [-h|-?|--help] [-v|--verbose]


smcli sysmirror/mksnapshot {-c|--cluster} <CLUSTER> \
{-d|--description} "<DESCRIPTION>" \
[{-M|--methods} <METHOD>[,<METHOD#2>,...] ] \
[-s|--save_logs] \
<snapshot_name>
Figure 12-24 Help details for the mksnapshot command

Example 12-2 shows usage of the smcli mkss command.

Example 12-2 Usage of the mksnapshot command


smcli mkss -c selma04_cluster -d "Selma04 cluster snapshot taken on Sept2010"
selma04_sep10_ss

Verify the snapshot by using the smcli lsss command as shown in Example 12-3.

Example 12-3 Verifying the snapshot


# smcli lsss -c selma04_cluster selma04_sep10_ss
NAME="selma04_sep10_ss"
DESCRIPTION="Selma04 cluster snapshot taken on Sept2010"
METHODS=""
SAVE_LOGS="false"
CAPTURE_DATE="Sep 29 09:47"
NODE="selma03"

 File collection
You can use the smcli mkfilecollection command to create a file collection as shown in
Example 12-4. A file collection helps to keep the files and directories synchronized on all
nodes in the cluster.

Example 12-4 File collection


# smcli mkfilecollection -c selma04_cluster -C -d "File Collection for the
selma04 cluster" -F /home selma04_file_collection

# smcli lsfilecollection -c selma04_cluster selma04_file_collection


NAME="selma04_file_collection"
DESCRIPTION="File Collection for the selma04 cluster"
FILE="/home"
SIZE="256"

 Log files
You can use the smcli lslog command (Example 12-5) to list the available log files in the
cluster. Then you can use the smcli vlog command to view the log files.

Example 12-5 Log file management


# smcli lslog -c selma04_cluster
Node: selma03
=============
autoverify.log



cl2siteconfig_assist.log
cl_testtool.log
clavan.log
clcomd.log
clcomddiag.log
....
....(output truncated)

# smcli vlog -c selma04_cluster -n selma03 -T 4 clverify.log


Collector succeeded on node selma03 (31610 bytes)
Collector succeeded on node selma03 (4250 bytes)
Collector succeeded on node selma03 (26 bytes)
selma03 0

Modification functionality: At the time of writing this IBM Redbooks publication, an edit
or modification CLI command, such as to modify the controlling node, is not available for its
initial release. Therefore, use the GUI wizards for the modification functionality.

12.3 Creating a resource group with the SystemMirror plug-in


GUI wizard
You can configure the resource group by using the Resource Group Wizard as follows:
1. Log in to IBM Systems Director.
2. In the left navigation area, expand Availability and select PowerHA SystemMirror
(Figure 12-25).
3. In the right pane, under Resource Group Management, click Add a resource group link.

Figure 12-25 Resource group management

4. On the Clusters tab, click the Actions list and select Add Resource Group
(Figure 12-26). Then select the cluster node, and click the Action button.

Alternative: You can select the resource group configuration wizard by selecting the
cluster nodes, as shown in Figure 12-26.

Figure 12-26 Adding a resource group

5. In the Choose a cluster pane (Figure 12-27), choose the cluster where the resource group
will be created. Notice that this step is highlighted under Welcome in the left pane.

Figure 12-27 Choose the cluster for the resource group configuration

You can now choose to create either a custom resource group or a predefined resource group
as explained in 12.3.1, “Creating a custom resource group” on page 351, and 12.3.2,
“Creating a predefined resource group” on page 353.



12.3.1 Creating a custom resource group
To create a custom resource group, follow these steps:
1. In the Add a resource group pane (Figure 12-28), select the Create a custom resource
group option, enter a resource group name, and click Next.

Figure 12-28 Adding a resource group

2. In the Choose nodes pane (Figure 12-29), select the nodes for which you want to
configure the resource group.

Figure 12-29 Selecting the nodes for configuring a resource group

3. In the Choose policies and attributes pane (Figure 12-30), select the policies to add to the
resource group.

Figure 12-30 Selecting the policies and attributes

4. In the Choose resources pane (Figure 12-31), select the shared resources to define for
the resource group.

Figure 12-31 Selecting the shared resources



5. In the Summary pane (Figure 12-32), review the settings and click the Finish button to
create the resource group.

Figure 12-32 Summary pane of the Resource Creation wizard

12.3.2 Creating a predefined resource group


For a set of applications, such as SAP, IBM WebSphere®, DB2, HTTP Server, and Tivoli
Directory Server, the SystemMirror plug-in facilitates the process of creating predefined
resource groups.

To configure the predefined resource groups, follow these steps:


1. In the Add a resource group pane (Figure 12-33 on page 354), select the Create
predefined resource groups for one of the following discovered applications radio
button. Then select the application for which the resource group is to be configured.

Application list: Only the applications installed in the cluster nodes are displayed
under the predefined resource group list.

Figure 12-33 Predefined resource group configuration

2. In the Choose components pane, for the predefined resource group, select the
components of the application to create the resource group. In the example shown in
Figure 12-34, the Tivoli Directory Server component is selected. Each component already
has the predefined properties such as the primary node and takeover node.
Modify the properties per your configuration and requirements. Then create the resource
group.

Figure 12-34 Application components



12.3.3 Verifying the creation of a resource group
To verify the creation of a resource group, follow these steps:
1. In the right pane, under Cluster Management, click the Manage Clusters link
(Figure 12-15 on page 342).
2. Click the Resource Groups tab (Figure 12-35).

Figure 12-35 Resource Groups tab

3. Enter the following base SystemMirror command to verify that the resource group has
been created:
/usr/es/sbin/cluster/utilities/clshowres

12.4 Managing a resource group


You can manage a resource group by using the SystemMirror plug-in wizard or the
SystemMirror plug-in CLI commands. This topic explains how to use both methods.

12.4.1 Resource group management using the SystemMirror plug-in wizard


The SystemMirror plug-in wizard has simplified resource group management with the addition
of the following functionalities:
 Checking the status of a resource group
 Moving a resource group across nodes
 Creating dependencies

Accessing the resource group management wizard


To access the Resource Group Management wizard, follow these steps:
1. Log in to IBM Systems Director.
2. In the left pane, expand Availability and select PowerHA SystemMirror (Figure 12-36 on
page 356).

3. In the right pane, under Resource Group Management, select Manage Resource Groups
(Figure 12-36).

Figure 12-36 Resource group management link

The Resource Group Management wizard opens as in Figure 12-37. Alternatively, you can
access the Resource Group Management wizard by selecting Manage Cluster under Cluster
Management (Figure 12-36).

To access the Cluster and Resource Group Management wizard, select the Resource
Groups tab as shown in Figure 12-37.

Figure 12-37 Resource Group Management tab



Resource group management functionality
The Resource Group Management wizard includes the following functions:
 Create Dependency function
a. Select the Clusters button to see the resource groups defined under the cluster.
b. Click the Action list and select Create Dependency (as shown in Figure 12-38).
Alternatively, right-click a cluster name and select Create Dependency.

Figure 12-38 Selecting the Create Dependency function

c. In the Parent-child window (Figure 12-39), select the dependency type to configure the
dependencies.

Figure 12-39 Parent-child window

 Resource group removal
Right-click the selected resource group, and click Remove to remove the resource group
as shown in Figure 12-40.

Figure 12-40 Cluster and Resource Group Management pane

 Application Availability and Configuration reports


The Application Availability and Configuration reports show the configuration details of the
resource group. The output of these reports is similar to the output produced by the
clshowres command in the base PowerHA installation. You can also see the status of the
application. To access these reports, right-click a resource group name, select Reports
and then select Application Availability or Configuration as shown in Figure 12-41.

Figure 12-41 Application Monitors



 Resource group status change
To view move, online, and offline status changes, right-click a resource group name and
select Advanced. Then select the option you need as shown in Figure 12-42.

Figure 12-42 Viewing a status change

12.4.2 Managing a resource group with the SystemMirror plug-in CLI


Similar to the CLI commands for cluster creation and management, a set of CLI commands
are provided for resource group management. To list the available CLI commands for
managing the cluster, run the smcli lsbundle command (Figure 12-12 on page 340).

The following commands are specific to resource groups:


 To remove the resource group in the controlling node:
sysmirror/rmresgrp
 To bring the resource group online:
sysmirror/startresgrp
 To take the resource group offline:
sysmirror/stopresgrp
 To move the resource group to another node:
sysmirror/moveresgrp
 To list all the configured resource groups:
sysmirror/lsresgrp
If the resource group name is used along with this command, it provides the details of the
resource group.

Examples of CLI command usage
This section shows examples using the CLI commands for resource group management.

To list the resource groups, use the following command as shown in Example 12-6:
smcli lsresgrp -c <cluster name>

Example 12-6 The smcli lsresgrp command


# smcli lsresgrp -c selma04_cluster
myRG
RG01_selma03
RG02_selma03
RG03_selma04
RG04_selma04_1
RG05_selma03_04
RG06_selma03_04
RG_dhe
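
As noted in the command list earlier in this section, appending a resource group name to the lsresgrp command returns the details of that resource group, similar to the lsss example earlier. The resource group name in the following sketch is taken from the previous listing and is shown only as an illustration. For startresgrp, stopresgrp, and moveresgrp, confirm the exact syntax first with the help option.

# Show the details of a single resource group (name from the listing above)
smcli lsresgrp -c selma04_cluster RG01_selma03

# For state changes, confirm the syntax before use, for example:
smcli startresgrp -h -v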

To remove the resource group, use the following command:
smcli rmresgrp -c <cluster name> -C <RG_name>

Example 12-7 The smcli rmresgrp command run without the -C option, which prompts for confirmation of the removal operation
# smcli rmresgrp -c selma04_cluster Test_AltRG

Removing this resource group will cause all user-defined PowerHA information
to be DELETED.

Removing objects is something which is not easily reversed, and therefore


requires confirmation. If you are sure that you want to proceed with this
removal operation, re-run the command using the "--confirm" or "-C" option.

Consider creating a snapshot of the current cluster configuration first,


though, since restoring a snapshot will be the only way to reverse any
deletions.

12.5 Verifying and synchronizing a configuration


You can verify and synchronize a cluster by using the wizard for the SystemMirror plug-in or
by using the CLI commands for the SystemMirror plug-in. This topic explains how to use both
methods.

12.5.1 Verifying and synchronizing a configuration with the GUI


To verify and synchronize the configuration by using the Synchronization and Verification
function of the SystemMirror plug-in, follow these steps:
1. Log in to IBM Systems Director.
2. Expand Availability and select PowerHA SystemMirror as shown in Figure 12-4 on
page 335.
3. Under Cluster Management, select the Manage Clusters link.



4. In the Cluster and Resource Group Management wizard, select the cluster for which you
want to perform the synchronize and verification function. Then select the Action button or
right-click the cluster to access the Verify and Synchronize option as shown in
Figure 12-43.

Figure 12-43 Cluster management option list

5. In the Verify and Synchronize pane (Figure 12-44), select whether you want to
synchronize the entire configuration, only the unsynchronized changes, or verify. Then
click OK.

Figure 12-44 Verify and Synchronize window

6. Optional: Undo the changes to the configuration after synchronization.
a. To access this option, in the Cluster and Resource Group Management wizard, on the
Clusters tab, select the cluster for which you want to perform the synchronize and
verification function (Figure 12-43 on page 361).
b. As shown in Figure 12-45, select Recovery  Undo local changes of
configuration.

Figure 12-45 Recovering the configuration option

c. When you see the Undo Local Changes of the Configuration message (Figure 12-46),
click OK.

Figure 12-46 Undo changes message window

Snapshot for the undo changes option: The undo changes option creates a
snapshot before it deletes the configuration since the last synchronization.



12.5.2 Verifying and synchronizing with the CLI
This section shows examples of performing cluster verification and synchronization by using
the CLI functionality:
 Synchronization
You can use the synccluster command to verify and synchronize the cluster. This
command copies the cluster configuration from the controlling node of the specified cluster
to each of the other nodes in the cluster.
The help option is available by using the smcli synccluster -h -v command as shown in
Example 12-8. Here you see options for controlling verification and synchronization
(see Example 12-9).

Example 12-8 The help option of the smcli synccluster command


# smcli sysmirror/synccluster -h -v

smcli sysmirror/synccluster {-h|-?|--help} [-v|--verbose]


smcli sysmirror/synccluster [-n|--no_verification}] \
<CLUSTER>
smcli sysmirror/synccluster [-x|--fix_errors}] \
[-C|--changes_only}] \
[-t|--custom_tests_only}] \
[{-M|--methods} <METHOD>[,<METHOD#2>,...] ] \
[{-e|--maximum_errors} <##>] \
[-F|--force] \
[{-l|--logfile} <full_path_to_file>] \
<CLUSTER>

Command Alias: sycl


.....
.....
<output truncated>

Example 12-9 shows how to synchronize cluster changes and to log the output in its own
specific log file.

Example 12-9 smcli synccluster changes only with the log file option
# smcli synccluster -C -l /tmp/sync.log selma04_cluster

 Undo changes
To restore the cluster configuration to its state as of the last synchronization,
use the smcli undochanges command. This operation restores the cluster configuration
from the active configuration database. Typically, this command has the effect of
discarding any unsynchronized changes.
The help option is available by using the smcli undochanges -h -v command as shown in
Example 12-10.

Example 12-10 The help option for the smcli undochanges command
# smcli undochanges -h -v

smcli sysmirror/undochanges {-h|-?|--help} [-v|--verbose]


smcli sysmirror/undochanges <CLUSTER>

Command Alias: undo

-h|-?|--help
Requests help for this command.
-v|--verbose
Requests maximum details in the displayed information.
<CLUSTER> The label of a cluster to perform this operation on.
...
<output truncated >
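
Based on the syntax shown in Example 12-10, a typical invocation needs only the cluster label. The cluster name in the following sketch is the one used throughout this chapter and is shown only as an illustration.

# Discard any unsynchronized changes in the selma04_cluster definition
smcli undochanges selma04_cluster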

12.6 Performing cluster monitoring with the SystemMirror plug-in
This topic explains how to monitor the status of the cluster and the resource group before and
while the cluster services are active. It also covers problem determination steps and how to
collect log files to analyze cluster issues.

12.6.1 Monitoring cluster activities before starting a cluster


This section explains the features you can use to monitor cluster activities before starting
the cluster:
 Topology view
After the cluster and its resource groups are configured, select the topology view to
understand the overall status of the cluster and its configuration:
a. Log in to IBM Systems Director.
b. Expand Availability and select PowerHA SystemMirror as shown in Figure 12-4 on
page 335.
c. In the right pane, select the cluster to be monitored and click Actions. Select Map
View (Figure 12-47) to access the Map view of the cluster configuration.

Figure 12-47 Map view of cluster configuration



Map view: The map view is available for resource configuration. As shown in
Figure 12-47 on page 364, select the Resource Groups tab. Click Action, and click
Map View to see the map view of the resource group configuration as shown in
Figure 12-48.

Figure 12-48 Map view of resource group configuration

 Cluster subsystem services status
You can view the status of PowerHA services, such as the clcomd subsystem, by using the
Status feature. To access this feature, select the cluster for which the service status is to
be viewed. Click the Action button and select Reports  Status.
You now see the cluster service status details, similar to the example in Figure 12-49.

Figure 12-49 Cluster Services status

 Cluster Configuration Report


Before starting the cluster services, access the cluster configuration report. Select the
cluster for which the configuration report is to be viewed. Click the Action button and
select Reports, which shows the Cluster Configuration Report page (Figure 12-50).

Figure 12-50 Cluster Configuration Report



You can also view the Cluster Topology Configuration Report by using the following
command:
/usr/es/sbin/cluster/utilities/cltopinfo
Then select the cluster, click the Action button, and select Reports  Configuration.
You see the results in a format similar to the example in Figure 12-51.

Figure 12-51 Cluster Topology Configuration Report

Similarly you can view the configuration report for the resource group as shown in
Figure 12-52. On the Resource Groups tab, select the resource group for which you want
to view the configuration. Then click the Action button and select Reports.

Figure 12-52 Resource Group Configuration Report

 Application monitoring
To locate the details of the application monitors that are configured and assigned to a
resource group, select the cluster. Click the Action button and select Reports 
Applications. Figure 12-53 shows the status of the application monitoring.

Figure 12-53 Application monitoring status

Similarly you can view the configuration report for networks and interfaces by selecting the
cluster, clicking the Action button, and selecting Reports  Networks and Interfaces.

12.6.2 Monitoring an active cluster


When the cluster service is active, to see the status of the resource group, select the cluster
for which the status is to be viewed. Click the Action button and select Report  Event
Summary. You can now access the online status of the resource group and events summary
as shown in Figure 12-54.

Figure 12-54 Resource group online status
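
If you prefer the command line while cluster services are active, the clRGinfo utility of the base PowerHA product reports a similar online or offline state for each resource group on each node. This utility is part of the base installation rather than the SystemMirror plug-in, so its output format differs from the Systems Director view.

# Base PowerHA command that shows the current state of each resource group
/usr/es/sbin/cluster/utilities/clRGinfo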



12.6.3 Recovering from cluster configuration issues
To recover from cluster configuration issues, such as recovering from an event failure and
undoing local changes, consider the following tips:
 Accessing the recovery options
Select the cluster and click the Actions button. Then select Recovery and choose the
appropriate action as shown in Figure 12-55.

Figure 12-55 Recovery options

 Releasing cluster modification locks


After you issue the release of the cluster modification locks, you see a message similar to
the one shown in Figure 12-56. Before you perform the operation, save a snapshot of the
cluster as indicated in the message.

Figure 12-56 Release cluster modification locks

 Recovering from an event failure
After you issue a cluster recover from event failure, you see a message similar to the one
shown in Figure 12-57. Verify that you have addressed all problems that led to the error
before continuing with the operation.

Figure 12-57 Recovery from an event failure

 Collecting problem determination data


To collect problem determination data, select the Turn on debugging option and Collect
the RSCT log files (Figure 12-58).

Figure 12-58 Collect Problem Determination Data window

 Undoing local changes of a configuration


To undo local changes of a configuration, see 12.5.1, “Verifying and synchronizing a
configuration with the GUI” on page 360.



Chapter 13. Disaster recovery using DS8700 Global Mirror

This chapter explains how to configure disaster recovery based on IBM PowerHA
SystemMirror for AIX Enterprise Edition using IBM System Storage DS8700 Global Mirror as
a replicated resource. This support was added in version 6.1 with service pack 3 (SP3).

This chapter includes the following topics:


 Planning for Global Mirror
 Installing the DSCLI client software
 Scenario description
 Configuring the Global Mirror resources
 Configuring AIX volume groups
 Configuring the cluster
 Failover testing
 LVM administration of DS8000 Global Mirror replicated resources



13.1 Planning for Global Mirror
Proper planning is crucial to the success of any disaster recovery solution. This topic reveals
the basic requirements to implement Global Mirror and integrate it with the IBM PowerHA
SystemMirror for AIX Enterprise Edition.

13.1.1 Software prerequisites


Global Mirror functionality works with all the AIX levels that are supported by PowerHA
SystemMirror Standard Edition. The following software is required for the configuration of the
PowerHA SystemMirror for AIX Enterprise Edition for Global Mirror (a quick verification sketch follows this list):
 The following base file sets for PowerHA SystemMirror for AIX Enterprise Edition 6.1:
– cluster.es.pprc.cmds
– cluster.es.pprc.rte
– cluster.es.spprc.cmds
– cluster.es.spprc.rte
– cluster.msg.en_US.pprc

PPRC and SPPRC file sets: The PPRC and SPPRC file sets are not required for
Global Mirror support on PowerHA.

 The following additional file sets included in SP3 (must be installed separately and require
the acceptance of licenses during the installation):
– cluster.es.genxd
cluster.es.genxd.cmds 6.1.0.0 Generic XD support - Commands
cluster.es.genxd.rte 6.1.0.0 Generic XD support - Runtime
– cluster.msg.en_US.genxd
cluster.msg.en_US.genxd 6.1.0.0 Generic XD support - Messages
 AIX supported levels:
– 5.3 TL9, RSCT 2.4.12.0, or later
– 6.1 TL2 SP1, RSCT 2.5.4.0, or later
 The IBM DS8700 microcode bundle 75.1.145.0 or later
 DS8000 CLI (DSCLI) 6.5.1.203 or later client interface (must be installed on each
PowerHA SystemMirror node):
– Java™ 1.4.1 or later
– APAR IZ74478, which removes the previous Java requirement
 The path name for the DSCLI client in the PATH for the root user on each PowerHA
SystemMirror node (must be added)
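
You can quickly confirm most of these prerequisites from the AIX command line before you continue. The following sketch assumes the file set names listed above; adjust the list for your environment and message file sets.

# Check the genxd file sets that are delivered with SP3
lslpp -L cluster.es.genxd.cmds cluster.es.genxd.rte cluster.msg.en_US.genxd

# Check the AIX and RSCT levels
oslevel -s
lslpp -L rsct.core.rmc

# Confirm that the DSCLI client resolves through the PATH for the root user
which dscli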

13.1.2 Minimum DS8700 requirements


Before you implement PowerHA SystemMirror with Global Mirror, you must ensure that the
following requirements are met:
 Collect the following information for all the HMCs in your environment:
– IP addresses
– Login names and passwords
– Associations with storage units



 Verify that all the data volumes that must be mirrored are visible to all relevant AIX hosts.
 Verify that the DS8700 volumes are appropriately zoned so that the IBM FlashCopy®
volumes are not visible to the PowerHA SystemMirror nodes.
 Ensure all Hardware Management Consoles (HMCs) are accessible by using the Internet
Protocol network for all PowerHA SystemMirror nodes where you want to run Global
Mirror.

13.1.3 Considerations
The PowerHA SystemMirror Enterprise Edition using DS8700 Global Mirror has the following
considerations:
 AIX Virtual SCSI disks are not supported in this initial release.
 No auto-recovery is available from a PPRC path or link failure.
If the PPRC path or link between Global Mirror volumes breaks down, the PowerHA
Enterprise Edition is unaware of it. (PowerHA does not process Simple Network
Management Protocol (SNMP) for volumes that use DS8K Global Mirror technology for
mirroring). In this case, the user must identify and correct the PPRC path failure.
Depending on timing conditions, such an event can cause the corresponding Global
Mirror session to go to a “Fatal” state. If this situation occurs, the user must manually stop
and restart the corresponding Global Mirror Session (using the rmgmir and mkgmir DSCLI
commands) or an equivalent DS8700 interface.
 Cluster Single Point Of Control (C-SPOC) cannot perform some Logical Volume
Manager (LVM) operations on nodes at the remote site that contain the target volumes.
Operations that require nodes at the target site to read from the target volumes result in an
error message in C-SPOC. Such operations include changing the file system size,
changing the mount point, and adding LVM mirrors. However, nodes on the
same site as the source volumes can successfully perform these tasks, and the changes
can be propagated later to the other site by using a lazy update.

Attention: For all other LVM operations to work through C-SPOC, you must perform
them with the DS8700 Global Mirror volume pairs in a synchronized or consistent
state. Alternatively, you must perform them in the active cluster on all nodes.

 The volume group names must be listed in the same order as the DS8700 mirror group
names in the resource group.

13.2 Installing the DSCLI client software


You can download the latest version of the DS8000 DSCLI client software from the following
web page:

ftp://ftp.software.ibm.com/storage/ds8000/updates/DS8K_Customer_Download_Files/CLI

Install the DS8000 DSCLI software on each PowerHA SystemMirror node. By default, the
installation process installs the DSCLI in the /opt/ibm/dscli directory. Add the installation
directory of the DSCLI into the PATH environment variable for the root user.
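
One way to add the default installation directory to the PATH of the root user is to append it to the root profile, as shown in the following sketch. Adjust the directory if you installed the DSCLI elsewhere; on AIX, the home directory of the root user is /.

# Append the DSCLI directory to the PATH of the root user
echo 'export PATH=$PATH:/opt/ibm/dscli' >> /.profile

# Reload the profile in the current shell and confirm the result
. /.profile
which dscli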

For more details about the DS8000 DSCLI, see the IBM System Storage DS8000:
Command-Line Interface User’s Guide, SC26-7916.



13.3 Scenario description
This scenario uses a three-node cluster named Txrmnia. Two nodes are in the primary site,
Texas, and one node is in the site Romania. The jordan and leeann nodes are at the Texas
site and the robert node is at the Romania site. The primary site, Texas, has both local
automatic failover and remote recovery. Figure 13-1 provides a software and hardware
overview of the tested configuration between the two sites.

Txrmnia

Figure 13-1 DS8700 Global Mirror test scenario

For this test, the resources are limited. Each system has a single IP address, an XD_ip
network, and a single Fibre Channel (FC) host adapter. Ideally, redundancy would exist throughout the
system, including in the local Ethernet networks, cross-site XD_ip networks, and FC
connectivity. This scenario has a single resource group, ds8kgmrg, which consists of a service
IP address (service_1), a volume group (txvg), and a DS8000 Global Mirror replicated
resource (texasmg). To configure the cluster, see 13.6, “Configuring the cluster” on page 385.

13.4 Configuring the Global Mirror resources


This section explains how to perform the following tasks:
 Checking the prerequisites
 Identifying the source and target volumes
 Configuring the Global Mirror relationships

For each task, the DS8000 storage units are already added to the storage area network
(SAN) fabric and zoned appropriately. Also, the volumes are already provisioned to the nodes.



For details about how to set up the storage units, see IBM System Storage DS8700
Architecture and Implementation, SG24-8786.

13.4.1 Checking the prerequisites


To check the prerequisites, follow these steps:
1. Ensure that the DSCLI installation path is in the PATH environment variable on all nodes.
2. Verify that you have the appropriate microcode version on each storage unit by running
the ver -lmc command in a DSCLI session as shown in Example 13-1.

Example 13-1 Checking the microcode level


(0) root @ r9r4m21: : /
# dscli -cfg /opt/ibm/dscli/profile/dscli.profile.hmc1
Date/Time: October 6, 2010 2:15:33 PM CDT IBM DSCLI Version: 6.5.15.19 DS:
IBM.2107-75DC890

dscli> ver -lmc


Date/Time: October 6, 2010 2:15:41 PM CDT IBM DSCLI Version: 6.5.15.19 DS: -
Storage Image LMC
==========================
IBM.2107-75DC890 5.5.1.490
dscli>

3. Check the code bundle level that corresponds to your LMC version on the “DS8700 Code
Bundle Information” web page at:
http://www.ibm.com/support/docview.wss?uid=ssg1S1003593
The code bundle level must be at version 75.1.145.0 or later. Also on the same page,
verify that your displayed DSCLI version corresponds to the installed code bundle level or
a later level.

Example 13-2 shows the extra parameters inserted into the DSCLI configuration file for the
storage unit in the primary site, /opt/ibm/dscli/profile/dscli.profile.hmc1. Adding these
parameters avoids having to type them each time they are required.

Example 13-2 Editing the DSCLI configuration file


username: redbook
password: r3dbook
hmc1: 9.3.207.122
devid: IBM.2107-75DC890
remotedevid: IBM.2107-75DC980

13.4.2 Identifying the source and target volumes


Figure 13-2 on page 376 shows the volume allocation in DS8000 units for the scenario in this
chapter. Global Copy source volumes are attached to both nodes in the primary site, Texas,
and the corresponding Global Copy target volumes are attached to the node in the secondary
site, Romania. The gray volumes, FlashCopy targets, are not exposed to the hosts.



Texas Romania

2604 2600 2C00 2C04

0A08 2E00 2800 2804

Global Copy
Flash Copy
Data Volume

Flash Copy Volume

Figure 13-2 Volume allocation in DS8000 units

Table 13-1 shows the association between the source and target volumes of the replication
relationship and between their logical subsystems (LSS, the two most significant digits of a
volume identifier highlighted in bold in the table). Table 13-1 also indicates the mapping
between the volumes in the DS8000 units and their disk names on the attached AIX hosts.

Table 13-1 AIX hdisk to DS8000 volume mapping


Site Texas                  Site Romania
AIX disk     LSS/VOL ID     LSS/VOL ID     AIX disk
hdisk10      2E00           2800           hdisk2
hdisk6       2600           2C00           hdisk6

You can easily obtain this mapping by using the lscfg -vl hdiskX | grep Serial command
as shown in Example 13-3. The hdisk serial number is a concatenation of the storage image
serial number and the ID of the volume at the storage level.

Example 13-3 The hdisk serial number in the lscfg command output
# lscfg -vl hdisk10 | grep Serial
Serial Number...............75DC8902E00
# lscfg -vl hdisk6 | grep Serial
Serial Number...............75DC8902600
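
To build the complete mapping shown in Table 13-1 in one pass, you can loop over all disks on a node, as in the following sketch. The loop only prints the serial number that the lscfg command reports for each hdisk; disks that are not DS8000 volumes show their own serial formats and can be ignored.

# Print the serial number (storage image serial + volume ID) for every hdisk
for d in $(lsdev -Cc disk -F name); do
    echo "$d: $(lscfg -vl $d | grep Serial)"
done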

Symmetrical configuration: In an actual environment (and different from this sample


environment), to simplify the management of your Global Mirror environment, maintain a
symmetrical configuration in terms of both physical and logical elements. With this type of
configuration, you can keep the same AIX disk definitions on all nodes. It also helps you
during configuration and management operations of the disk volumes within the cluster.



13.4.3 Configuring the Global Mirror relationships
In this section, you configure the Global Mirror replication relationships by performing the
following tasks:
 Creating PPRC paths
 Creating Global Copy relationships
 Creating FlashCopy relationships
 Selecting an available Global Mirror session identifier
 Defining Global Mirror sessions for all involved LSSs
 Including all the source and target volumes in the Global Mirror session

Creating PPRC paths


In this task, the appropriate FC links have been configured between the storage units.
Example 13-4 shows the FC links that are available for the setup.

Example 13-4 Available FC links


dscli> lsavailpprcport -remotewwnn 5005076308FFC804 2e:28
Date/Time: October 5, 2010 5:48:09 PM CDT IBM DSCLI Version: 6.5.15.19 DS:
IBM.2107-75DC890
Local Port Attached Port Type
=============================
I0010 I0210 FCP
I0013 I0203 FCP
I0013 I0310 FCP
I0030 I0200 FCP
I0030 I0230 FCP
I0030 I0330 FCP
I0040 I0200 FCP
I0040 I0230 FCP
I0041 I0232 FCP
I0041 I0331 FCP
I0042 I0211 FCP
I0110 I0203 FCP
I0110 I0310 FCP
I0110 I0311 FCP
I0111 I0310 FCP
I0111 I0311 FCP
I0130 I0200 FCP
I0130 I0230 FCP
I0130 I0300 FCP
I0130 I0330 FCP
I0132 I0232 FCP
I0132 I0331 FCP
dscli>

Complete the following steps:


1. Run the lssi command on the remote storage unit to obtain the remote wwnn parameter
for the lsavailpprcport command. The last parameter is one possible pair of your source
and target LSSs.
2. For redundancy and bandwidth, configure more FC links by using redundant SAN fabrics.



3. Among the multiple displayed links, choose two that have their ports on different adapters.
Use them to create the PPRC path for the 2e:28 LSS pair (see Example 13-5).

Example 13-5 Creating pprc paths


dscli> mkpprcpath -remotewwnn 5005076308FFC804 -srclss 2e -tgtlss 28
I0030:I0230 I0110:I0203
Date/Time: October 5, 2010 5:55:46 PM CDT IBM DSCLI Version: 6.5.15.19 DS:
IBM.2107-75DC890
CMUC00149I mkpprcpath: Remote Mirror and Copy path 2e:28 successfully
established.
dscli> lspprcpath 2e
Date/Time: October 5, 2010 5:56:13 PM CDT IBM DSCLI Version: 6.5.15.19 DS:
IBM.2107-75DC890
Src Tgt State SS Port Attached Port Tgt WWNN
=========================================================
2E 28 Success FF28 I0030 I0230 5005076308FFC804
2E 28 Success FF28 I0110 I0203 5005076308FFC804
dscli>

4. In a similar manner, configure one PPRC path for each other involved LSS pair.
5. Because the PPRC paths are unidirectional, create a second path, in the opposite
direction, for each LSS pair. You use the same procedure, but work on the other storage
unit (see Example 13-6). We select different FC links for this direction.

Example 13-6 Creating PPRC paths in opposite directions


dscli> mkpprcpath -remotewwnn 5005076308FFC004 -srclss 28 -tgtlss 2e
I0311:I0111 I0300:I0130
Date/Time: October 5, 2010 5:57:02 PM CDT IBM DSCLI Version: 6.5.15.19 DS:
IBM.2107-75DC980
CMUC00149I mkpprcpath: Remote Mirror and Copy path 28:2e successfully
established.
dscli>

Creating Global Copy relationships


Create Global Copy relationships between the source and target volumes, and then check their
status by using the commands shown in Example 13-7.

Example 13-7 Creating Global Copy relationships


dscli> mkpprc -type gcp 2e00:2800 2600:2c00
Date/Time: October 5, 2010 5:57:13 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship 2E00:2800 successfully created.
CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship 2600:2C00 successfully created.
dscli> lspprc 2e00:2800 2600:2c00
Date/Time: October 5, 2010 5:57:42 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
==================================================================================================
2600:2C00 Copy Pending - Global Copy 26 60 Disabled True
2E00:2800 Copy Pending - Global Copy 2E 60 Disabled True
dscli>



Creating FlashCopy relationships
Create FlashCopy relationships on both DS8000 storage units as shown in Example 13-8.

Example 13-8 Creating FlashCopy relationships


dscli> mkflash -tgtinhibit -nocp -record 2e00:0a08 2600:2604
Date/Time: October 5, 2010 4:17:13 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00137I mkflash: FlashCopy pair 2E00:0A08 successfully created.
CMUC00137I mkflash: FlashCopy pair 2600:2604 successfully created.
dscli> lsflash 2e00:0a08 2600:2604
Date/Time: October 5, 2010 4:17:31 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID SrcLSS SequenceNum Timeout ActiveCopy Recording Persistent Revertible
SourceWriteEnabled TargetWriteEnabled BackgroundCopy
===========================================================================================
2E00:0A08 0A 0 60 Disabled Enabled Enabled Disabled Enabled
Disabled Disabled
2600:2604 26 0 60 Disabled Enabled Enabled Disabled Enabled
Disabled Disabled
dscli>

dscli> mkflash -tgtinhibit -nocp -record 2800:2804 2c00:2c04


Date/Time: October 5, 2010 4:20:14 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00137I mkflash: FlashCopy pair 2800:2804 successfully created.
CMUC00137I mkflash: FlashCopy pair 2C00:2C04 successfully created.
dscli> lsflash 2800:2804 2c00:2c04
Date/Time: October 5, 2010 4:20:38 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID SrcLSS SequenceNum Timeout ActiveCopy Recording Persistent Revertible
SourceWriteEnabled TargetWriteEnabled BackgroundCopy
===========================================================================================
2800:2804 28 0 60 Disabled Enabled Enabled Disabled Enabled
Disabled Disabled
2C00:2C04 2C 0 60 Disabled Enabled Enabled Disabled Enabled
Disabled Disabled
dscli>

Selecting an available Global Mirror session identifier


Example 13-9 lists the Global Mirror sessions that are already defined on each DS8000
storage unit. In this scenario, we chose 03 as the session identifier because it is free on both
storage units.

Example 13-9 Sessions defined on both DS8000 storage units


dscli> lssession 00-ff
Date/Time: October 5, 2010 6:07:19 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
04 77 Normal 0400 Join Pending Primary Copy Pending Secondary Simplex True Disable
0A 04 Normal 0A04 Join Pending Primary Suspended Secondary Simplex False Disable
16 05 Normal 1604 Join Pending Primary Suspended Secondary Simplex False Disable
16 05 Normal 1605 Join Pending Primary Suspended Secondary Simplex False Disable
18 02 Normal 1800 Join Pending Primary Suspended Secondary Simplex False Disable
1C 04 Normal 1C00 Join Pending Primary Suspended Secondary Simplex False Disable
1C 04 Normal 1C01 Join Pending Primary Suspended Secondary Simplex False Disable

dscli> lssession 00-ff


Date/Time: October 5, 2010 6:08:23 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980



LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
1A 20 Normal 1A00 Join Pending Primary Simplex Secondary Copy Pending True Disable
1C 01 - - - - - - -
30 77 Normal 3000 Join Pending Primary Simplex Secondary Copy Pending True Disable
dscli>

Defining Global Mirror sessions for all involved LSSs


Define the Global Mirror sessions for all the LSSs associated with source and target volumes
as shown in Example 13-10. The same freely available session identifier, determined in
“Selecting an available Global Mirror session identifier” on page 379, is used on both storage
units.

Example 13-10 Defining the GM session for the source and target volumes
dscli> mksession -lss 2e 03
Date/Time: October 5, 2010 6:11:07 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00145I mksession: Session 03 opened successfully.
dscli> mksession -lss 26 03
Date/Time: October 5, 2010 6:11:25 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00145I mksession: Session 03 opened successfully.

dscli> mksession -lss 28 03


Date/Time: October 6, 2010 5:39:02 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00145I mksession: Session 03 opened successfully.
dscli> mksession -lss 2c 03
Date/Time: October 6, 2010 5:39:15 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00145I mksession: Session 03 opened successfully.
dscli>

Including all the source and target volumes in the Global Mirror session
Add the volumes in the Global Mirror sessions and verify their status by using the commands
shown in Example 13-11.

Example 13-11 Adding source and target volumes to the Global Mirror sessions
dscli> chsession -lss 26 -action add -volume 2600 03
Date/Time: October 5, 2010 6:15:17 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00147I chsession: Session 03 successfully modified.
dscli> chsession -lss 2e -action add -volume 2e00 03
Date/Time: October 5, 2010 6:15:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00147I chsession: Session 03 successfully modified.
dscli> lssession 26 2e
Date/Time: October 5, 2010 6:16:21 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
26 03 Normal 2600 Join Pending Primary Copy Pending Secondary Simplex True Disable
2E 03 Normal 2E00 Join Pending Primary Copy Pending Secondary Simplex True Disable

dscli>
dscli> chsession -lss 2c -action add -volume 2c00 03
Date/Time: October 6, 2010 5:41:12 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00147I chsession: Session 03 successfully modified.
dscli> chsession -lss 28 -action add -volume 2800 03
Date/Time: October 6, 2010 5:41:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980



CMUC00147I chsession: Session 03 successfully modified.
dscli> lssession 28 2c
Date/Time: October 6, 2010 5:44:02 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
28 03 Normal 2800 Join Pending Primary Simplex Secondary Copy Pending True Disable
2C 03 Normal 2C00 Join Pending Primary Simplex Secondary Copy Pending True Disable
dscli>

13.5 Configuring AIX volume groups


In this scenario, you create a volume group and a file system on the hdisks associated with
the DS8000 source volumes. These volumes are already identified in 13.4.2, “Identifying the
source and target volumes” on page 375. They are hdisk6 and hdisk10 on the jordan node.

You must configure the volume groups and file systems on the cluster nodes. The application
might need the same major number for the volume group on all nodes. Perform this
configuration task because it might be useful later for additional configuration of the Network
File System (NFS).

For the nodes on the primary site, you can use the standard procedure. You define the
volume groups and file systems on one node and then import them to the other nodes. For
the nodes on the secondary site, you must first suspend the replication on the involved target
volumes.

13.5.1 Configuring volume groups and file systems on primary site


In this task, you create an AIX volume group on the hdisks associated with the DS8000
source volumes on the jordan node and import it on the leeann node. Follow these steps:
1. Choose the next free major number on all cluster nodes by running the lvlstmajor
command on each cluster node. The next common free major number on all systems is 50
as shown in Example 13-12.

Example 13-12 Running the lvlstmajor command on all cluster nodes


root@leeann: lvlstmajor
50...

root@robert: lvlstmajor
44..54,56...

root@jordan: # lvlstmajor
50...



2. Create a volume group, called txvg, and a file system, called /txro, on hdisk6 and
hdisk10 (the source volumes identified in 13.4.2, “Identifying the source and target
volumes” on page 375). Example 13-13 shows the commands to run on the jordan node.

Example 13-13 Creating txvg volume group on jordan


root@jordan: mkvg -V 50 -y txvg hdisk6 hdisk10
0516-1254 mkvg: Changing the PVID in the ODM.
txvg
root@jordan:chvg -a n txvg
root@jordan: mklv -e x -t jfs2 -y txlv txvg 250
txlv
root@jordan: mklv -e x -t jfs2log -y txloglv txvg 1
txloglv
root@jordan: crfs -v jfs2 -d /dev/txlv -a log=/dev/txloglv -m /txro -A no
File system created successfully.
1023764 kilobytes total disk space.
New File System size is 2048000
root@jordan: lsvg -p txvg
txvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk6 active 511 385
102..00..79..102..102
hdisk10 active 511 386
103..00..79..102..102
root@jordan:lspv|grep -e hdisk6 -e hdisk10
hdisk6 000a625afe2a4958 txvg active
hdisk10 000a624a833e440f txvg active
root@jordan: varyoffvg txvg
root@jordan:

3. Import the volume group on the second node on the primary site, leeann, as shown in
Example 13-14:
a. Verify that the shared disks have the same PVID on both nodes.
b. Run the rmdev -dl command for each hdisk.
c. Run the cfgmgr program.
d. Run the importvg command.

Example 13-14 Importing the txvg volume group on the leeann node
root@leean: rmdev -dl hdisk6
hdisk6 deleted
root@leean: rmdev -dl hdisk10
hdisk10 deleted
root@leean: cfgmgr
root@leean:lspv | grep -e hdisk6 -e hdisk10
hdisk6 000a625afe2a4958 txvg
hdisk10 000a624a833e440f txvg
root@leean: importvg -V 51 -y txvg hdisk6
txvg
root@leean: lsvg -l txvg
txvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
txlv jfs2 250 250 2 open/syncd /txro
txloglv jfs2log 1 1 1 open/syncd N/A



root@leean: chvg -a n txvg
root@leean: varyoffvg txvg

13.5.2 Importing the volume groups in the remote site


To import the volume groups in the remote site, use the following steps. Example 13-15
shows the commands to run on the primary site.
1. Obtain a consistent replica of the data, on the primary site, by ensuring that the volume
group is varied off as shown by the last command in Example 13-14.
2. Ensure that the Global Copy is in progress and that the Out of Sync count is 0.
3. Suspend the replication by using the pausepprc command.

Example 13-15 Pausing the Global Copy relationship on the primary site
dscli> lspprc -l 2600 2e00
Date/Time: October 6, 2010 3:40:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date
Suspended SourceLSS Timeout (secs) Critical Mode First Pass Status Incremental Resync Tgt Write GMIR CG
PPRC CG isTgtSE DisableAutoResync
===========================================================================================================
===========================================================================================================
2600:2C00 Copy Pending - Global Copy 0 Disabled Disabled Invalid -
26 60 Disabled True Disabled Disabled N/A Disabled
Unknown False
2E00:2800 Copy Pending - Global Copy 0 Disabled Disabled Invalid -
2E 60 Disabled True Disabled Disabled N/A Disabled
Unknown False
dscli> pausepprc 2600:2C00 2E00:2800
Date/Time: October 6, 2010 3:49:29 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00157I pausepprc: Remote Mirror and Copy volume pair 2600:2C00 relationship successfully paused.
CMUC00157I pausepprc: Remote Mirror and Copy volume pair 2E00:2800 relationship successfully paused.
dscli> lspprc -l 2600 2e00
Date/Time: October 6, 2010 3:49:41 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date
Suspended SourceLSS Timeout (secs) Critical Mode First Pass Status Incremental Resync Tgt Write GMIR CG
PPRC CG isTgtSE DisableAutoResync
===========================================================================================================
===========================================================================================================
2600:2C00 Suspended Host Source Global Copy 0 Disabled Disabled Invalid -
26 60 Disabled True Disabled Disabled N/A Disabled
Unknown False
2E00:2800 Suspended Host Source Global Copy 0 Disabled Disabled Invalid -
2E 60 Disabled True Disabled Disabled N/A Disabled
Unknown False
dscli>

4. To make the target volumes available to the attached hosts, use the failoverpprc
command on the secondary site as shown in Example 13-16.

Example 13-16 The failoverpprc command on the secondary site storage unit
dscli> failoverpprc -type gcp 2C00:2600 2800:2E00
Date/Time: October 6, 2010 3:55:19 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2C00:2600 successfully reversed.
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2800:2E00 successfully reversed.
dscli> lspprc 2C00:2600 2800:2E00
Date/Time: October 6, 2010 3:55:35 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980



ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
====================================================================================================
2800:2E00 Suspended Host Source Global Copy 28 60 Disabled True
2C00:2600 Suspended Host Source Global Copy 2C 60 Disabled True
dscli>

5. Refresh and check the PVIDs. Then import and vary off the volume group as shown in
Example 13-17.

Example 13-17 Importing the volume group txvg on the secondary site node, robert
root@robert: rmdev -dl hdisk2
hdisk2 deleted
root@robert: rmdev -dl hdisk6
hdisk6 deleted
root@robert: cfgmgr
root@robert: lspv |grep -e hdisk2 -e hdisk6
hdisk2 000a624a833e440f txvg
hdisk6 000a625afe2a4958 txvg
root@robert: importvg -V 50 -y txvg hdisk2
txvg
root@robert: lsvg -l txvg
txvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
txlv jfs2 250 250 2 closed/syncd /txro
txloglv jfs2log 1 1 1 closed/syncd N/A
root@robert: varyoffvg txvg

6. Re-establish the Global Copy relationship as shown in Example 13-18.

Example 13-18 Re-establishing the initial Global Copy relationship


dscli> failbackpprc -type gcp 2600:2C00 2E00:2800
Date/Time: October 6, 2010 4:24:10 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2600:2C00 successfully failed back.
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2E00:2800 successfully failed back.
dscli> lspprc 2600:2C00 2E00:2800
Date/Time: October 6, 2010 4:24:41 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
==================================================================================================
2600:2C00 Copy Pending - Global Copy 26 60 Disabled True
2E00:2800 Copy Pending - Global Copy 2E 60 Disabled True

dscli> lspprc 2800 2c00


Date/Time: October 6, 2010 4:24:57 AM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
=========================================================================================================
2600:2C00 Target Copy Pending - Global Copy 26 unknown Disabled Invalid
2E00:2800 Target Copy Pending - Global Copy 2E unknown Disabled Invalid
dscli>



13.6 Configuring the cluster
To configure the cluster, you must complete all software prerequisites. Also you must
configure the /etc/hosts file properly, and verify that the clcomdES subsystem is running on
each node.
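
A quick way to confirm these prerequisites on each node is shown in the following sketch. The node names in the grep pattern are the ones used in this scenario and are shown only as an illustration.

# Verify that the cluster communication daemon is active
lssrc -s clcomdES

# Confirm that the cluster IP labels resolve from the /etc/hosts file
grep -E "jordan|leeann|robert" /etc/hosts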

To configure the cluster, follow these steps:


1. Add a cluster.
2. Add all three nodes.
3. Add both sites.
4. Add the XD_ip network.
5. Add the disk heartbeat network.
6. Add the base interfaces to XD_ip network.
7. Add the service IP address.
8. Add the DS8000 Global Mirror replicated resources.
9. Add a resource group.
10.Add a service IP, application server, volume group, and DS8000 Global Mirror Replicated
Resource to the resource group.

13.6.1 Configuring the cluster topology


Configuring a cluster entails the following tasks:
 Adding a cluster
 Adding nodes
 Adding sites
 Adding networks
 Adding communication interfaces

Adding a cluster
To add a cluster, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select Extended Configuration  Extended Topology Configuration 
Configure an HACMP Cluster  Add/Change/Show an HACMP Cluster.
3. Enter the cluster name, which is Txrmnia in this scenario, as shown in Figure 13-3. Press
Enter.

Add/Change/Show an HACMP Cluster

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Cluster Name [Txrmnia]
Figure 13-3 Adding a cluster in the SMIT menu

The output is displayed in the SMIT Command Status window.



Adding nodes
To add the nodes, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Topology
Configuration  Configure HACMP Nodes  Add a Node to the HACMP Cluster.
3. Enter the desired node name, which is jordan in this case, as shown in Figure 13-4. Press
Enter.
The output is displayed in the SMIT Command Status window.

Add a Node to the HACMP Cluster

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Node Name [jordan]
Communication Path to Node [] +
Figure 13-4 Add a Node SMIT menu

4. In this scenario, repeat these steps two more times to add the additional nodes of leeann
and robert.

Adding sites
To add the sites, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Topology
Configuration  Configure HACMP Sites  Add a Site.
3. Enter the desired site name, which in this scenario is the Texas site with the nodes jordan
and leeann, as shown in Figure 13-5. Press Enter.
The output is displayed in the SMIT Command Status window.

Add a Site

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Site Name [Texas] +
* Site Nodes jordan leeann +
Figure 13-5 Add a Site SMIT menu

4. In this scenario, repeat these steps to add the Romania site with the robert node.



Example 13-19 shows the site definitions. The dominance information is displayed, but it is
not relevant until a resource group that uses the nodes is defined later.

Example 13-19 cllssite information about site definitions


./cllssite
----------------------------------------------------
Sitename Site Nodes Dominance Protection Type
---------------------------------------------------
Texas jordan leeann NONE
Romania robert NONE

Adding networks
To add the networks, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Topology
Configuration  Configure HACMP Networks  Add a Network to the HACMP
Cluster.
3. Choose the desired network type, which in this scenario is XD_ip.
4. Keep the default network name and press Enter (Figure 13-6).

Add an IP-Based Network to the HACMP Cluster

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Network Name [net_XD_ip_01]
* Network Type XD_ip
* Netmask(IPv4)/Prefix Length(IPv6) [255.255.255.0]
* Enable IP Address Takeover via IP Aliases [Yes] +
IP Address Offset for Heartbeating over IP Aliases []
Figure 13-6 Add an IP-Based Network SMIT menu

5. Repeat these steps but select a network type of diskhb for the disk heartbeat network and
keep the default network name of net_diskhb_01.

Adding communication interfaces


To add the communication interfaces, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Topology
Configuration  Configure HACMP Communication Interfaces/Devices  Add
Communication Interfaces/Devices  Add Pre-defined Communication Interfaces
and Devices  Communication Interfaces.
3. Select the previously created network, which in this scenario is net_XD_ip_01.



4. Complete the SMIT menu fields. The first interface in this scenario, for jordan, is shown
in Figure 13-7. Press Enter.
The output is displayed in the SMIT Command Status window.

Add a Communication Interface

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* IP Label/Address [jordan_base] +
* Network Type XD_ip
* Network Name net_XD_ip_01
* Node Name [jordan] +
Figure 13-7 Add communication interface SMIT menu

5. Repeat these steps and select Communication Devices to complete the disk heartbeat
network.

The topology is now configured. Also you can see all the interfaces and devices from the
cllsif command output shown in Figure 13-8.

Adapter Type Network Net Type Attribute Node IP Address


jordan_base boot net_XD_ip_01 XD_ip public jordan 9.3.207.209
jordandhb service net_diskhb_01 diskhb serial jordan /dev/hdisk8
leeann_base boot net_XD_ip_01 XD_ip public leeann 9.3.207.208
leeanndhb service net_diskhb_01 diskhb serial leeann /dev/hdisk8
robert_base boot net_XD_ip_01 XD_ip public robert 9.3.207.207

Figure 13-8 Cluster interfaces and devices defined

13.6.2 Configuring cluster resources and resource group


The test scenario has only one resource group, which contains the resources of the service
IP address, volume group, and DS8000 replicated resources. Configure the cluster resources
and resource group as explained in the following sections.

Defining the service IP


Define the service IP by following these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Resource
Configuration  HACMP Extended Resources Configuration  Configure HACMP
Service IP Labels/Addresses  Add a Service IP Label/Address  Configurable on
Multiple Nodes.
3. Choose the net_XD_ip_01 network and press Enter.
4. Choose the appropriate IP label or address. Press Enter.
The output is displayed in the SMIT Command Status window.

In this scenario, we added serviceip_2, as shown in Figure 13-9.

Add a Service IP Label/Address configurable on Multiple Nodes (extended)

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* IP Label/Address serviceip_2 +
Netmask(IPv4)/Prefix Length(IPv6) []
* Network Name net_XD_ip_01
Alternate HW Address to accompany IP Label/Address []
Associated Site ignore
Figure 13-9 Add a Service IP Label SMIT menu

In most true multisite scenarios, where each site is on a different network segment, it is
common to create at least two service IP labels, one for each site, by using the Associated
Site option. This option provides site-specific service IP labels, so a unique service IP label
can be used at each site. We do not use this option in this test because both sites are on the
same network segment.

Defining the DS8000 Global Mirror resources


To fully define the Global Mirror resources, follow these steps:
1. Add a storage agent or agents.
2. Add a storage system or systems.
3. Add a mirror group or groups.

Because these options are all new, define each one before you configure them:
Storage agent A generic name given by PowerHA SystemMirror for an entity such as
the IBM DS8000 HMC. Storage agents typically provide a single point
of coordination and often use TCP/IP as their transport. You must
provide the IP address and authentication information that are used
to communicate with the HMC.
Storage system A generic name given by PowerHA SystemMirror for an entity such as
a DS8700 Storage Unit. When using Global Mirror, you must associate
one storage agent with each storage system. You must provide the
IBM DS8700 system identifier for the storage system. For example,
IBM.2107-75ABTV1 is a storage identifier for a DS8000 Storage
System.
Mirror group A generic name given by PowerHA SystemMirror for a logical
collection of volumes that must be mirrored to another storage system
that resides on a remote site. A Global Mirror session represents a
mirror group.
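
If you do not already have these identifiers, you can query them from DSCLI before completing
the SMIT menus. The following line is a minimal sketch; we assume a DSCLI profile that already
points to the HMC of each storage unit, and the exact output columns depend on the DSCLI level:

dscli> lssi

The output lists the storage image identifier (for example, IBM.2107-75DC890) and its WWNN,
which are the values that the storage system definition requires.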

Adding a storage agent


To add a storage agent, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Resource
Configuration  HACMP Extended Resources Configuration  Configure DS8000
Global Mirror Resources  Configure Storage Agents  Add a Storage Agent.

3. Complete the menu appropriately and press Enter. Figure 13-10 shows the configuration
for this scenario.
The output is displayed in the SMIT Command Status window.

Add a Storage Agent

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Storage Agent Name [ds8khmc]
* IP Addresses [9.3.207.122]
* User ID [redbook]
* Password [r3dbook]
Figure 13-10 Add a Storage Agent SMIT menu

It is possible to have multiple storage agents. However, this test scenario has only one
storage agent that manages both storage units.

Important: The user ID and password are stored as flat text in the
HACMPxd_storage_agent.odm file.
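
To confirm exactly what is stored, you can display the ODM class after the agent is defined.
This is a hedged sketch; we assume that the class name matches the file name without the .odm
suffix and that the PowerHA classes are under /etc/es/objrepos, as is usual:

ODMDIR=/etc/es/objrepos odmget HACMPxd_storage_agent

The stanza output includes the user ID and password in clear text, which is why access to these
files must be restricted.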

Adding a storage system


To add the storage systems, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Resource
Configuration  HACMP Extended Resources Configuration  Configure DS8000
Global Mirror Resources  Configure Storage Systems  Add a Storage System.
3. Complete the menu appropriately and press Enter. Figure 13-11 shows the configuration
for this scenario.
The output is displayed in the SMIT Command Status window.

Add a Storage System

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Storage System Name [texasds8k]
* Storage Agent Name(s) ds8kmainhmc +
* Site Association Texas +
* Vendor Specific Identification [IBM.2107-75DC890] +
* WWNN [5005076308FFC004] +
Figure 13-11 Add a Storage System SMIT menu

4. Repeat these steps for the storage system at the Romania site, and name it romaniads8k.
Example 13-20 shows the configuration.

Example 13-20 Storage systems definitions


Storage System Name texasds8k
Storage Agent Name(s) ds8kmainhmc
Site Association Texas
Vendor Specific Identification IBM.2107-75DC890
WWNN 5005076308FFC004

Storage System Name romaniads8k


Storage Agent Name(s) ds8kmainhmc
Site Association Romania
Vendor Specific Identification IBM.2107-75DC980
WWNN 5005076308FFC804

Adding a mirror group


You are now ready to add the mirror group. To add a mirror group, perform the
following steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Resource
Configuration  HACMP Extended Resources Configuration  Configure DS8000
Global Mirror Resources  Configure Mirror Groups  Add a Mirror Group.
3. Complete the menu appropriately and press Enter. Figure 13-12 shows the configuration
for this scenario.
The output is displayed in the SMIT Command Status window.

Add a Mirror Group

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Mirror Group Name [texasmg]
* Storage System Name texasds8k romaniads8k +
* Vendor Specific Identifier [03] +
* Recovery Action automatic +
Maximum Coordination Time [50]
Maximum Drain Time [30]
Consistency Group Interval Time [0]
Figure 13-12 Add a Mirror Group SMIT menu

Vendor Specific Identifier field: For the Vendor Specific Identifier field, provide only the
Global Mirror session number.

Defining a resource group and Global Mirror resources


Now that you have all the components configured that are required for the DS8700 replicated
resource, you can create a resource group and add your resources to it.

Adding a resource group
To add a resource group, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Resource
Configuration  HACMP Extended Resources Group Configuration  Add a
Resource Group.
3. Complete the menu appropriately and press Enter. Figure 13-13 shows the configuration
in this scenario. Notice that for the Inter-Site Management Policy, we chose Prefer
Primary Site. This option ensures that the resource group starts automatically when the
cluster is started in the primary Texas site.
The output is displayed in the SMIT Command Status window.

Add a Resource Group (extended)

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Resource Group Name [ds8kgmrg]

Inter-Site Management Policy [Prefer Primary Site] +


* Participating Nodes from Primary Site [jordan leeann] +
Participating Nodes from Secondary Site [robert] +

Startup Policy Online On Home Node Only+


Fallover Policy Fallover To Next Priority Node > +
Fallback Policy Never Fallback

Figure 13-13 Add a Resource Group SMIT menu

Adding resources to a resource group


To add resources to a resource group, perform the following steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Resource
Configuration  Change/Show Resources and Attributes for a Resource Group.
3. Choose the resource group, which in this example is ds8kgmrg.
4. Complete the menu appropriately and press Enter. Example 13-21 shows the resulting
configuration for this scenario.
The output is displayed in the SMIT Command Status window.

In this scenario, we only added a service IP label, the volume group, and the DS8000 Global
Mirror Replicated Resources as shown in the streamlined clshowres command output in
Example 13-21.

Volume group: The volume group names must be listed in the same order as the DS8700
mirror group names in the resource group.

Example 13-21 Resource group attributes and resources


Resource Group Name ds8kgmrg
Inter-site Management Policy Prefer Primary Site
Participating Nodes from Primary Site jordan leeann
Participating Nodes from Secondary Site robert
Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node
Fallback Policy Never Fallback
Service IP Label serviceip_2
Volume Groups txvg +
GENXD Replicated Resources texasmg +

DS8000 Global Mirror Replicated Resources field: In the SMIT menu for adding
resources to the resource group, notice that the appropriate field is named DS8000 Global
Mirror Replicated Resources. However, when viewing the menu by using the clshowres
command (Example 13-21 on page 392), the field is called GENXD Replicated Resources.

You can now synchronize the cluster, start the cluster, and begin testing it.
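
After cluster services are started, a quick check from any node confirms that the cluster
manager is stable and that the resource group came online where expected. The clRGinfo utility
is used throughout this chapter; the lssrc query reports the cluster manager state (for
example, ST_STABLE):

lssrc -ls clstrmgrES | grep -i state
/usr/es/sbin/cluster/utilities/clRGinfo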

13.7 Failover testing


This section takes you through basic failover testing scenarios with the DS8000 Global Mirror
replicated resources, both locally within a site and across sites. You must carefully plan the
testing of a site cluster failover because more time is required to manipulate the secondary
target LUNs at the recovery site. In addition, because the replication is asynchronous, a site
failover test can also affect the data itself.

In these scenarios, redundancy tests cannot be performed on the IP network because only a
single network is configured. In a production configuration, configure redundant IP and non-IP
communication paths to avoid isolation of the sites. The loss of all communication paths
between sites leads to a partitioned cluster, and it also leads to data divergence between the
sites if the replication links are unavailable as well.

Another specific failure scenario is the loss of replication paths between the storage
subsystems while the cluster is running on both sites. To avoid this type of loss, configure a
redundant PPRC path or links for the replication. You must manually recover the status of the
pairs after the storage links are operational again.

Important: If the PPRC path or link between Global Mirror volumes breaks down, the
PowerHA Enterprise Edition is unaware. The reason is that PowerHA does not process
SNMP for volumes that use DS8700 Global Mirror technology for mirroring. In such a case,
you must identify and correct the PPRC path failure. Depending upon some timing
conditions, such an event can result in the corresponding Global Mirror session going into
a fatal state. In this situation, you must manually stop and restart the corresponding Global
Mirror session (by using the rmgmir and mkgmir DSCLI commands) or an equivalent
DS8700 interface.
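
To identify a PPRC path failure from DSCLI, you can list the paths for the source LSSs of this
scenario. This is a minimal sketch; the rmgmir and mkgmir options vary by DSCLI level, so check
the DSCLI help before stopping and restarting a session:

dscli> lspprcpath 26 2E

A failed or degraded path must be re-established (for example, with the mkpprcpath command)
before the Global Mirror session is restarted.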

This topic takes you through the following tests:


 Graceful site failover
 Rolling site failure
 Site re-integration

Each test, other than the re-integration test, begins in the same initial state of the primary site
hosting the ds8kgmrg resource group on the primary node as shown in Example 13-22 on
page 394. Before each test, we start copying data from another file system to the replicated
file systems. After each test, we verify that the service IP address is online and that new data
is in the file systems. We also had a script that inserted the current time and date, along with
the local node name, into a file on each file system.
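
The marker script is not part of the cluster configuration. A minimal sketch of that kind of
script follows; the file name testmarker.out is our own choice, and the file system list must
match your replicated file systems:

#!/usr/bin/ksh
# Append a time stamp and the local node name to a marker file
# on each replicated file system.
for fs in /txro
do
    print "$(date) $(hostname)" >> $fs/testmarker.out
done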

Example 13-22 Beginning of the test cluster resource group states


jordan# clRGinfo
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
ds8kgmrg ONLINE jordan@Texas
OFFLINE leeann@Texas
ONLINE SECONDARY robert@Romania

After each test, we show the Global Mirror states. Example 13-23 shows the normal running
production status of the Global Mirror pairs from each site.

Example 13-23 Beginning states of the Global Mirror pairs


*******************From node jordan at site Texas***************************

dscli> lssession 26 2E
Date/Time: October 10, 2010 4:00:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
==============
26 03 CG In Progress 2600 Active Primary Copy Pending Secondary Simplex True
Disable
2E 03 CG In Progress 2E00 Active Primary Copy Pending Secondary Simplex True
Disable

dscli> lspprc 2600 2E00


Date/Time: October 10, 2010 4:00:43 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
==================================================================================================
2600:2C00 Copy Pending - Global Copy 26 60 Disabled True
2E00:2800 Copy Pending - Global Copy 2E 60 Disabled True

*******************From remote node robert at site Romania***************************

dscli> lssession 28 2c
Date/Time: October 10, 2010 3:54:58 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
======
28 03 Normal 2800 Join Pending Primary Simplex Secondary Copy Pending True Disable
2C 03 Normal 2C00 Join Pending Primary Simplex Secondary Copy Pending True Disable

dscli> lspprc 2800 2c00


Date/Time: October 10, 2010 3:55:48 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
=========================================================================================================
2600:2C00 Target Copy Pending - Global Copy 26 unknown Disabled Invalid
2E00:2800 Target Copy Pending - Global Copy 2E unknown Disabled Invalid

13.7.1 Graceful site failover
Performing a controlled move of a production environment across sites is a basic test to
ensure that the remote site can bring the production environment online. This test is done
only during initial implementation testing or during a planned production outage of the site. In
this test, we perform the graceful failover operation between sites by performing a resource
group move.

In a true maintenance scenario, you would most likely perform a graceful site failover by
stopping the cluster on the local standby node first and then stopping the cluster on the
production node by using Move Resource Group.

Moving the resource group to another site: In this scenario, because we only have one
node at the Romania site, we use the option to move the resource group to another site. If
multiple remote nodes are members of the resource, use the option to move the resource
group to another node instead.

During this move, the following operations are performed:


 Release the primary online instance of ds8kgmrg at the Texas site. This operation entails
the following tasks:
– Executes the application server stop.
– Unmounts the file systems.
– Varies off the volume group.
– Removes the service IP address.
 Release the secondary online instance of ds8kgmrg at the Romania site.
 Acquire ds8kgmrg in the secondary online state at the Texas site.
 Acquire ds8kgmrg in the online primary state at the Romania site.

To perform the resource group move by using SMIT, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path System Management (C-SPOC)  Resource Groups and
Applications  Move a Resource Group to Another Node / Site  Move Resource
Groups to Another Site.

3. Select the ONLINE instance of ds8kgmrg to be moved as shown in Figure 13-14.

Move a Resource Group to Another Node / Site

Move cursor to desired item and press Enter.

Move Resource Groups to Another Node


Move +--------------------------------------------------------------------------+
| Select a Resource Group |
| |
| Move cursor to desired item and press Enter. Use arrow keys to scroll. |
| |
| # |
| # Resource Group State Node(s) / Site |
| # |
| ds8kgmrg ONLINE jordan / Texas |
| ds8kgmrg ONLINE SECONDARY robert / Romani |
| |
| # |
| # Resource groups in node or site collocation configuration: |
| # Resource Group(s) State Node / Site |
| # |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1=Help| /=Find n=Find Next |
F9=Shel+--------------------------------------------------------------------------+

Figure 13-14 Selecting a resource group

4. Select the Romania site from the next menu as shown in Figure 13-15.

+--------------------------------------------------------------------------+
| Select a Destination Site |
| |
| Move cursor to desired item and press Enter. |
| |
| # *Denotes Originally Configured Primary Site |
| Romania |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
| /=Find n=Find Next |
+--------------------------------------------------------------------------+
Figure 13-15 Selecting a site for a resource group move

5. Verify the information in the final menu and Press Enter.

Upon completion of the move, ds8kgmrg is online on the node robert, as shown in
Example 13-24.

Attention: During our testing, we encountered a problem. After performing the first
resource group move between sites, we were unable to move it back because the pick list
for the destination site was empty. We were able to move it back by node. Later in our
testing, the by-site option started working. However, it moved the resource group to the
standby node at the primary site instead of to the original primary node. If you encounter
similar problems, contact IBM support.

Example 13-24 Resource group status after the site move to Romania
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
ds8kgmrg ONLINE SECONDARY jordan@Texas
OFFLINE leeann@Texas
ONLINE robert@Romania

6. Repeat the resource group move to move it back to its original primary site, Texas, and
node, jordan, to return to the original starting state. However, instead of using the option
to move it to another site, use the option to move it to another node.
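
A resource group move can also be requested from the command line instead of through SMIT.
The following line is a hedged sketch of moving ds8kgmrg back to the node jordan with the
clRGmove utility, where -g names the resource group, -n the destination node, and -m requests
a move; verify the exact flags at your software level:

/usr/es/sbin/cluster/utilities/clRGmove -g ds8kgmrg -n jordan -m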

Example 13-25 shows that the Global Mirror statuses are now swapped and that the local site
now shows its LUNs as the target volumes.

Example 13-25 Global Mirror status after the resource group move
*******************From node jordan at site Texas***************************

dscli> lssession 26 2E
Date/Time: October 10, 2010 4:04:44 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
======
26 03 Normal 2600 Active Primary Simplex Secondary Copy Pending True Disable
2E 03 Normal 2E00 Active Primary Simplex Secondary Copy Pending True Disable

dscli> lspprc 2600 2E00


Date/Time: October 10, 2010 4:05:26 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
=========================================================================================================
2800:2E00 Target Copy Pending - Global Copy 28 unknown Disabled Invalid
2C00:2600 Target Copy Pending - Global Copy 2C unknown Disabled Invalid

*******************From remote node robert at site Romania***************************

dscli> lssession 28 2C
Date/Time: October 10, 2010 3:59:25 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
==============
28 03 CG In Progress 2800 Active Primary Copy Pending Secondary Simplex True
Disable
2C 03 CG In Progress 2C00 Active Primary Copy Pending Secondary Simplex True
Disable

dscli> lspprc 2800 2C00
Date/Time: October 10, 2010 3:59:35 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
==================================================================================================
2800:2E00 Copy Pending - Global Copy 28 60 Disabled True
2C00:2600 Copy Pending - Global Copy 2C 60 Disabled True

13.7.2 Rolling site failure


This scenario entails performing a rolling site failure of the Texas site by using the following
steps:
1. Halt the primary production node jordan at the Texas site.
2. Verify that the resource group ds8kgmrg is acquired locally by the node leeann.
3. Verify that the Global Mirror pairs are in the same status as before the system failure.
4. Halt the node leeann to produce a site down.
5. Verify that the resource group ds8kgmrg is acquired remotely by the robert node.
6. Verify that the Global Mirror pair states are changed.

Begin with all three nodes active in the cluster and the resource group online on the primary
node as shown in Example 13-22 on page 394.

On the node jordan, we run the reboot -q command. The node leeann acquires the
ds8kgmrg resource group as shown in Example 13-26.

Example 13-26 Local node failover within the site Texas


root@leeann: clRGinfo
------------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
ds8kgmrg OFFLINE jordan@Texas
ONLINE leeann@Texas
ONLINE SECONDARY robert@Romania

Example 13-27 shows that the statuses are the same as when we started.

Example 13-27 Global Mirror pair status after a local failover


*******************From node leeann at site Texas***************************

dscli> lssession 26 2E
Date/Time: October 10, 2010 4:10:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
==============
26 03 CG In Progress 2600 Active Primary Copy Pending Secondary Simplex True
Disable
2E 03 CG In Progress 2E00 Active Primary Copy Pending Secondary Simplex True
Disable

dscli> lspprc 2600 2E00


Date/Time: October 10, 2010 4:10:43 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
==================================================================================================
2600:2C00 Copy Pending - Global Copy 26 60 Disabled True
2E00:2800 Copy Pending - Global Copy 2E 60 Disabled True

*******************From remote node robert at site Romania***************************

dscli> lssession 28 2c
Date/Time: October 10, 2010 4:04:58 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
28 03 Normal 2800 Join Pending Primary Simplex Secondary Copy Pending True Disable
2C 03 Normal 2C00 Join Pending Primary Simplex Secondary Copy Pending True Disable

dscli> lspprc 2800 2c00


Date/Time: October 10, 2010 4:05:48 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
=========================================================================================================
2600:2C00 Target Copy Pending - Global Copy 26 unknown Disabled Invalid
2E00:2800 Target Copy Pending - Global Copy 2E unknown Disabled Invalid

After the cluster stabilizes, we run the reboot -q command on the leeann node, invoking a
site_down event. The robert node at the Romania site acquires the ds8kgmrg resource group
as shown in Example 13-28.

Example 13-28 Hard failover between sites


root@robert: clRGinfo
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
ds8kgmrg OFFLINE jordan@Texas
OFFLINE leeann@Texas
ONLINE robert@Romania

You can also see that the replicated pairs are now in the suspended state at the remote site as
shown in Example 13-29.

Example 13-29 Global Mirror pair status after site failover


*******************From remote node robert at site Romania***************************
dscli> lssession 28 2c
Date/Time: October 10, 2010 4:17:28 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
28 03 Normal 2800 Join Pending Primary Suspended Secondary Simplex False Disable
2C 03 Normal 2C00 Join Pending Primary Suspended Secondary Simplex False Disable

dscli> lspprc 2800 2c00


Date/Time: October 10, 2010 4:17:55 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
====================================================================================================
2800:2E00 Suspended Host Source Global Copy 28 60 Disabled False
2C00:2600 Suspended Host Source Global Copy 2C 60 Disabled False

Important: Although the testing resulted in a site_down event, we never lost access to the
primary storage subsystem. PowerHA does not check storage connectivity back to the
primary site during this event. Before moving back to the primary site, re-establish the
replicated pairs and get them all back in sync. If you replace the storage, you might also
have to change the storage agent, storage subsystem, and mirror groups to ensure that
the new configuration is correct for the cluster.

13.7.3 Site re-integration


Before bringing the primary site node back into the cluster, the Global Mirror pairs must be
placed back in sync by using the following steps:

Tip: You do not have to follow these steps exactly as shown because you can accomplish
the same results by using various methods.

1. Verify that the Global Mirror statuses at the primary site are suspended.
2. Fail back PPRC from the secondary site.
3. Verify that the Global Mirror status at the primary site shows the target status.
4. Verify that out-of-sync tracks are 0.
5. Stop the cluster to ensure that the volume group I/O is stopped.
6. Fail over the PPRC on the primary site.
7. Fail back the PPRC on the primary site.
8. Start the cluster.

Failing back the PPRC pairs to the secondary site


To fail back the PPRC pairs to the secondary site, follow these steps:
1. Verify the current state of the Global Mirror pairs at the primary site from the jordan node.
The pairs are suspended as shown in Example 13-30.

Example 13-30 Suspended pair status in Global Mirror on the primary site after node restart
*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:27:48 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
====================================================================================================
2600:2C00 Suspended Host Source Global Copy 26 60 Disabled True
2E00:2800 Suspended Host Source Global Copy 2E 60 Disabled True

2. On the remote node robert, fail back the PPRC pairs as shown in Example 13-31.

Example 13-31 Failing back PPRC pairs at the remote site


*******************From node robert at site Romania***************************
dscli> failbackpprc -type gcp 2C00:2600 2800:2E00
Date/Time: October 10, 2010 4:22:09 PM CDT IBM DSCLI Version: 6.5.15.19 DS:
IBM.2107-75DC980
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2C00:2600 successfully failed back.
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2800:2E00 successfully failed back.

3. After executing the failback, check the status of the pairs again from the primary site to
ensure that they are now shown as Target (Example 13-32).

Example 13-32 Verifying that the primary site LUNs are now target LUNs
*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:44:21 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First
Pass Status
================================================================================================
=========
2800:2E00 Target Copy Pending - Global Copy 28 unknown Disabled Invalid
2C00:2600 Target Copy Pending - Global Copy 2C unknown Disabled Invalid

4. Monitor the status of replication at the remote site by watching the Out Of Sync Tracks
field in the lspprc -l command output. After the tracks reach 0, as shown in Example 13-33,
the pairs are in sync. You can then stop the remote site in preparation for moving
production back to the primary site.

Example 13-33 Verifying that the Global Mirror pairs are back in sync
dscli> lspprc -l 2800 2c00
Date/Time: October 10, 2010 4:22:46 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date
Suspended SourceLSS
===========================================================================================================
============
2800:2E00 Copy Pending - Global Copy 0 Disabled Disabled Invalid -
28
2C00:2600 Copy Pending - Global Copy 0 Disabled Disabled Invalid -
2C 6

Failing over the PPRC pairs back to the primary site


To fail over the PPRC pairs back to the primary site, follow these steps:
1. Stop the cluster on node robert by using the smitty clstop command to bring the
resource group down.
2. After the resources are offline, continue to fail over the PPRC on the primary site jordan
node as shown in Example 13-34.

Example 13-34 Failover PPRC pairs at local primary site


*******************From node jordan at site Texas***************************
dscli> failoverpprc -type gcp 2600:2c00 2E00:2800
Date/Time: October 10, 2010 4:45:16 PM CDT IBM DSCLI Version: 6.5.15.19 DS:
IBM.2107-75DC890
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2600:2C00 successfully
reversed.
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2E00:2800 successfully
reversed.

3. Verify again that the pairs are in the Suspended state on the primary site and that the
remote site shows them as Copy Pending, as shown in Example 13-35.

Example 13-35 Global Mirror pairs suspended on the primary site


*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2E00
Date/Time: October 10, 2010 4:45:51 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass
Status
================================================================================================
====
2600:2C00 Suspended Host Source Global Copy 26 60 Disabled True
2E00:2800 Suspended Host Source Global Copy 2E 60 Disabled True

******************From node robert at site Romania***************************


dscli> lspprc 2800 2c00
Date/Time: October 10, 2010 4:39:27 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass
Status
================================================================================================
==
2800:2E00 Copy Pending - Global Copy 28 60 Disabled True
2C00:2600 Copy Pending - Global Copy 2C 60 Disabled True

Failing back the PPRC pairs to the primary site


You can now complete the switchback to the primary site by performing a failback of the Global
Mirror pairs to the primary site. Run the failbackpprc command as shown in
Example 13-36.

Example 13-36 Failing back the PPRC pairs on the primary site
*******************From node jordan at site Texas***************************
dscli> failbackpprc -type gcp 2600:2c00 2E00:2800
Date/Time: October 10, 2010 4:46:49 PM CDT IBM DSCLI Version: 6.5.15.19 DS:
IBM.2107-75DC890
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2600:2C00 successfully failed back.
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2E00:2800 successfully failed back.

Verify the status of the pairs at each site as shown in Example 13-37.

Example 13-37 Global Mirror pairs failed back to the primary site
*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:47:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
==================================================================================================
2600:2C00 Copy Pending - Global Copy 26 60 Disabled True
2E00:2800 Copy Pending - Global Copy 2E 60 Disabled True

******************From node robert at site Romania***************************


dscli> lspprc 2800 2c00
Date/Time: October 10, 2010 4:40:44 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status

=========================================================================================================
2600:2C00 Target Copy Pending - Global Copy 26 unknown Disabled Invalid
2E00:2800 Target Copy Pending - Global Copy 2E unknown Disabled Invalid

Starting the cluster


To start the cluster, follow these steps:
1. Start all nodes in the cluster by using the smitty clstart command as shown in
Figure 13-16.

Start Cluster Services

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Start now, on system restart or both now +
Start Cluster Services on these nodes [jordan,leeann,robert] +
* Manage Resource Groups Automatically +
BROADCAST message at startup? true +
Startup Cluster Information Daemon? true +
Ignore verification errors? false +
Automatically correct errors found during Interactively +
cluster start?
Figure 13-16 Restarting a cluster after a site failure

Upon startup of the primary node jordan, the resource group is automatically started on
jordan, returning to the original starting point as shown in Example 13-38.

Example 13-38 Resource group status after restart


-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
ds8kgmrg ONLINE jordan@Texas
OFFLINE leeann@Texas
ONLINE SECONDARY robert@Romania

2. Verify the pair and session status on each site as shown in Example 13-39.

Example 13-39 Global Mirror pairs back to normal


*******************From node jordan at site Texas***************************
dscli>lssession 26 2e
Date/Time: October 10, 2010 5:02:11 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
==============
26 03 CG In Progress 2600 Active Primary Copy Pending Secondary Simplex True
Disable
2E 03 CG In Progress 2E00 Active Primary Copy Pending Secondary Simplex True
Disable

dscli> lspprc 2600 2e00

Date/Time: October 10, 2010 5:02:26 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
==================================================================================================
2600:2C00 Copy Pending - Global Copy 26 60 Disabled True
2E00:2800 Copy Pending - Global Copy 2E 60 Disabled True

******************From node robert at site Romania***************************


dscli>lssession 28 2C
Date/Time: October 10, 2010 4:56:11 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete
AllowCascading
===========================================================================================================
======
28 03 Normal 2800 Active Primary Simplex Secondary Copy Pending True Disable
2C 03 Normal 2C00 Active Primary Simplex Secondary Copy Pending True Disable

dscli> lspprc 2800 2c00


Date/Time: October 10, 2010 4:56:30 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
=========================================================================================================
2600:2C00 Target Copy Pending - Global Copy 26 unknown Disabled Invalid
2E00:2800 Target Copy Pending - Global Copy 2E unknown Disabled Invalid

13.8 LVM administration of DS8000 Global Mirror replicated


resources
This section provides the common scenarios for adding additional storage to an existing
Global Mirror replicated environment. These scenarios work primarily with the Texas site and
the ds8kgmrg resource group. You perform the following tasks:
 Adding a new Global Mirror pair to an existing volume group
 Adding a Global Mirror pair into a new volume group

Dynamically expanding a volume: This topic does not provide information about
dynamically expanding a volume because this option is not supported.

13.8.1 Adding a new Global Mirror pair to an existing volume group


To add a new Global Mirror pair to an existing volume group, follow these steps:
1. Assign a new LUN to each site, add the FlashCopy devices, and add the new pair into the
existing session as explained in 13.4.3, “Configuring the Global Mirror relationships” on
page 377. Table 13-2 summarizes the LUNs that are used from each site.
Table 13-2 Summary of the LUNs used on each site
Texas Romania

AIX DISK LSS/VOL ID AIX DISK LSS/VOL ID

hdisk11 2605 hdisk10 2C06

2. Define the new LUNs:
a. Run the cfgmgr command on the primary node jordan.
b. Assign the PVID on the node jordan.
chdev -l hdisk11 -a pv=yes
c. Configure disk and PVID on local node leeann by using the cfgmgr command.
d. Verify that the PVID is displayed by running the lspv command.
e. Pause the PPRC on the primary site.
f. Fail over the PPRC to the secondary site.
g. Configure the disk and PVID on the remote node robert with the cfgmgr command.
h. Verify that the PVID is displayed by running the lspv command.
i. Fail back the PPRC to the primary site.
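
Before continuing, you can confirm that the new pair is replicating in the original direction
again. This check mirrors the lspprc output used in 13.7, but with the new volume ID from
Table 13-2; the expected state is our interpretation of a healthy Global Copy pair:

dscli> lspprc 2605

Run the command against the Texas DS8700; the pair 2605:2C06 is expected to report Copy
Pending with source LSS 26.
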
3. Add the new disk into the volume group by using C-SPOC as follows:

Important: C-SPOC cannot perform certain LVM operations on nodes at the
remote site (the nodes that contain the target volumes), specifically operations that
require nodes at the target site to read from the target volumes. Such operations
cause an error message in C-SPOC. They include functions such as changing the file
system size, changing the mount point, and adding LVM mirrors. However, nodes on the
same site as the source volumes can successfully perform these tasks, and the changes
can be propagated to the other site later by a lazy update.

For other LVM operations to work through C-SPOC, perform them with the Global Mirror
volume pairs in a synchronized or consistent state and with the cluster active on
all nodes.

a. From the command line, type the smitty cl_admin command.


b. In SMIT, select the path System Management (C-SPOC)  Storage  Volume
Groups  Add a Volume to a Volume Group.
c. Select the txvg volume group from the pop-up menu.

d. Select the disk or disks by PVID as shown in Figure 13-17.

Set Characteristics of a Volume Group

Move cursor to desired item and press Enter.

Add a Volume to a Volume Group


Change/Show characteristics of a Volume Group
Remove a Volume from a Volume Group
Enable/Disable a Volume Group for Cross-Site LVM Mirroring Verification

+--------------------------------------------------------------------------+
| Physical Volume Names |
| |
| Move cursor to desired item and press Enter. |
| |
| 000a624a987825c8 ( hdisk10 on node robert ) |
| 000a624a987825c8 ( hdisk11 on nodes jordan,leeann ) |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 13-17 Disk selection to add to the volume group

e. Verify the menu information, as shown in Figure 13-18, and press Enter.

Add a Volume to a Volume Group

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
VOLUME GROUP name txvg
Resource Group Name ds8kgmrg
Node List jordan,leeann,robert
Reference node robert
VOLUME names hdisk10
Figure 13-18 Add a Volume C-SPOC SMIT menu

Upon completion of the C-SPOC operation, the local nodes are updated, but the remote node
is not, as shown in Example 13-40. The remote node is not updated because the target
volumes are not readable until the relationship is swapped. You receive an error message
from C-SPOC, as shown in the attention box after Example 13-40. However, the lazy update
procedure at the time of failover pulls in the remaining volume group information.

Example 13-40 New disk added to volume group on all nodes


root@jordan: lspv |grep txvg
hdisk6 000a625afe2a4958 txvg
hdisk10 000a624a833e440f txvg
hdisk11 000a624a987825c8 txvg

root@leeann: lspv |grep txvg
hdisk6 000a625afe2a4958 txvg
hdisk10 000a624a833e440f txvg
hdisk11 000a624a987825c8 txvg

root@robert: lspv
hdisk2 000a624a833e440f txvg
hdisk6 000a625afe2a4958 txvg
hdisk10 000a624a987825c8 none

Attention: When using C-SPOC to modify a volume group containing a Global Mirror
replicated resource, you can expect to see the following error message:
cl_extendvg: Error executing clupdatevg txvg 000a624a833e440f on node robert

You do not need to synchronize the cluster because all of these changes are made to an
existing volume group. However, consider running a verification.

Adding a new logical volume


Again you use C-SPOC to add a new logical volume. As noted earlier, this process updates
the local nodes within the site. For the remote site, when a failover occurs, the lazy update
process updates the volume group information as needed. This process also adds a bit of
extra time to the failover time.

To add a new logical volume, follow these steps:


1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC)  Storage  Logical
Volumes  Add a Logical Volume.
3. Select the txvg volume group from the pop-up menu.

4. Select the newly added disk hdisk11 as shown in Figure 13-19.

Logical Volumes

Move cursor to desired item and press Enter.

List All Logical Volumes by Volume Group


Add a Logical Volume
Show Characteristics of a Logical Volume
Set Characteristics of a Logical Volume
+--------------------------------------------------------------------------+
| Physical Volume Names |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| Auto-select |
| jordan hdisk6 |
| jordan hdisk10 |
| jordan hdisk11 |
| |
| F1=Help F2=Refresh F3=Cancel |
| F7=Select F8=Image F10=Exit |
F1| Enter=Do /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 13-19 Choose disk for new logical volume creation

5. Complete the information in the final menu (Figure 13-20), and press Enter.
We added a new logical volume, named pattilv, which consists of 100 logical partitions
(LPs), and selected raw for the type. We left all other values at their defaults.

Add a Logical Volume

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[TOP] [Entry Fields]


Resource Group Name ds8kgmrg
VOLUME GROUP name txvg
Node List jordan,leeann,robert
Reference node jordan
* Number of LOGICAL PARTITIONS [100] #
PHYSICAL VOLUME names hdisk11
Logical volume NAME [pattilv]
Logical volume TYPE [raw] +
POSITION on physical volume outer_middle +
RANGE of physical volumes minimum +
MAXIMUM NUMBER of PHYSICAL VOLUMES [] #
to use for allocation
Number of COPIES of each logical 1 +
[MORE...15]
Figure 13-20 New logical volume C-SPOC SMIT menu

6. Upon completion of the C-SPOC operation, verify that the new logical volume is created
locally on node jordan as shown in Example 13-41.

Example 13-41 Newly created logical volume


root@jordan:lsvg -l txvg
txvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
txlv jfs2 250 150 3 open/syncd /txro
txloglv jfs2log 1 1 1 open/syncd N/A
pattilv raw 100 100 1 closed/syncd N/A

Similar to when you create the volume group, you see an error message (Figure 13-21) about
being unable to update the remote node.

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

jordan: pattilv
cl_mklv: Error executing clupdatevg txvg 000a625afe2a4958 on node robert
Figure 13-21 C-SPOC normal error upon logical volume creation
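
The same logical volume can probably also be created with the C-SPOC CLI wrappers that are
listed later in Example 13-43. This is a hedged sketch; we assume that cli_mklv accepts the
standard mklv arguments:

/usr/es/sbin/cluster/cspoc/cli_mklv -t raw -y pattilv txvg 100 hdisk11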

Increasing the size of an existing file system
Again you use C-SPOC to perform this operation. As noted previously, this process updates
the local nodes within the site. For the remote site, when a failover occurs, the lazy update
process updates the volume group information as needed. This process also adds a bit of
extra time to the failover time.

To increase the size of an existing file system, follow these steps:


1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC)  Storage  File Systems 
Change / Show Characteristics of a File System.
3. Select the txro file system from the pop-up menu.
4. Complete the information in the final menu, and press Enter. In the example in
Figure 13-22, notice that we change the size from 1024 MB to 1250 MB.

Change/Show Characteristics of a Enhanced Journaled File System

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[TOP] [Entry Fields]


Volume group name txvg
Resource Group Name ds8kgmrg
* Node Names robert,leeann,jordan

* File system name /txro


NEW mount point [/txro] /
SIZE of file system
Unit Size Megabytes +
Number of Units [1250] #
Mount GROUP []
Mount AUTOMATICALLY at system restart? no +
PERMISSIONS read/write +
Mount OPTIONS [] +
[MORE...7]
Figure 13-22 Changing the file system size on the final C-SPOC menu

5. Upon completion of the C-SPOC operation, verify locally on node jordan that the file
system size has increased from 250 LPs, as shown in Example 13-41 on page 409, to
313 LPs, as shown in Example 13-42.

Example 13-42 Newly increased file system size


root@jordan:lsvg -l txvg
txvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
txlv jfs2 313 313 3 open/syncd /txro
txloglv jfs2log 1 1 1 open/syncd N/A
pattilv raw 100 100 1 closed/syncd N/A
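
As with the other LVM changes, the resize can probably also be done with the C-SPOC CLI
wrappers in /usr/es/sbin/cluster/cspoc (see Example 13-43). This is a hedged sketch; we assume
that cli_chfs accepts the same arguments as chfs:

/usr/es/sbin/cluster/cspoc/cli_chfs -a size=1250M /txro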

A cluster synchronization is not required, because technically the resources have not
changed. All of the changes were made to an existing volume group that is already a resource
in the resource group.

Testing the fallover after making the LVM changes
To confirm that the cluster still fails over correctly after the LVM changes, repeat the steps
from 13.7.2, “Rolling site failure” on page 398. The new logical volume pattilv and the
additional space on /txro show up on each node. However, a noticeable difference is that the
site failover takes longer because a lazy update is performed to pick up the volume group
changes.

13.8.2 Adding a Global Mirror pair into a new volume group


The steps to add a new volume group begin the same as the steps in 13.5, “Configuring AIX
volume groups” on page 381. However, for completeness, this section provides an overview of
the steps again and then provides details about the new LUNs to be used.

In this scenario, we re-use the LUNs from the previous section. We removed them from the
volume group and removed the disks for all nodes except the main primary node jordan. In
our process, we cleared the PVID and then assigned a new PVID for a clean start.

Table 13-3 provides a summary of the LUNs that we implemented in each site.

Table 13-3 Summary of the LUNs implemented in each site


Texas Romania

AIX DISK LSS/VOL ID AIX DISK LSS/VOL ID

hdisk11 2605 hdisk10 2C06

Now continue with the following steps, which are the same as those steps for defining new
LUNs:
1. Run the cfgmgr command on the primary node jordan.
2. Assign the PVID on the node jordan:
chdev -l hdisk11 -a pv=yes
3. Configure the disk and PVID on the local node leeann by using the cfgmgr command.
4. Verify that PVID shows up by using the lspv command.
5. Pause the PPRC on the primary site.
6. Fail over the PPRC to the secondary site.
7. Fail back the PPRC to the secondary site.
8. Configure the disk and PVID on the remote node robert by using the cfgmgr command.
9. Verify that PVID shows up by using the lspv command.
10.Pause the PPRC on the secondary site.
11.Fail over the PPRC to the primary site.
12.Fail back the PPRC to the primary site.

The main difference between adding a new volume group and extending an existing one is
that, when adding a new volume group, you must swap the pairs twice, whereas when extending
an existing volume group you can get away with swapping only once.

The overall process is otherwise similar to the original setup, where we created all LVM
components on the primary site, swapped the PPRC pairs to the remote site to import the
volume group, and then swapped them back.

You can avoid performing two swaps, as we showed, by not choosing to include the third node
when creating the volume group. Then you can swap the pairs, run cfgmgr on the new disk
with the PVID, import the volume group, and swap the pairs back.

Creating a volume group
Create a volume group by using C-SPOC:
1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC)  Storage  Volume
Groups  Create a Volume Group.
3. Select the specific nodes. In this case, we chose all three nodes as shown in Figure 13-23.

Volume Groups

Move cursor to desired item and press Enter.

List All Volume Groups


Create a Volume Group
Create a Volume Group with Data Path Devices

+--------------------------------------------------------------------------+
| Node Names |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| > jordan |
| > leeann |
| > robert |
| |
| |
| F1=Help F2=Refresh F3=Cancel |
| F7=Select F8=Image F10=Exit |
F1| Enter=Do /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 13-23 Adding a volume group node pick list

4. Select the disk or disks by PVID as shown in Figure 13-24.

Volume Groups

Move cursor to desired item and press Enter.

List All Volume Groups


Create a Volume Group
Create a Volume Group with Data Path Devices

Set Characteristics of a Volume Group


Enable a Volume Group for Fast Disk Takeover or Concurrent Access
+--------------------------------------------------------------------------+
| Physical Volume Names |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| 000a624a9bb74ac3 ( hdisk10 on node robert ) |
| 000a624a9bb74ac3 ( hdisk11 on nodes jordan,leeann ) |
| |
| F1=Help F2=Refresh F3=Cancel |
| F7=Select F8=Image F10=Exit |
F1| Enter=Do /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 13-24 Selecting the disk or disks for the new volume group pick list

5. Select the volume group type. In this scenario, we select scalable as shown in
Figure 13-25.

Volume Groups

Move cursor to desired item and press Enter.

List All Volume Groups


Create a Volume Group
Create a Volume Group with Data Path Devices

Set Characteristics of a Volume Group


+--------------------------------------------------------------------------+
| Volume Group Type |
| |
| Move cursor to desired item and press Enter. |
| |
| Legacy |
| Original |
| Big |
| Scalable |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 13-25 Choosing the volume group type for the new volume group pick list

6. Select the proper resource group. We select ds8kgmrg as shown in Figure 13-26.

Create a Scalable Volume Group

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[TOP] [Entry Fields]


Node Names jordan,leeann,robert
Resource Group Name [ds8kgmrg] +
PVID 000a624a9bb74ac3
VOLUME GROUP name [princessvg]
Physical partition SIZE in megabytes 4 +
Volume group MAJOR NUMBER [51] #
Enable Cross-Site LVM Mirroring Verification false +
Enable Fast Disk Takeover or Concurrent Access Fast Disk Takeover or> +
Volume Group Type Scalable
Maximum Physical Partitions in units of 1024 32 +
Maximum Number of Logical Volumes 256 +
Figure 13-26 Create a Scalable Volume Group (final) menu

7. Select a volume group name. We select princessvg. Then press Enter.

Instead of using C-SPOC, you can perform the steps manually and then import the volume
groups on each node as needed. However, remember to add the volume group into the
resource group after creating it. With C-SPOC, you can automatically add it to the resource
group while you are creating the volume group.

You can also use the C-SPOC CLI commands (Example 13-43). These commands are in the
/usr/es/sbin/cluster/cspoc directory, and all begin with the cli_ prefix. Similar to the SMIT
menus, their operation output is also saved in the cspoc.log file.

Example 13-43 C-SPOC CLI commands


root@jordan: ls cli_*
cli_assign_pvids cli_extendlv cli_mkvg cli_rmlv
cli_chfs cli_extendvg cli_on_cluster cli_rmlvcopy
cli_chlv cli_importvg cli_on_node cli_syncvg
cli_chvg cli_mirrorvg cli_reducevg cli_unmirrorvg
cli_crfs cli_mklv cli_replacepv cli_updatevg
cli_crlvfs cli_mklvcopy cli_rmfs

Upon completion of the C-SPOC operation, the local nodes are updated, but the remote node
is not, as shown in Example 13-44. The remote node is not updated because the target
volumes are not readable until the relationship is swapped. You see an error message from
C-SPOC, as shown in the attention box following Example 13-44. After you create all the LVM
structures, you swap the pairs back to the remote node and import the new volume group and
logical volume.

Example 13-44 New disk added to volume group on all nodes


root@jordan: lspv |grep princessvg
hdisk11 000a624a9bb74ac3 princessvg

root@leeann: lspv |grep princessvg
hdisk11 000a624a9bb74ac3 princessvg

root@robert: lspv |grep princessvg

Attention: When using C-SPOC to add a new volume group that contains a Global Mirror
replicated resource, you might see the following error message:
cl_importvg: Error executing climportvg -V 51 -c -y princessvg -Q
000a624a9bb74ac3 on node robert

Although this message is normal, you can avoid it by omitting the remote nodes from the node selection. Omitting them is acceptable because you import the volume group on the remote nodes manually anyway.

When creating the volume group, it usually is added automatically to the resource group as shown in Example 13-45 on page 416. However, because of the error message indicated in the previous attention box, it might not be added automatically. Therefore, double check that the volume group is in the resource group before continuing. No other changes to the resource group are required. The new LUN pairs are added to the same storage subsystems and the same session (3) that is already defined in the mirror group texasmg.



Example 13-45 New volume group added to existing resource group
Resource Group Name ds8kgmrg
Inter-site Management Policy Prefer Primary Site
Participating Nodes from Primary Site jordan leeann
Participating Nodes from Secondary Site robert
Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node
Fallback Policy Never Fallback
Service IP Label serviceip_2
Volume Groups txvg princessvg +
GENXD Replicated Resources texasmg
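A listing such as Example 13-45 can typically also be produced from the command line with the cluster utilities. The following sketch assumes a default PowerHA installation path; run without arguments, the command lists the resources of all resource groups:

/usr/es/sbin/cluster/utilities/clshowres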

Adding a new logical volume on the new volume group


You repeat the steps in “Adding a new logical volume” on page 407 to create a new logical
volume, named princesslv, on the newly created volume group, princessvg, as shown in
Example 13-46.

Example 13-46 New logical volume on the newly added volume group
root@jordan: lsvg -l princessvg
princessvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
princesslv raw 38 38 1 closed/syncd N/A

Importing the new volume group to the remote site


To import the volume group, follow the steps in 13.5.2, “Importing the volume groups in the remote site” on page 383. As a review, we perform the following steps (a command-level sketch follows the list):
1. Vary off the volume group on the local site.
2. Pause the PPRC pairs on the local site.
3. Fail over the PPRC pairs on the remote site.
4. Fail back the PPRC pairs on the remote site.
5. Import the volume group.
6. Vary off the volume group on the remote site.
7. Pause the PPRC pairs on the remote site.
8. Fail over the PPRC pairs on the local site.
9. Fail back the PPRC pairs on the local site.
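The following is a minimal command-level sketch of these steps. The DS CLI storage image IDs and the volume pair are illustrative placeholders only; the exact parameters for this environment are shown in 13.5.2 and in the DS CLI documentation:

# Local site node (jordan): release the volume group
varyoffvg princessvg

# DS CLI against the local storage unit (IDs and volume pair are illustrative)
dscli> pausepprc -remotedev IBM.2107-75XY123 2A00:2A00

# DS CLI against the remote storage unit: reverse the Global Copy pairs
dscli> failoverpprc -remotedev IBM.2107-75AB456 -type gcp 2A00:2A00
dscli> failbackpprc -remotedev IBM.2107-75AB456 -type gcp 2A00:2A00

# Remote site node (robert): import and release the volume group
importvg -y princessvg -V 51 hdisk10
varyoffvg princessvg

# Repeat the pause, failover, and failback sequence in the opposite direction
# to return the PPRC pairs to the local site.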

Synchronizing and verifying the cluster configuration


You now synchronize the resource group change to include the new volume group that was
added. However, first run a verification only to check for errors. If you find errors, you must fix
them manually because they are not automatically fixed in a running environment.

Then synchronize and verify it:


1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Verification and Synchronization.



3. Select the options as shown in Figure 13-27.

HACMP Verification and Synchronization (Active Cluster Nodes Exist)

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Verify changes only? [No] +
* Logging [Standard] +

F1=Help F2=Refresh F3=Cancel F4=List


F5=Reset F6=Command F7=Edit F8=Image
Figure 13-27 Extended Verification and Synchronization SMIT menu

4. Verify that the information is correct, and press Enter.

Upon completion, the cluster configuration is synchronized and can now be tested.

Testing the failover after adding a new volume group


To confirm that the cluster behaves as expected with the new resources, repeat the steps from 13.7.2, “Rolling site failure” on page 398. The new volume group princessvg and logical volume princesslv now appear on each node.



Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator

This chapter explains how to configure disaster recovery based on IBM PowerHA
SystemMirror for AIX Enterprise Edition using Hitachi TrueCopy/Hitachi Universal Replicator
(HUR) replication services. This support is added in version 6.1 with service pack 3 (SP3).

This chapter includes the following topics:


 Planning for TrueCopy/HUR management
 Overview of TrueCopy/HUR management
 Scenario description
 Configuring the TrueCopy/HUR resources
 Failover testing
 LVM administration of TrueCopy/HUR replicated pairs



14.1 Planning for TrueCopy/HUR management
Proper planning is crucial to the success of any implementation. Plan the storage deployment
and replication necessary for your environment. This process is related to the applications
and middleware that are being deployed in the environment, which can eventually be
managed by PowerHA SystemMirror Enterprise Edition. This topic lightly covers site, network,
storage area network (SAN), and storage planning, which are all key factors. However, the
primary focus of this topic is the software prerequisites and support considerations.

14.1.1 Software prerequisites


The following software is required:
 One of the following AIX levels or later:
– AIX 5.3 TL9 and RSCT 2.4.12.0
– AIX 6.1 TL2 SP3 and RSCT 2.5.4.0
 Multipathing software
– AIX MPIO
– Hitachi Dynamic Link Manager (HDLM)
 PowerHA 6.1 Enterprise Edition with SP3
The following additional file sets are included in SP3, must be installed separately, and
require the acceptance of the license during the installation:
– cluster.es.tc
6.1.0.0 ES HACMP - Hitachi support - Runtime Commands
6.1.0.0 ES HACMP - Hitachi support Commands
– cluster.msg.en_US.tc (optional)
6.1.0.0 HACMP Hitachi support Messages - U.S. English
6.1.0.0 HACMP Hitachi Messages - U.S. English IBM-850
6.1.0.0 HACMP Hitachi Messages – Japanese
6.1.0.0 HACMP Hitachi Messages - Japanese IBM-eucJP
 Hitachi Command Control Interface (CCI) Version 01-23-03/06 or later
 USPV Microcode Level 60-06-05/00 or later
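A quick way to confirm that the PowerHA file sets for Hitachi support are installed is the lslpp command. This minimal check assumes the file set names listed above:

lslpp -l "cluster.es.tc*" "cluster.msg.en_US.tc*"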

14.1.2 Minimum connectivity requirements for TrueCopy/HUR


For TrueCopy/HUR connectivity, you must have the following minimum requirements in place:
 Ensure connectivity from the local Universal Storage Platform VM (USP VM) to the AIX
host ports.
The external storage ports on the local USP VMs (Data Center 1 and Data Center 2) are
zoned and cabled to their corresponding existing storage systems.
 Present both the primary and secondary source devices to the local USP VMs.
Primary and secondary source volumes in the migration group are presented from the
existing storage systems to the corresponding local USP VMs. This step is transparent to
the servers in the migration set. No devices are imported or accessed by the local USP
VMs at this stage.



 Establish replication connectivity between the target storage systems.
TrueCopy initiator and MCU target ports are configured on the pair of target USP VMs,
and an MCU/RCU pairing is established to validate the configuration.
 Ensure replication connectivity from the local USP VMs to the remote USP VM
TrueCopy/HUR initiator. Also ensure that MCU target ports are configured on the local and
remote USP VMs. In addition, confirm that MCU and RCU pairing is established to
validate the configuration.
 For HUR, configure Universal Replicator Journal Groups on local and remote USP VM
storage systems.
 Configure the target devices.
Logical devices on the target USP VM devices are formatted and presented to front-end
ports or host storage domains. This way, device sizes, logical unit numbers, host modes,
and presentation worldwide names (WWNs) are identical on the source and target storage
systems. Devices are presented to host storage domains that correspond to both
production and disaster recovery standby servers.
 Configure the target zoning.
Zones are defined between servers in the migration group and the target storage system
front-end ports, but new zones are not activated at this point.

Ideally the connectivity is through redundant links, switches, and fabrics to the hosts and
between the storage units themselves.

14.1.3 Considerations
Keep in mind the following considerations for mirroring PowerHA SystemMirror Enterprise
Edition with TrueCopy/HUR:
 AIX Virtual SCSI is not supported in this initial release.
 Logical Unit Size Expansion (LUSE) for Hitachi is not supported.
 Only fence-level NEVER is supported for synchronous mirroring.
 Only HUR is supported for asynchronous mirroring.
 The dev_name must map to a logical device, and the dev_group must be defined in the
HORCM_LDEV section of the horcm.conf file.
 The PowerHA SystemMirror Enterprise Edition TrueCopy/HUR solution uses dev_group
for any basic operation, such as the pairresync, pairevtwait, or horctakeover operation.
If several dev_names are in a dev_group, the dev_group must be enabled for consistency.
 PowerHA SystemMirror Enterprise Edition does not trap Simple Network Management
Protocol (SNMP) notification events for TrueCopy/HUR storage. If a TrueCopy link goes
down when the cluster is up and later the link is repaired, you must manually
resynchronize the pairs.
 The creation of pairs is done outside the cluster control. You must create the pairs before
you start the cluster services.
 Resource groups that are managed by PowerHA SystemMirror Enterprise Edition cannot
contain volume groups with both TrueCopy/HUR-protected and
non-TrueCopy/HUR-protected disks.
 All nodes in the PowerHA SystemMirror Enterprise Edition cluster must use the same horcm instance.

 You cannot use Cluster Single Point Of Control (C-SPOC) for the following Logical Volume
Manager (LVM) operations to configure nodes at the remote site that contain the target
volume:
– Creating a volume group
– Operations that require nodes at the target site to write to the target volumes
For example, changing the file system size, changing the mount point, or adding LVM
mirrors cause an error message in C-SPOC. However, nodes on the same site as the
source volumes can successfully perform these tasks. The changes are then
propagated to the other site by using a lazy update.

C-SPOC on other LVM operations: For C-SPOC operations to work on all other LVM
operations, perform all C-SPOC operations when the cluster is active on all PowerHA
SystemMirror Enterprise Edition nodes and the underlying TrueCopy/HUR PAIRs are in
a PAIR state.

14.2 Overview of TrueCopy/HUR management


Hitachi TrueCopy/HUR storage management uses Command Control Interface (CCI)
operations from the AIX operating system and PowerHA SystemMirror Enterprise Edition
environment. PowerHA SystemMirror Enterprise Edition uses these interfaces to discover and
integrate the Hitachi Storage replicated storage into the framework of PowerHA SystemMirror
Enterprise Edition. With this integration, you can manage high availability disaster recovery
(HADR) for applications by using the mirrored storage.

Integration of TrueCopy/HUR and PowerHA SystemMirror Enterprise Edition provides the


following benefits:
 Support for the Inter-site Management policy of Prefer Primary Site or Online on Either
Site
 Flexible user-customizable resource group policies
 Support for cluster verification and synchronization
 Limited support for the C-SPOC in PowerHA SystemMirror Enterprise Edition
 Automatic failover and re-integration of server nodes attached to pairs of TrueCopy/HUR
disk subsystem within sites and across sites
 Automatic management for TrueCopy/HUR links
 Management for switching the direction of the TrueCopy/HUR relationships when a site
failure occurs. With this process, the backup site can take control of the managed
resource groups in PowerHA SystemMirror Enterprise Edition from the primary site

14.2.1 Installing the Hitachi CCI software


Use the following steps as a guideline to help you install the Hitachi CCI on the AIX cluster
nodes. You can also find this information in the /usr/sbin/cluster/release_notes_xd file.
However, the release notes only exist if you already have the PowerHA SystemMirror
Enterprise Edition software installed. Always consult the latest version of the Hitachi
Command Control Interface (CCI) User and Reference Guide, MK-90RD011, which you can
download from:
http://communities.vmware.com/servlet/JiveServlet/download/1183307-19474



If you are installing CCI from a CD, use the RMinstsh and RMuninst scripts on the CD to
automatically install and uninstall the CCI software.

Important: You must install the Hitachi CCI software into the /HORCM/usr/bin directory.
Otherwise, you must create a symbolic link to this directory.

For other media, use the instructions in the following sections.

Installing the Hitachi CCI software into a root directory


To install the Hitachi CCI software into the root directory, follow these steps:
1. Insert the installation medium into the proper I/O device.
2. Move to the current root directory:
# cd /
3. Copy all files from the installation medium by using the cpio command:
# cpio -idmu < /dev/XXXX XXXX = I/O device
Preserve the directory structure (d flag) and file modification times (m flag), and copy
unconditionally (u flag). For diskettes, load them sequentially, and repeat the command.
An I/O device name of “floppy disk” designates a surface partition of the raw device file
(unpartitioned raw device file).
4. Execute the Hitachi Open Remote Copy Manager (HORCM) installation command:
# /HORCM/horcminstall.sh
5. Verify installation of the proper version by using the raidqry command:
# raidqry -h
Model: RAID-Manager/AIX
Ver&Rev: 01-23-03/06
Usage: raidqry [options] for HORC

Installing the Hitachi CCI software into a non-root directory


To install the Hitachi CCI software into a non-root directory, follow these steps:
1. Insert the installation medium, such as a CD, into the proper I/O device.
2. Move to the desired directory for CCI. The specified directory must be mounted on a partition other than the root disk, or on an external disk.
# cd /Specified Directory
3. Copy all files from the installation medium by using the cpio command:
# cpio -idmu < /dev/XXXX XXXX = I/O device
Preserve the directory structure (d flag) and file modification times (m flag), and copy
unconditionally (u flag). For diskettes, load them sequentially, and repeat the command.
An I/O device name of “floppy disk” designates a surface partition of the raw device file
(unpartitioned raw device file).
4. Make a symbolic link to the /HORCM directory:
# ln -s /Specified Directory/HORCM /HORCM
5. Run the HORCM installation command:
# /HORCM/horcminstall.sh

6. Verify installation of the proper version by using the raidqry command:
# raidqry -h
Model: RAID-Manager/AIX
Ver&Rev: 01-23-03/06
Usage: raidqry [options] for HORC

Installing a newer version of the Hitachi CCI software


To install a newer version of the CCI software:
1. Confirm that HORCM is not running. If it is running, shut it down:
One CCI instance: # horcmshutdown.sh
Two CCI instances: # horcmshutdown.sh 0 1
If Hitachi TrueCopy commands are running in the interactive mode, terminate the
interactive mode and exit these commands by using the -q option.
2. Insert the installation medium, such as a CD, into the proper I/O device.
3. Move to the directory that contains the HORCM directory as in the following example for
the root directory:
# cd /
4. Copy all files from the installation medium by using the cpio command:
# cpio -idmu < /dev/XXXX XXXX = I/O device
Preserve the directory structure (d flag) and file modification times (m flag) and copy
unconditionally (u flag). For diskettes, load them sequentially, and repeat the command.
An I/O device name of “floppy disk” designates a surface partition of the raw device file
(unpartitioned raw device file).
5. Execute the HORCM installation command:
# /HORCM/horcminstall.sh
6. Verify installation of the proper version by using the raidqry command:
# raidqry -h
Model: RAID-Manager/AIX
Ver&Rev: 01-23-03/06
Usage: raidqry [options] for HORC

14.2.2 Overview of the CCI instance


The CCI components on the storage system include the command device or devices and the
Hitachi TrueCopy volumes, ShadowImage volumes, or both. Each CCI instance on a
UNIX/PC server includes the following components:
 HORCM:
– Log and trace files
– A command server
– Error monitoring and event reporting files
– A configuration management feature
 Configuration definition file that is defined by the user
 The Hitachi TrueCopy user execution environment, ShadowImage user execution
environment, or both, which contain the TrueCopy/ShadowImage commands, a command
log, and a monitoring function.



14.2.3 Creating and editing the horcm.conf files
The configuration definition file is a text file that is created and edited by using any standard
text editor, such as the vi editor. A sample configuration definition file, HORCM_CONF
(/HORCM/etc/horcm.conf), is included with the CCI software. Use this file as the basis for
creating your configuration definition files. The system administrator must copy the sample
file, set the necessary parameters in the copied file, and place the copied file in the proper
directory. For detailed descriptions of the configuration definition files for sample CCI
configurations, see the Hitachi Command Control Interface (CCI) User and Reference Guide,
MK-90RD011, which you can download from:
http://communities.vmware.com/servlet/JiveServlet/download/1183307-19474

Important: Do not edit the configuration definition file while HORCM is running. Shut down
HORCM, edit the configuration file as needed, and then restart HORCM.
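For example, assuming CCI instance 2 (the instance that is used in the scenario later in this chapter), the edit cycle might look like the following sketch:

horcmshutdown.sh 2      # stop HORCM instance 2
vi /etc/horcm2.conf     # edit the configuration definition file
horcmstart.sh 2         # restart HORCM instance 2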

You might have multiple CCI instances, each of which uses its own specific horcm#.conf file.
For example, instance 0 might be horcm0.conf, instance 1 (Example 14-1) might be
horcm1.conf, and so on. The test scenario presented later in this chapter uses instance 2 and
provides examples of the horcm2.conf file on each cluster node.

Example 14-1 The hormc.conf file

Example configuration files:

horcm1.conf file on local node


------------------------------
HORCM_MON
#ip_address => Address of the local node
#ip_address service poll(10ms) timeout(10ms)
10.15.11.194 horcm1 12000 3000

HORCM_CMD
#dev_name => hdisk of Command Device
#UnitID 0 (Serial# eg. 45306)
/dev/hdisk19

HORCM_DEV
#Map dev_grp to LDEV#
#dev_group dev_name port# TargetID LU# MU#
VG01 test01 CL1-B 1 5 0
VG01 work01 CL1-B 1 24 0
VG01 work02 CL1-B 1 25 0

HORCM_INST
#dev_group ip_address service
VG01 10.15.11.195 horcm1

horcm1.conf file on remote node


-------------------------------
HORCM_MON
#ip_address => Address of the local node
#ip_address service poll(10ms) timeout(10ms)
10.15.11.195 horcm1 12000 3000

HORCM_CMD

#dev_name => hdisk of Command Device
#UnitID 0 (Serial# eg. 45306)
/dev/hdisk19

HORCM_DEV
#Map dev_grp to LDEV#
#dev_group dev_name port# TargetID LU# MU#
VG01 test01 CL1-B 1 5 0
VG01 work01 CL1-B 1 21 0
VG01 work02 CL1-B 1 22 0

HORCM_INST
#dev_group ip_address service
VG01 10.15.11.194 horcm1

NOTE 1: For the horcm instance to use any available command device, in case one of
them fails, it is RECOMMENDED that, in your horcm file, under HORCM_CMD
section, the command device, is presented in the format below,
where 10133 is the serial # of the array:

\\.\CMD-10133:/dev/hdisk/

For example:

\\.\CMD-10133:/dev/rhdisk19 /dev/rhdisk20 ( note space in between).

NOTE 2: The Device_File will show "-----" for the "pairdisplay -fd" command,
which will also cause verification to fail, if the ShadowImage license
has not been activated on the storage system and the MU# column is not
empty.
It is therefore recommended that the MU# column be left blank if the
ShadowImage license is NOT activated on the storage system.

Starting the HORCM instances


To start one instance of the CCI, follow these steps:
1. Modify the /etc/services file to register the port name/number (service) of the
configuration definition file. Make the port name/number the same on all servers.
horcm xxxxx/udp xxxxx = the port name/number of horcm.conf
2. Optional: If you want HORCM to start automatically each time the system starts, add
/etc/horcmstart.sh to the system automatic startup file (such as the /sbin/rc file).
3. Run the horcmstart.sh script manually to start the CCI instance:
# horcmstart.sh
4. Set the log directory (HORCC_LOG) in the command execution environment as needed.
5. Optional: If you want to perform ShadowImage operations rather than Hitachi TrueCopy operations, set the HORCC_MRCF environment variable:
– For the B shell:
# HORCC_MRCF=1
# export HORCC_MRCF



– For the C shell:
# setenv HORCC_MRCF 1
# pairdisplay -g xxxx xxxx = group name

To start two instances of the CCI, follow these steps:


1. Modify the /etc/services file to register the port name/number (service) of each
configuration definition file. The port name/number must be different for each CCI
instance.
horcm0 xxxxx/udp xxxxx = the port name/number for horcm0.conf
horcm1 yyyyy/udp yyyyy = the port name/number for horcm1.conf
2. If you want HORCM to start automatically each time the system starts, add
/etc/horcmstart.sh 0 1 to the system automatic startup file (such as the /sbin/rc file).
3. Run the horcmstart.sh script manually to start the CCI instances:
# horcmstart.sh 0 1
4. Set an instance number to the environment that executes a command:
For the B shell:
# HORCMINST=X X = instance number = 0 or 1
# export HORCMINST
For the C shell:
# setenv HORCMINST X
5. Set the log directory (HORCC_LOG) in the command execution environment as needed.
6. If you want to perform ShadowImage operations rather than Hitachi TrueCopy operations, set the HORCC_MRCF environment variable:
For B shell:
# HORCC_MRCF=1
# export HORCC_MRCF
For C shell:
# setenv HORCC_MRCF 1
# pairdisplay -g xxxx xxxx = group name
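Applied to the scenario later in this chapter, which uses CCI instance 2, the registration and startup might look like the following sketch. The service name horcm2 is an assumption; the horcm2.conf files shown later use the numeric port 52323 directly, in which case the /etc/services entry is not required:

horcm2 52323/udp        # /etc/services entry, identical on every cluster node

horcmstart.sh 2         # start CCI instance 2
horcmshutdown.sh 2      # stop CCI instance 2 when needed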

14.3 Scenario description


This scenario uses four nodes, two in each of the two sites: Austin and Miami. Nodes
jessica and bina are in the Austin site, and nodes krod and maddi are in the Miami site. Each
site provides local automatic failover, along with remote recovery for the other site, which is
often referred to as a mutual takeover configuration. Figure 14-1 on page 428 provides a
software and hardware overview of the tested configuration between the two sites.

Figure 14-1 Hitachi replication lab environment test configuration1 (the figure shows nodes jessica and bina at the Austin site attached to a USP-V, serial number 45306, and nodes krod and maddi at the Miami site attached to a USP-VM, serial number 35764; all nodes run AIX 6.1 TL6, PowerHA 6.1 SP3, and CCI 01-23-03/06; truesyncvg is replicated with synchronous TrueCopy and ursasyncvg with asynchronous Universal Replicator over FC links between the storage units)

Each site consists of two Ethernet networks. In this case, both networks are used for the public Ethernet and for cross-site traffic. Usually the cross-site network is on separate segments and is defined as an XD_ip network. It is also common to use site-specific service IP labels. Example 14-2 shows the interface list from the cluster topology.

Example 14-2 Test topology information


root@jessica: cllsif
Adapter Type Network Net Type Attribute Node IP Address
jessica boot net_ether_02 ether public jessica 9.3.207.24
jessicaalt boot net_ether_03 ether public jessica 207.24.1.1
service_1 service net_ether_03 ether public jessica 1.2.3.4
service_2 service net_ether_03 ether public jessica 1.2.3.5
bina boot net_ether_02 ether public bina 9.3.207.77
bina alt boot net_ether_03 ether public bina 207.24.1.2
service_1 service net_ether_03 ether public bina 1.2.3.4
service_2 service net_ether_03 ether public bina 1.2.3.5
krod boot net_ether_02 ether public krod 9.3.207.79
krod alt boot net_ether_03 ether public krod 207.24.1.3
service_1 service net_ether_03 ether public krod 1.2.3.4
service_2 service net_ether_03 ether public krod 1.2.3.5
maddi boot net_ether_02 ether public maddi 9.3.207.78
maddi alt boot net_ether_03 ether public maddi 207.24.1.4
service_1 service net_ether_03 ether public maddi 1.2.3.4
service_2 service net_ether_03 ether public maddi 1.2.3.5

1 Courtesy of Hitachi Data Systems



In this scenario, each node has four unique disks defined from the Hitachi storage unit at its site. The jessica and bina nodes at the Austin site have two disks, hdisk38 and hdisk39, which are the primary source volumes that use TrueCopy synchronous replication for the truesyncvg volume group. The other two disks, hdisk40 and hdisk41, are used as the target secondary volumes that use HUR for asynchronous replication from the Miami site for the ursasyncvg volume group.

The krod and maddi nodes at the Miami site have two disks, hdisk38 and hdisk39, which are the secondary target volumes for the TrueCopy synchronous replication of the truesyncvg volume group from the Austin site. The other two disks, hdisk40 and hdisk41, are used as the primary source volumes for the ursasyncvg volume group, which uses HUR for asynchronous replication.

14.4 Configuring the TrueCopy/HUR resources


This topic explains how to perform the following tasks to configure the resources for
TrueCopy/HUR:
 Assigning LUNs to the hosts (host groups)
 Creating replicated pairs
 Configuring an AIX disk and dev_group association

For each of these tasks, the Hitachi storage units have been added to the SAN fabric and
zoned appropriately. Also, the host groups have been created for the appropriate node
adapters, and the LUNs have been created within the storage unit.

14.4.1 Assigning LUNs to the hosts (host groups)


In this task, you assign LUNs by using the Hitachi Storage Navigator. Although an overview of
the steps is provided, always refer to the official Hitachi documentation for your version as
needed.

To begin, the Hitachi USP-V storage unit is at the Austin site. The host group, JessBina, is
assigned to port CL1-E on the Hitachi storage unit with the serial number 45306. Usually the
host group is assigned to multiple ports for full multipath redundancy.

To assign the LUNs to the hosts, follow these steps:


1. Locate the free LUNs and assign them to the proper host group.
a. Verify whether LUNs are currently assigned by checking the number of paths
associated with the LUN. If the fields are blank, the LUN is currently unassigned.
b. Assign the LUNs. To assign one LUN, click and drag it to a free LUN/LDEV location. To
assign multiple LUNs, hold down the Shift key and click each LUN. Then right-click the
selected LUNs and drag them to a free location.
This free location is indicated by a black and white disk image that also contains no
information in the corresponding attribute columns of LDEV/UUID/Emulation as shown
in Figure 14-2 on page 430.

Figure 14-2 Assigning LUNs to the Austin site nodes2

2. In the path verification window (Figure 14-3), check the information and record the LUN
number and LDEV numbers. You use this information later. However, you can also retrieve
this information from the AIX system after the devices are configured by the host. Click
OK.

Figure 14-3 Checking the paths for the Austin LUNs3


2 Courtesy of Hitachi Data Systems
3 Courtesy of Hitachi Data Systems



3. Back on the LUN Manager tab (Figure 14-4), click Apply for these paths to become active
and the assignment to be completed.

Figure 14-4 Applying LUN assignments for Austin4

You have now assigned four more LUNs to the nodes at the Austin site. These LUNs were added solely for this test scenario; the lab environment already had several LUNs assigned to the cluster nodes, including command and journaling LUNs.

Important: If these LUNs are the first ones to be allocated to the hosts, you must also
assign the command LUNs. See the appropriate Hitachi documentation as needed.

For the storage unit at the Miami site, repeat the steps that you performed for the Austin site.
The host group, KrodMaddi, is assigned to port CL1-B on the Hitachi USP-VM storage unit
with the serial number 35764. Usually the host group is assigned to multiple ports for full
multipath redundancy. Figure 14-5 on page 432 shows the result of these steps.

Again record both the LUN numbers and LDEV numbers so that you can easily refer to them
as needed when creating the replicated pairs. The numbers are also required when you add
the LUNs into device groups in the appropriate horcm.conf file.

4 Courtesy of Hitachi Data Systems

Figure 14-5 Miami site LUNs assigned5

14.4.2 Creating replicated pairs


PowerHA SystemMirror Enterprise Edition does not create replication pairs by using the
Hitachi interfaces. You must use the Hitachi Storage interfaces to create the same replicated
pairs before using PowerHA SystemMirror Enterprise Edition to achieve an HADR solution.
For information about setting up TrueCopy/HUR pairs, see the Hitachi Command Control
Interface (CCI) User and Reference Guide, MK-90RD011, which you can download from:
http://communities.vmware.com/servlet/JiveServlet/download/1183307-19474

You must know exactly which LUNs from each storage unit will be paired together. They must
be the same size. In this case, all of the LUNs that are used are 2 GB in size. The pairing of
LUNs also uses the LDEV numbers. The LDEV numbers are hexadecimal values that also
show up as decimal values on the AIX host.

5 Courtesy of Hitachi Data Systems



Table 14-1 translates the LDEV hex values of each LUN and its corresponding decimal value.

Table 14-1 LUN number to LDEV number comparison


Austin - 45306 Miami - 35764

LUN number LDEV-HEX LDEV-DEC number LUN number LDEV-HEX LDEV-DEC number

000A 00:01:10 272 001C 00:01:0C 268

000B 00:01:11 273 001D 00:01:0D 269

000C 00:01:12 274 001E 00:01:0E 270

000D 00:01:13 275 001F 00:01:0F 271
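For example, the Miami LDEV 00:01:0E is hexadecimal 0x10E, which is (1 x 256) + (0 x 16) + 14 = 270 in decimal; this decimal value is what the inqraid command and Example 14-7 report for that LUN.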

Although the pairing can be done by using the CCI, the example in this section shows how to
create the replicated pairs through the Hitachi Storage Navigator. The appropriate commands
are in the /HORCM/usr/bin directory. In this scenario, none of the devices have been
configured to the AIX cluster nodes.
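For reference, a CCI-based pairing would look roughly like the following sketch. It assumes device groups that match the horcm2.conf definitions shown later in this chapter, that the commands run on a node where the intended primary volumes are local, and that the fence level, journal IDs, and instance number are adapted to your environment:

paircreate -g htcdg01 -f never -vl -IH2              # synchronous TrueCopy pair
paircreate -g hurdg01 -f async -vl -jp 0 -js 0 -IH2  # HUR pair with journal IDs
pairdisplay -g htcdg01 -IH2 -fe                      # monitor until the status is PAIR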

Creating TrueCopy synchronous pairings


Beginning with the Austin Hitachi unit, create two synchronous TrueCopy replicated pairings.
1. From within Storage Navigator (Figure 14-6), select Go → TrueCopy → Pair Operation.

Figure 14-6 Storage Navigator menu options to perform a pair operation6

2. In the TrueCopy Pair Operation window (Figure 14-7), select the appropriate port, CL1-E, and find the specific LUNs to use (00-00A and 00-00B).
In this scenario, we have predetermined that we want to pair these LUNs with 00-01C and 00-01D from the Miami Hitachi storage unit on port CL1-B. Notice the occurrence of SMPL in the Status column next to the LUNs. SMPL indicates simplex, meaning that no mirroring is being used with that LUN.
3. Right-click the first Austin LUN (00-00A), and select Paircreate → Synchronize (Figure 14-7).

Figure 14-7 Creating a TrueCopy synchronous pairing7

6 Courtesy of Hitachi Data Systems
7 Courtesy of Hitachi Data Systems



4. In the full synchronous Paircreate menu (Figure 14-8), select the proper port and LUN that
you previously created and recorded. Click Set.
Because we have only one additional remote storage unit, the RCU field already shows
the proper one for Miami.
5. Repeat step 4 for the second LUN pairing. Figure 14-8 shows details of the two pairings.

Figure 14-8 TrueCopy pairings8

8 Courtesy of Hitachi Data Systems

6. After you complete the pairing selections, on the Pair Operation tab, verify that the
information is correct and click Apply to apply them all at one time.
Figure 14-9 shows both of the source LUNs in the middle of the pane. It also shows an
overview of which remote LUNs they are to be paired with.

Figure 14-9 Applying TrueCopy pairings9

9 Courtesy of Hitachi Data Systems



This step automatically starts copying from the local Austin primary source LUNs to the remote Miami secondary target LUNs. You can also right-click a LUN and select Detailed Information as shown in Figure 14-10.

Figure 14-10 Detailed LUN pairing and copy status information10

10 Courtesy of Hitachi Data Systems

After the copy has completed, the status is displayed as PAIR as shown in Figure 14-11. You
can also view this status from the management interface of either one of the storage units.

Figure 14-11 TrueCopy pairing and copy completed11

11 Courtesy of Hitachi Data Systems



Creating a Universal Replicator asynchronous pairing
Now switch over to the Miami Hitachi storage unit to create the asynchronous replicated
pairings.
1. From the Storage Navigator, select Go → Universal Replicator → Pair Operation (Figure 14-12).

Figure 14-12 Menu selection to perform the pair operation12

12 Courtesy of Hitachi Data Systems

2. In the Universal Replicator Pair Operation window (Figure 14-13), select the appropriate port, CL1-B, and find the specific LUNs that you want to use (00-01E and 00-01F in this example). We have already predetermined that we want to pair these LUNs with 00-00C and 00-00D from the Austin Hitachi storage unit on port CL1-E.
Right-click one of the desired LUNs and select Paircreate.

Figure 14-13 Selecting Paircreate in the Universal Replicator13

13 Courtesy of Hitachi Data Systems



3. In the Paircreate window, complete these steps:
a. Select the proper port and LUN that you previously created and recorded.
b. Because we only have one additional remote storage unit, the RCU field already shows
the proper one for Austin.
c. Unlike when using TrueCopy synchronous replication, when using Universal
Replicator, specify a master journal volume (M-JNL), a remote journal volume
(R-JNL), and a consistency (CT) group.

Important: If these are the first Universal Replicator LUNs to be allocated, you must
also assign journaling groups and LUNs for both storage units. Refer to the
appropriate Hitachi Universal Replicator documentation as needed.

We chose ones that have been already previously created in the environment.
d. Click Set
e. Repeat these steps for the second LUN pairing.
Figure 14-14 shows details of the two pairings.

Figure 14-14 Paircreate details in Universal Replicator14

14 Courtesy of Hitachi Data Systems

4. After you complete the pairing selections, on the Pair Operation tab, verify that the
information is correct and click Apply to apply them all at one time.
When the pairing is established, the copy automatically begins to synchronize with the
remote LUNs at the Austin site. The status changes to COPY, as shown in Figure 14-15,
until the pairs are in sync. After the pairs are synchronized, their status changes to PAIR.

Figure 14-15 Asynchronous copy in progress in Universal Replicator15

15 Courtesy of Hitachi Data Systems



5. Upon completion of the synchronization of the LUNs, configure the LUNs into the AIX
cluster nodes. Figure 14-16 shows an overview of the Hitachi replicated environment.

Figure 14-16 Replicated Hitachi LUN overview16

14.4.3 Configuring an AIX disk and dev_group association


Before you continue with the steps in this section, you must ensure that the Hitachi hdisks are
made available to your nodes. You can run the cfgmgr command to configure the new hdisks.
Also the CCI must already be installed on each cluster node. If you must install the CCI, see
14.2.1, “Installing the Hitachi CCI software” on page 422.

In the test environment, we already have hdisk0-hdisk37 on each of the four cluster nodes. After running the cfgmgr command on each node, one node at a time, we now have four additional disks, hdisk38-hdisk41, as shown in Example 14-3.

Example 14-3 New Hitachi disks


root@jessica:
hdisk38 none None
hdisk39 none None
hdisk40 none None
hdisk41 none None

Although the LUN and LDEV numbers were written down during the initial LUN assignments,
you must identify the correct LDEV numbers of the Hitachi disks and the corresponding AIX
hdisks by performing the following steps:

16 Courtesy of Hitachi Data Systems

1. On the PowerHA SystemMirror Enterprise Edition nodes, select the Hitachi disks and the
disks that will be used in the TrueCopy/HUR relationships by running the inqraid
command. Example 14-4 shows hdisk38-hdisk41, which are the Hitachi disks that we just
added.

Example 14-4 Hitachi disks added


root@jessica:
# lsdev -Cc disk|grep hdisk|/HORCM/usr/bin/inqraid
hdisk38 -> [SQ] CL1-E Ser = 45306 LDEV = 272 [HITACHI ] [OPEN-V ]
HORC = P-VOL HOMRCF[MU#0 = SMPL MU#1 = SMPL MU#2 = SMPL]
RAID5[Group 1- 2] SSID = 0x0005
hdisk39 -> [SQ] CL1-E Ser = 45306 LDEV = 273 [HITACHI ] [OPEN-V ]
HORC = P-VOL HOMRCF[MU#0 = SMPL MU#1 = SMPL MU#2 = SMPL]
RAID5[Group 1- 2] SSID = 0x0005
hdisk40 -> [SQ] CL1-E Ser = 45306 LDEV = 274 [HITACHI ] [OPEN-V ]
HORC = S-VOL HOMRCF[MU#0 = SMPL MU#1 = SMPL MU#2 = SMPL]
RAID5[Group 1- 2] SSID = 0x0005 CTGID = 10
hdisk41 -> [SQ] CL1-E Ser = 45306 LDEV = 275 [HITACHI ] [OPEN-V ]
HORC = S-VOL HOMRCF[MU#0 = SMPL MU#1 = SMPL MU#2 = SMPL]
RAID5[Group 1- 2] SSID = 0x0005 CTGID = 10

2. Edit the HORCM LDEV section in the horcm#.conf file to identify the dev_group that will be
managed by PowerHA SystemMirror Enterprise Edition. In this example, we use the
horcm2.conf file.
Hdisk38 (LDEV 272) and hdisk39 (LDEV 273) are the pair for the synchronous replicated resource, which is primary at the Austin site. Hdisk40 (LDEV 274) and hdisk41 (LDEV 275) are the pair for the asynchronous replicated resource, which is primary at the Miami site.
Specify the device groups (dev_group) in the horcm#.conf file. We are using dev_group
htcdg01 with dev_names htcd01 and htcd02 for the synchronous replicated pairs. For the
asynchronous pairs, we are using dev_group hurdg01 and dev_names hurd01 and hurd02.
The device group names are needed later when checking the status of the replicated pairs and when defining the replicated pairs as a resource for PowerHA Enterprise Edition to control.

Important: Do not edit the configuration definition file while HORCM is running. Shut
down HORCM, edit the configuration file as needed, and then restart HORCM.

Example 14-5 shows the horcm2.conf file from the jessica node, at the Austin site.
Because two nodes are at the Austin site, the same updates were performed to the
/etc/horcm2.conf file on the bina node. Notice that you can use either the decimal value of the LDEV or the hexadecimal value. We deliberately specified one device group each way to demonstrate that both formats work. Although several groups were already defined, only those that are relevant to this scenario are shown.

Example 14-5 Horcm2.conf file used for the Austin site nodes
root@jessica:
/etc/horcm2.conf

HORCM_MON
#Address of local node...
#ip_address service poll(10ms) timeout(10ms)



r9r3m11.austin.ibm.com 52323 1000 3000

HORCM_CMD
#hdisk of Command Device...
#dev_name dev_name dev_name
#UnitID 0 (Serial# 45306)
#/dev/rhdisk10
\\.\CMD-45306:/dev/rhdisk10 /dev/rhdisk14

HORCM_LDEV
#Map dev_grp to LDEV#...
#dev_group dev_name Serial# CU:LDEV MU# siteA siteB
# (LDEV#) hdisk -> hdisk
#--------- --------- ------- -------- --- --------------------
htcdg01 htcd01 45306 272
htcdg01 htcd02 45306 273
hurdg01 hurd01 45306 01:12
hurdg01 hurd02 45306 01:13

# Address of remote node for each dev_grp...


HORCM_INST
#dev_group ip_address service
htcdg01 maddi.austin.ibm.com 52323
hurdg01 maddi.austin.ibm.com 52323

For the krod and maddi nodes at the Miami site, the dev_group and dev_name entries are the same. The differences are the serial number and LDEV numbers, which are those of the storage unit at that site, and the HORCM_INST entries, which point to the appropriate remote system at the Austin site.
Example 14-6 shows the horcm2.conf file that we used for both nodes in the Miami site. Notice that, for the ip_address fields, fully qualified host names are used instead of IP addresses. As long as these names are resolvable, the format is valid. The format that uses the actual IP addresses is shown in Example 14-1 on page 425.

Example 14-6 The horcm2.conf file used for the nodes in the Miami site
root@krod:
horcm2.conf

HORCM_MON
#Address of local node...
#ip_address service poll(10ms) timeout(10ms)
r9r3m13.austin.ibm.com 52323 1000 3000

HORCM_CMD
#hdisk of Command Device...
#dev_name dev_name dev_name
#UnitID 0 (Serial# 35764)
#/dev/rhdisk10
# /dev/hdisk19
\\.\CMD-45306:/dev/rhdisk11 /dev/rhdisk19

#HUR_GROUP HUR_103_153 45306 01:53 0


htcdg01 htcd01 35764 268
htcdg01 htcd02 35764 269
hurdg01 hurd01 35764 01:0E

hurdg01 hurd02 35764 01:0F
# Address of remote node for each dev_grp...

HORCM_INST
#dev_group ip_address service
htcdg01 bina.austin.ibm.com 52323
hurdg01 bina.austin.ibm.com 52323

3. Map the TrueCopy-protected hdisks to the TrueCopy device groups by using the raidscan
command. In the following example, 2 is the HORCM instance number:
lsdev -Cc disk|grep hdisk | /HORCM/usr/bin/raidscan -IH2 -find inst
The -find inst option of the raidscan command registers the device file name (hdisk) with all mirror descriptors of the LDEV map table for HORCM. The option also permits matching the volumes against the horcm.conf file in protection mode. Because this registration is started automatically by /etc/horcmgr, you do not normally need to use the option; the scan stops as soon as HORCM has completed the registration, so if HORCM no longer needs it, no further action is taken and the command exits. You can combine -find inst with the -fx option to view LDEV numbers in hexadecimal format.
4. Verify that the pairs are established by running either the pairdisplay command or the pairvolchk command against the device groups htcdg01 and hurdg01.
Example 14-7 shows how we use the pairdisplay command. For device group htcdg01, the status of PAIR and a fence level of NEVER indicate a synchronous pair. For device group hurdg01, the ASYNC fence level clearly indicates an asynchronous pair. Also notice that the CTG field shows the consistency group number for the asynchronous pair managed by HUR.

Example 14-7 The pairdisplay command to verify that the pair status is synchronized
# pairdisplay -g htcdg01 -IH2 -fe
Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M CTG JID AP
htcdg01 htcd01(L) (CL1-E-0, 0, 10)45306 272.P-VOL PAIR NEVER ,35764 268 - - - 1
htcdg01 htcd01(R) (CL1-B-0, 0, 28)35764 268.S-VOL PAIR NEVER ,----- 272 - - - -
htcdg01 htcd02(L) (CL1-E-0, 0, 11)45306 273.P-VOL PAIR NEVER ,35764 269 - - - 1
htcdg01 htcd02(R) (CL1-B-0, 0, 29)35764 269.S-VOL PAIR NEVER ,----- 273 - - - -

# pairdisplay -g hurdg01 -IH2 -fe


Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M CTG JID AP
hurdg01 hurd01(L) (CL1-E-0, 0, 12)45306 274.S-VOL PAIR ASYNC ,----- 270 - 10 3 1
hurdg01 hurd01(R) (CL1-B-0, 0, 30)35764 270.P-VOL PAIR ASYNC ,45306 274 - 10 3 2
hurdg01 hurd02(L) (CL1-E-0, 0, 13)45306 275.S-VOL PAIR ASYNC ,----- 271 - 10 3 1
hurdg01 hurd02(R) (CL1-B-0, 0, 31)35764 271.P-VOL PAIR ASYNC ,45306 275 - 10 3 2

To fit Example 14-7 on the page, we removed the last three columns of the output because they are not relevant to what we are checking.
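The pairvolchk command that is mentioned in step 4 can be used in the same way. The following is a minimal sketch, assuming the same device groups and HORCM instance; the exact message text varies by CCI release:

pairvolchk -g htcdg01 -IH2 -s
pairvolchk -g hurdg01 -IH2 -s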



Unestablished pairs: If pairs are not yet established, the status is displayed as SMPL. To
continue, you must create the pairs. For instructions about creating pairs from the
command line, see the Hitachi Command Control Interface (CCI) User and Reference
Guide, MK-90RD011, which you can download from:
http://communities.vmware.com/servlet/JiveServlet/download/1183307-19474

Otherwise, if you are using Storage Navigator, see 14.4.2, “Creating replicated pairs” on
page 432.

Creating volume groups and file systems on replicated disks


After identifying the hdisks and dev_groups that will be managed by PowerHA SystemMirror
Enterprise Edition, you must create the volume groups and file systems. To set up volume
groups and file systems in the replicated disks, follow these steps:
1. On each of the four PowerHA SystemMirror Enterprise Edition cluster nodes, verify the next free major number by running the lvlstmajor command. Also verify that the physical volume names for the file systems can be used across sites.
In this scenario, we use major number 56 for the truesyncvg volume group and 57 for the ursasyncvg volume group. We use these numbers later when importing the volume groups on the other cluster nodes. Although the major numbers are not required to match, matching them is a preferred practice.
We create the truesyncvg scalable volume group on the jessica node where the primary
LUNs are located. We also create the logical volumes, jfslog, and file systems as shown
in Example 14-8.

Example 14-8 Details about the truesyncvg volume group


root@jessica:lsvg truesyncvg
VOLUME GROUP: truesyncvg VG IDENTIFIER:
00cb14ce00004c000000012b564c41b9
VG STATE: active PP SIZE: 4 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 988 (3952 megabytes)
MAX LVs: 256 FREE PPs: 737 (2948 megabytes)
LVs: 3 USED PPs: 251 (1004 megabytes)
OPEN LVs: 3 QUORUM: 2 (Enabled)
TOTAL PVs: 2 VG DESCRIPTORS: 3
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 2 AUTO ON: no
MAX PPs per VG: 32768 MAX PVs: 1024
LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
PV RESTRICTION: none

root@jessica:lsvg -l truesyncvg
lsvg -l truesyncvg
truesyncvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
oreolv jfs2 125 125 1 closed/syncd /oreofs
majorlv jfs2 125 125 1 closed/syncd /majorfs
truefsloglv jfs2log 1 1 1 closed/syncd N/A

We create the ursasyncvg big volume group on the krod node where the primary LUNs
are located. We also create the logical volumes, jfslog, and file systems as shown in
Example 14-9.

Example 14-9 Ursasyncvg volume group information


root@krod:lspv
hdisk40 00cb14ce5676ad24 ursasyncvg active
hdisk41 00cb14ce5676afcf ursasyncvg active

root@krod:lsvg ursasyncvg
VOLUME GROUP: ursasyncvg VG IDENTIFIER:
00cb14ce00004c000000012b5676b11e
VG STATE: active PP SIZE: 4 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 1018 (4072 megabytes)
MAX LVs: 512 FREE PPs: 596 (2384 megabytes)
LVs: 3 USED PPs: 422 (1688 megabytes)
OPEN LVs: 3 QUORUM: 2 (Enabled)
TOTAL PVs: 2 VG DESCRIPTORS: 3
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 2 AUTO ON: no
MAX PPs per VG: 130048
MAX PPs per PV: 1016 MAX PVs: 128
LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable

root@krod:lsvg -l ursasyncvg
ursasyncvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
ursfsloglv jfs2log 2 2 1 closed/syncd N/A
hannahlv jfs2 200 200 1 closed/syncd /hannahfs
julielv jfs2 220 220 1 closed/syncd /juliefs
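The volume groups and file systems in Example 14-8 and Example 14-9 can be created with standard AIX LVM commands. The following is a minimal sketch for truesyncvg on the jessica node, assuming the disk names, physical partition size, major number, and logical volume sizes from this scenario; ursasyncvg, a big volume group, is created analogously on the krod node with mkvg -B and major number 57:

mkvg -S -y truesyncvg -s 4 -V 56 hdisk38 hdisk39
mklv -y truefsloglv -t jfs2log truesyncvg 1
echo y | logform /dev/truefsloglv
mklv -y oreolv -t jfs2 truesyncvg 125
mklv -y majorlv -t jfs2 truesyncvg 125
crfs -v jfs2 -d oreolv -m /oreofs -A no -a logname=truefsloglv
crfs -v jfs2 -d majorlv -m /majorfs -A no -a logname=truefsloglv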

2. Vary off the newly created volume groups by running the varyoffvg command. To import
the volume groups onto the other three systems, the pairs must be in sync.
We execute the pairresync command as shown in Example 14-10 on the local disks and
make sure that they are in the PAIR state. This process verifies that the local disk
information has been copied to the remote storage. Notice that the command is being run
on the respective node that contains the primary source LUNs and where the volume
groups are created.

Example 14-10 Pairresync command


#root@jessica:pairresync -g htcdg01 -IH2

#root@krod:pairresync -g hurdg01 -IH2

Verify that the pairs are in sync with the pairdisplay command as shown in Example 14-7
on page 446.



3. Split the pair relationship so that the remote systems can import the volume groups as
needed on each node. Run the pairsplit command against the device group as shown in
Example 14-11.

Example 14-11 The pairsplit command to suspend replication


root@jessica: pairsplit -g htcdg01 -IH2

root@krod: pairsplit -g hurdg01 -IH2

To verify that the pairs are split, check the status by using the pairdisplay command.
Example 14-12 shows that the pairs are in a suspended state.

Example 14-12 Pairdisplay shows pairs suspended


root@jessica: pairdisplay -g htcdg01 -IH2 -fe
Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M CTG JID AP
htcdg01 htcd01(L) (CL1-E-0, 0, 10)45306 272.P-VOL PSUS NEVER ,35764 268 - - - 1
htcdg01 htcd01(R) (CL1-B-0, 0, 28)35764 268.S-VOL SSUS NEVER ,----- 272 - - - -
htcdg01 htcd02(L) (CL1-E-0, 0, 11)45306 273.P-VOL PSUS NEVER ,35764 269 - - - 1
htcdg01 htcd02(R) (CL1-B-0, 0, 29)35764 269.S-VOL SSUS NEVER ,----- 273 - - - -

root@krod: pairdisplay -g hurdg01 -IH2 -fe


Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M CTG JID AP
hurdg01 hurd01(L) (CL1-B-0, 0, 30)35764 270.P-VOL PSUS ASYNC ,45306 274 - 10 3 2
hurdg01 hurd01(R) (CL1-E-0, 0, 12)45306 274.S-VOL SSUS ASYNC ,----- 270 - 10 3 1
hurdg01 hurd02(L) (CL1-B-0, 0, 31)35764 271.P-VOL PSUS ASYNC ,45306 275 - 10 3 2
hurdg01 hurd02(R) (CL1-E-0, 0, 13)45306 275.S-VOL SSUS ASYNC ,----- 271 - 10 3 1

4. To import the volume groups on the remaining nodes, ensure that the PVID is present on
the disks by using one of the following options:
– Run the rmdev -dl command for each hdisk and then run the cfgmgr command.
– Run the appropriate chdev command against each disk to pull in the PVID.
As shown in Example 14-13, we use the chdev command on each of the three additional
nodes.

Example 14-13 The chdev command to acquire the PVIDs


root@jessica: chdev -l hdisk40 -a pv=yes
root@jessica: chdev -l hdisk41 -a pv=yes

root@bina: chdev -l hdisk38 -a pv=yes


root@bina: chdev -l hdisk39 -a pv=yes
root@bina: chdev -l hdisk40 -a pv=yes
root@bina: chdev -l hdisk41 -a pv=yes

root@krod: chdev -l hdisk38 -a pv=yes


root@krod: chdev -l hdisk39 -a pv=yes

root@maddi: chdev -l hdisk38 -a pv=yes


root@maddi: chdev -l hdisk39 -a pv=yes
root@maddi: chdev -l hdisk40 -a pv=yes
root@maddi: chdev -l hdisk41 -a pv=yes

5. Verify that the PVIDs are correctly showing on each system by running the lspv command
as shown in Example 14-14. Because all four of the nodes have the exact hdisk
numbering, we show the output only from one node, the bina node.

Example 14-14 LSPV listing to verify PVIDs are present


bina@root: lspv
hdisk38 00cb14ce564c3f44 none
hdisk39 00cb14ce564c40fb none
hdisk40 00cb14ce5676ad24 none
hdisk41 00cb14ce5676afcf none

6. Import the volume groups on each node as needed by using the importvg command.
Specify the major number that you used earlier.
7. Disable both the auto varyon and quorum settings of the volume groups by using the chvg
command.
8. Vary off the volume group as shown in Example 14-15.

Attention: PowerHA SystemMirror Enterprise Edition attempts to automatically set the


AUTO VARYON to NO during verification, except in the case of remote TrueCopy/HUR.

Example 14-15 Importing the replicated volume groups


root@jessica: importvg -y ursasyncvg -V 57 hdisk40
root@jessica: chvg -a n -Q n ursasyncvg
root@jessica: varyoffvg ursasyncvg

root@bina: importvg -y truesyncvg -V 56 hdisk38


root@bina: importvg -y ursasyncvg -V 57 hdisk40
root@bina: chvg -a n -Q n truesyncvg
root@bina: chvg -a n -Q n ursasyncvg
root@bina: varyoffvg truesyncvg
root@bina: varyoffvg ursasyncvg

root@krod: importvg -y truesyncvg -V 56 hdisk38


root@krod: chvg -a n -Q n truesyncvg
root@krod: varyoffvg truesyncvg

root@maddi: importvg -y truesyncvg -V 56 hdisk38


root@maddi: importvg -y ursasyncvg -V 57 hdisk40
root@maddi: chvg -a n -Q n truesyncvg
root@maddi: chvg -a n -Q n ursasyncvg
root@maddi: varyoffvg truesyncvg
root@maddi: varyoffvg ursasyncvg

9. Re-establish the pairs that you split in step 3 on page 449 by running the pairresync
command again as shown in Example 14-10 on page 448.
10.Verify again that the pairs are in sync by using the pairdisplay command as shown in Example 14-7 on page 446.



14.4.4 Defining TrueCopy/HUR managed replicated resource to PowerHA
Adding a replicated resource to be controlled by PowerHA consists of two specific steps per device group, and four steps overall:
 Adding TrueCopy/HUR replicated resources
 Adding the TrueCopy/HUR replicated resources to a resource group
 Verifying the TrueCopy/HUR configuration
 Synchronizing the cluster configuration

Before you perform these steps, the cluster topology must already be configured, including all four nodes, both sites, and the networks.

Adding TrueCopy/HUR replicated resources


To define a TrueCopy replicated resource, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Resource Configuration → TrueCopy Replicated Resources → Add Hitachi TrueCopy/HUR Replicated Resource.
3. In the Add Hitachi TrueCopy/HUR Replicated Resource panel, press Enter.
4. Complete the available fields appropriately and press Enter.

In this configuration, we created two replicated resources. One resource, named truelee, is for the synchronous device group htcdg01. The second resource, named ursasyncRR, is for the asynchronous device group hurdg01. Example 14-16 shows both of the replicated resources.

Example 14-16 TrueCopy/HUR replicated resource definitions


Add a HITACHI TRUECOPY(R)/HUR Replicated Resource

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* TRUECOPY(R)/HUR Resource Name [truelee]
* TRUECOPY(R)/HUR Mode SYNC +
* Device Groups [htcdg01] +
* Recovery Action AUTO +
* Horcm Instance [horcm2]
* Horctakeover Timeout Value [300] #
* Pairevtwait Timeout Value [3600] #

Add a HITACHI TRUECOPY(R)/HUR Replicated Resource

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* TRUECOPY(R)/HUR Resource Name [ursasyncRR]
* TRUECOPY(R)/HUR Mode ASYNC +
* Device Groups [hurdg01] +
* Recovery Action AUTO +

* Horcm Instance [horcm2]
* Horctakeover Timeout Value [300] #
* Pairevtwait Timeout Value [3600] #

For a complete list of all defined TrueCopy/HUR replicated resources, run the cllstc command, which is in the /usr/es/sbin/cluster/tc/cmds directory. Example 14-17 shows the output of the cllstc command.

Example 14-17 The cllstc command to list the TrueCopy/HUR replicated resources
root@jessica: cllstc -a
Name CopyMode DeviceGrps RecoveryAction HorcmInstance HorcTimeOut PairevtTimeout
truelee SYNC htcdg01 AUTO horcm2 300 3600
ursasyncRR ASYNC hurdg01 AUTO horcm2 300 3600

Adding the TrueCopy/HUR replicated resources to a resource group


To add a TrueCopy replicated resource to a resource group, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Resource Configuration → Extended Resource Group Configuration.
Depending on whether you are working with an existing resource group or creating a
resource group, the TrueCopy Replicated Resources entry is displayed at the bottom of
the page in SMIT. This entry is a pick list that shows the resource names that are created
in the previous task.
3. Ensure that the volume groups that are selected on the Resource Group configuration
display match the volume groups that are used in the TrueCopy/HUR Replicated
Resource:
– If you are changing an existing resource group, select Change/Show Resource
Group.
– If you are adding a resource group, select Add a Resource Group.
4. In the TrueCopy Replicated Resources field, press F4 for a list of the TrueCopy/HUR
replicated resources that were previously added. Verify that this resource matches the
volume group that is specified.

Important: You cannot mix regular (non-replicated) volume groups and TrueCopy/HUR
replicated volume groups in the same resource group.

Press Enter.

In this scenario, we changed an existing resource group, emlecRG, for the Austin site and
specifically chose a site relationship, also known as an Inter-site Management Policy of Prefer
Primary Site. We added a new resource group, valhallarg, for the Miami site and chose to
use the same site relationship. We also added the additional nodes from each site. We
configured both resource groups to fall over locally within a site and to fall over between sites.
If a site failure occurs, the resource group falls over to the standby node at the remote site, but
never to the remote production node.



Example 14-18 shows the relevant resource group information.

Example 14-18 Resource groups for the TrueCopy/HUR replicated resources


Resource Group Name emlecRG
Participating Node Name(s) jessica bina maddi
Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node
Fallback Policy Never Fallback
Site Relationship Prefer Primary Site
Node Priority
Service IP Label service_1
Volume Groups truesyncvg
Hitachi TrueCopy Replicated Resources truelee

Resource Group Name valhallaRG


Participating Node Name(s) krod maddi bina
Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node
Fallback Policy Never Fallback
Site Relationship Prefer Primary Site
Node Priority
Service IP Label service_2
Volume Groups ursasyncvg
Hitachi TrueCopy Replicated Resources ursasyncRR
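
You can also list these resource group definitions from the command line. One option (our
suggestion; it is not the only way to display this information) is the clshowres utility:

/usr/es/sbin/cluster/utilities/clshowres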

Verifying the TrueCopy/HUR configuration


Before synchronizing the new cluster configuration, verify the TrueCopy/HUR configuration:
1. To verify the configuration, run the following command:
/usr/es/sbin/cluster/tc/utils/cl_verify_tc_config
2. Correct any configuration errors that are shown.
If you see error messages such as those shown in Figure 14-17, they usually indicate
that the raidscan command was not run or was run incorrectly. See
step 3 on page 449 in “Creating volume groups and file systems on replicated disks” on
page 447.
3. Run the script again.

Figure 14-17 Error messages found during TrueCopy/HUR replicated resource verification
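
If the raidscan mapping is the cause, the following command sequence (the same commands
that are used elsewhere in this chapter, assuming HORCM instance 2) re-maps the device
groups and repeats the verification:

lsdev -Cc disk|grep hdisk|/HORCM/usr/bin/raidscan -IH2 -find inst
/usr/es/sbin/cluster/tc/utils/cl_verify_tc_config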

Synchronizing the cluster configuration
You must verify the PowerHA SystemMirror Enterprise Edition cluster and the TrueCopy/HUR
configuration before you can synchronize the cluster. To propagate the new TrueCopy/HUR
configuration information and the additional resource group that were created across the
cluster, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select Extended Configuration  Extended Verification and
Synchronization.
3. In the Verify Synchronize or Both field select Synchronize. In the Automatically correct
errors found during verification field select No. Press Enter.
The output is displayed in the SMIT Command Status window.

14.5 Failover testing


This topic explains the basic failover testing of the TrueCopy/HUR replicated resources locally
within the site and across sites. You must carefully plan the testing of the site cluster failover
because it requires more time to manipulate the secondary target LUNs at the recovery site.
Also, because of the nature of asynchronous replication, testing it can affect the data itself.

These scenarios do not entail performing a redundancy test with the IP networks. Instead you
configure redundant IP or non-IP communication paths to avoid isolation of the sites. The loss
of all the communication paths between sites leads to a partitioned state of the cluster and to
data divergence between sites if the replication links are also unavailable.

Another specific failure scenario is the loss of the replication paths between the storage
subsystems while the cluster is running on both sites. To avoid this situation, configure
redundant communication links for TrueCopy/HUR replication. You must manually recover the
status of the pairs after the storage links are operational again.

Important: PowerHA SystemMirror Enterprise Edition does not trap SNMP notification
events for TrueCopy/HUR storage. If a TrueCopy link goes down when the cluster is up and
the link is repaired later, you must manually resynchronize the pairs.
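
For example, assuming the device groups and HORCM instance that are used in this chapter, a
manual resynchronization and status check after the links are repaired might look like the
following commands:

pairresync -g htcdg01 -IH2
pairresync -g hurdg01 -IH2
pairdisplay -g htcdg01 -IH2 -fe
pairdisplay -g hurdg01 -IH2 -fe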

This topic explains how to perform the following tests for each site and resource group:
 Graceful site failover for the Austin site
 Rolling site failure of the Austin site
 Site re-integration for the Austin site
 Graceful site failover for the Miami site
 Rolling site failure of the Miami site
 Site re-integration for the Miami site



Each test, except for the last re-integration test, begins in the same initial state of each site
hosting its own production resource group on the primary node as shown in Example 14-19.

Example 14-19 Beginning of test cluster resource group states


clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
emlecRG ONLINE jessica@Austin
OFFLINE bina@Austin
ONLINE SECONDARY maddi@Miami
valhallaRG ONLINE krod@Miami
OFFLINE maddi@Miami
ONLINE SECONDARY bina@Austin

Before each test, we start copying data from another file system to the replicated file systems.
After each test, we verify that the site service IP address is online and that new data is in the
file systems. We also had a script that inserts the current time and date into a file on each file
system. Because of the small amount of I/O in our environment, we did not detect any data
loss with the asynchronous replication either.
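
The timestamp script itself is trivial. The following sketch shows the idea (the file system
names and the log file name are assumptions based on the examples in this chapter):

#!/bin/ksh
# Append the current date and the local host name to a file
# on each replicated file system before and after a test
for fs in /oreofs /majorfs
do
    print "$(date) $(hostname)" >> $fs/drtest_timestamp.log
done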

14.5.1 Graceful site failover for the Austin site


Performing a controlled move of a production environment across sites is a basic test to
ensure that the remote site can bring the production environment online. However, this task is
done only during initial implementation testing or during a planned production outage of the
site. You perform the graceful failover operation between sites by performing a resource group
move.

In a true maintenance scenario, you most likely perform this task by stopping the cluster on
the local standby node first. Then you stop the cluster on the production node by using the
Move Resource Groups option. The following operations are performed during this move:
 Releasing the primary online instance of emlecRG at the Austin site, which:
– Runs the application server stop script
– Unmounts the file systems
– Varies off the volume group
– Removes the service IP address
 Releasing the secondary online instance of emlecRG at the Miami site
 Acquiring the emlecRG resource group in the online secondary state at the Austin site
 Acquiring the emlecRG resource group in the online primary state at the Miami site

To move the resource group by using SMIT, follow these steps:


1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path System Management (C-SPOC)  Resource Groups and
Applications  Move a Resource Group to Another Node / Site  Move Resource
Groups to Another Site.

3. In the Move a Resource Group to Another Node / Site panel (Figure 14-18), select the
ONLINE instance of the emlecRG resource group to be moved.

Move a Resource Group to Another Node / Site

Move cursor to desired item and press Enter.


+--------------------------------------------------------------------------+
| Select Resource Group(s) |
| |
| Move cursor to desired item and press Enter. Use arrow keys to scroll. |
| |
| # |
| # Resource Group State Node(s) / Site |
| # |
| emlecRG ONLINE jessica / Austi |
| emlecRG ONLINE SECONDARY maddi / Miami |
| valhallarg ONLINE krod / Miami |
| |
| # |
| # Resource groups in node or site collocation configuration: |
| # Resource Group(s) State Node / Site |
| # |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 14-18 Moving the Austin resource group across to site Miami

4. In the Select a Destination Site panel, select the Miami site as shown in Figure 14-19.

+--------------------------------------------------------------------------+
| Select a Destination Site |
| |
| Move cursor to desired item and press Enter. |
| |
| # *Denotes Originally Configured Primary Site |
| Miami |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 14-19 Selecting the site for resource group move



5. Verify the information in the final menu and Press Enter.
Upon completion of the move, emlecRG is online on the maddi node at the Miami site as
shown in Example 14-20.

Example 14-20 Resource group status after a move to the Miami site
root@maddi# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
emlecRG ONLINE SECONDARY jessica@Austin
OFFLINE bina@Austin
ONLINE maddi@Miami

valhallarg ONLINE krod@Miami


OFFLINE maddi@Miami
OFFLINE bina@Austin

6. Repeat the resource group move to move it back to its original primary site and node to
return to the original starting state.

Attention: In our environment, after the first resource group move between sites, we were
unable to move the resource group back by site because the pick list for the destination site
was empty. However, we were able to move it back by node instead. Later in our testing, the
by-site option started working, but it moved the resource group to the standby node at the
primary site instead of the original primary node. If you encounter similar problems, contact
IBM support.

14.5.2 Rolling site failure of the Austin site


In this scenario, you perform a rolling site failure of the Austin site by performing the following
tasks:
1. Halt the primary production node jessica at the Austin site.
2. Verify that the resource group emlecRG is acquired locally by the bina node.
3. Halt the bina node to produce a site down.
4. Verify that the resource group emlecRG is acquired remotely by the maddi node.

To begin, all four nodes are active in the cluster and the resource groups are online on the
primary node as shown in Example 14-19 on page 455.
1. On the jessica node, run the reboot -q command. The bina node acquires the emlecRG
resource group as shown in Example 14-21.

Example 14-21 Local node failover within the Austin site


root@bina: clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
emlecRG OFFLINE jessica@Austin
ONLINE bina@Austin
OFFLINE maddi@Miami

valhallarg ONLINE krod@Miami
OFFLINE maddi@Miami
ONLINE SECONDARY bina@Austin

2. Run the pairdisplay command (as shown in Example 14-22) to verify that the pairs are
still established because the volume group is still active on the primary site.

Example 14-22 Pairdisplay status after a local site failover


root@bina: pairdisplay -g htcdg01 -IH2 -fe
Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M CTG JID AP
htcdg01 htcd01(L) (CL1-E-0, 0, 10)45306 272.P-VOL PAIR NEVER ,35764 268 - - - 1
htcdg01 htcd01(R) (CL1-B-0, 0, 28)35764 268.S-VOL PAIR NEVER ,----- 272 - - - -
htcdg01 htcd02(L) (CL1-E-0, 0, 11)45306 273.P-VOL PAIR NEVER ,35764 269 - - - 1
htcdg01 htcd02(R) (CL1-B-0, 0, 29)35764 269.S-VOL PAIR NEVER ,----- 273 - - - -

3. Upon cluster stabilization, run the reboot -q command on the bina node. The maddi node
at the Miami site acquires the emlecRG resource group as shown in Example 14-23.

Example 14-23 Hard failover between sites


root@maddi: clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
emlecRG OFFLINE jessica@Austin
OFFLINE bina@Austin
ONLINE maddi@Miami

valhallarg ONLINE krod@Miami


OFFLINE maddi@Miami
OFFLINE bina@Austin

4. Verify that the replicated pairs are now in the suspended state from the command line as
shown in Example 14-24.

Example 14-24 Pairdisplay status after a hard site failover


root@maddi: pairdisplay -g htcdg01 -IH2 -fe
Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M CTG JID AP
htcdg01 htcd01(L) (CL1-B-0, 0, 28)35764 268.S-VOL SSUS NEVER ,----- 272 W - - 1
htcdg01 htcd01(R) (CL1-E-0, 0, 10)45306 272.P-VOL PSUS NEVER ,35764 268 - - - 1
htcdg01 htcd02(L) (CL1-B-0, 0, 29)35764 269.S-VOL SSUS NEVER ,----- 273 W - - 1
htcdg01 htcd02(R) (CL1-E-0, 0, 11)45306 273.P-VOL PSUS NEVER ,35764 269 - - - 1



You can also verify that the replicated pairs are in the suspended state by using the
Storage Navigator (Figure 14-20).

Important: Although our testing resulted in a site_down event, we never lost access to
the primary storage subsystem. In a true site failure, including loss of storage,
re-establish the replicated pairs, and synchronize them before moving back to the
primary site. If you must change the storage LUNs, modify the horcm.conf file, and use
the same device group and device names. You do not have to change the cluster
resource configuration.

Figure 14-20 Pairs suspended after a site failover (courtesy of Hitachi Data Systems)

14.5.3 Site re-integration for the Austin site


In this scenario, we restart both cluster nodes at the Austin site by using the smitty clstart
command. Upon startup of the primary node jessica, the emlecRG resource group is
automatically and gracefully moved back, returning to the original starting state shown in
Example 14-19 on page 455.


Important: The resource group settings of the Inter-site Management Policy, also known
as the site relationship, dictate the behavior of what occurs upon re-integration of the
primary node. Because we chose Prefer Primary Site, the automatic fallback occurred.

Initially, we were unable to restart the cluster on the jessica node because of verification errors
at startup, which are similar to the errors shown in Figure 14-17 on page 453. These errors had
two possible causes. The first is that we failed to start the horcm instance at boot. The second
is that we also had to re-map the device groups by running the raidscan command again.

Important: Always ensure that the horcm instance is running before rejoining a node into
the cluster. In some cases, if all instances, cluster nodes, or both have been down, you
might need to run the raidscan command again.
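
One way to ensure that the horcm instance is started at boot is to add an inittab entry, for
example with the mkitab command (a sketch; the entry label and output redirection are our
choices, not a documented requirement):

mkitab "horcm2:2:once:/HORCM/usr/bin/horcmstart.sh 2 >/dev/console 2>&1"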

14.5.4 Graceful site failover for the Miami site


This move scenario starts from the state shown in Example 14-19 on page 455. You repeat
the steps from the previous three sections, one section at a time. However, this time the steps
test the asynchronous replication of the Miami site.

The following tasks are performed during this move:


1. Release the primary online instance of valhallaRG at the Miami site, which:
– Runs the application server stop script
– Unmounts the file systems
– Varies off the volume group
– Removes the service IP address
2. Release the secondary online instance of valhallaRG at the Austin site.
3. Acquire valhallaRG in the secondary online state at the Miami site.
4. Acquire valhallaRG in the online primary state at the Austin site.

Perform the resource group move by using SMIT as follows:


1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path System Management (C-SPOC)  Resource Groups and
Applications  Move a Resource Group to Another Node / Site  Move Resource
Groups to Another Site.
3. Select the ONLINE instance of valhallaRG to be moved.
4. Select the Austin site from the pop-up menu.
5. Verify the information in the final menu and press Enter.
Upon completion of the move, the valhallaRG resource group is online on the bina node at
the Austin site. The resource group is online secondary on the local production krod node
at the Miami site as shown in Example 14-25.

Example 14-25 Resource group status after moving to the Austin site
root@bina: clRGinfo
Group Name Group State Node
-----------------------------------------------------------------------------
emlecRG ONLINE jessica@Austin
OFFLINE bina@Austin
ONLINE SECONDARY maddi@Miami



valhallarg ONLINE SECONDARY krod@Miami
OFFLINE maddi@Miami
ONLINE bina@Austin

6. Repeat these steps to move a resource group back to the original primary krod node at
the Miami site.

Attention: In our environment, after the first resource group move between sites, we were
unable to move the resource group back by site because the pick list for the destination site
was empty. However, we were able to move it back by node instead. Later in our testing, the
by-site option started working, but it moved the resource group to the standby node at the
primary site instead of the original primary node. If you encounter similar problems, contact
IBM support.

14.5.5 Rolling site failure of the Miami site


In this scenario, you perform a rolling site failure of the Miami site by performing the following
tasks:
1. Halt the primary production node krod at the Miami site.
2. Verify that the valhallaRG resource group is acquired locally by the maddi node.
3. Halt the maddi node to produce a site down.
4. Verify that the valhallaRG resource group is acquired remotely by the bina node.

To begin, all four nodes are active in the cluster, and the resource groups are online on the
primary node as shown in Example 14-19 on page 455. Follow these steps:
1. On the krod node, run the reboot -q command. The maddi node brings the valhallaRG
resource group online, and the remote bina node maintains the online secondary status as
shown in Example 14-26. This time, the failover took noticeably longer, specifically in
the fsck portion. The longer time is most likely a symptom of the asynchronous
replication.

Example 14-26 Local node fallover within the Miami site


root@maddi: clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
emlecRG ONLINE jessica@Austin
OFFLINE bina@Austin
ONLINE SECONDARY maddi@Miami

valhallarg OFFLINE krod@Miami


ONLINE maddi@Miami
ONLINE SECONDARY bina@Austin

2. Run the pairdisplay command as shown in Example 14-27 to verify that the pairs are still
established because the volume group is still active on the primary site.

Example 14-27 Status using the pairdisplay command after the local Miami site fallover
root@maddi: pairdisplay -fd -g hurdg01 -IH2 -CLI
Group PairVol L/R Device_File Seq# LDEV# P/S Status Fence Seq# P-LDEV# M
hurdg01 hurd01 L hdisk40 35764 270 P-VOL PAIR ASYNC 45306 274 -
hurdg01 hurd01 R hdisk40 45306 274 S-VOL PAIR ASYNC - 270 -
hurdg01 hurd02 L hdisk41 35764 271 P-VOL PAIR ASYNC 45306 275 -
hurdg01 hurd02 R hdisk41 45306 275 S-VOL PAIR ASYNC - 271 -

3. Upon cluster stabilization, run the reboot -q command on the maddi node. The bina node
at the Austin site acquires the valhallaRG resource group as shown in Example 14-28.

Example 14-28 Hard failover from Miami site to Austin site


root@bina: clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
emlecRG ONLINE jessica@Austin
OFFLINE bina@Austin
OFFLINE maddi@Miami

valhallarg OFFLINE krod@Miami


OFFLINE maddi@Miami
ONLINE bina@Austin

Important: Although our testing resulted in a site_down event, we never lost access to
the primary storage subsystem. In a true site failure, including loss of storage,
re-establish the replicated pairs, and synchronize them before moving back to the
primary site. If you must change the storage LUNs, modify the horcm.conf file, and use
the same device group and device names. You do not have to change the cluster
resource configuration.

14.5.6 Site re-integration for the Miami site


In this scenario, we restart both cluster nodes at the Miami site by using the smitty clstart
command. Upon startup of the primary node krod, the valhallaRG resource group is
automatically and gracefully moved back, returning to the original starting state shown in
Example 14-19 on page 455.

Important: The resource group settings of the Inter-site Management Policy, also known
as the site relationship, dictate the behavior of what occurs upon re-integration of the
primary node. Because we chose Prefer Primary Site policy, the automatic fallback
occurred.

Initially, we were unable to restart the cluster on the jessica node because of verification errors
at startup, which are similar to the errors shown in Figure 14-17 on page 453. These errors had
two possible causes. The first is that we failed to start the horcm instance at boot. The second
is that we also had to re-map the device groups by running the raidscan command again.



Important: Always ensure that the horcm instance is running before rejoining a node into
the cluster. In some cases, if all instances, cluster nodes, or both have been down, you
might need to run the raidscan command again.

14.6 LVM administration of TrueCopy/HUR replicated pairs


This topic explains common scenarios for adding additional storage to an existing replicated
environment using Hitachi TrueCopy/HUR. In this scenario, you only work with the Austin site
and the emlecRG resource group in a TrueCopy synchronous replication. Overall the steps are
the same for both types of replication. The difference is the initial pair creation. You perform
the following tasks:
 Adding LUN pairs to an existing volume group
 Adding a new logical volume
 Increasing the size of an existing file system
 Adding a LUN pair to a new volume group

Important: This topic does not explain how to dynamically expand a volume through
Hitachi Logical Unit Size Expansion (LUSE) because this option is not supported.

14.6.1 Adding LUN pairs to an existing volume group


In this task, you assign a new LUN to each site as you did in 14.4.1, “Assigning LUNs to the
hosts (host groups)” on page 429. Table 14-2 shows a summary of the LUNs that are used.
Before continuing, the LUNs must already be established in a paired relationship, and the
LUNs or hdisk must be available on the appropriate cluster nodes.

Table 14-2 Summary of the LUNs implemented


Austin - Hitachi USPV - 45306 Miami - Hitachi USPVM - 35764

Port CL1-E Port CL-1B

CU 01 CU 01

LUN 000E LUN 001B

LDEV 01:14 LDEV 01:1F

jessica hdisk# hdisk42 krod hdisk# hdisk42

bina hdisk# hdisk42 maddi hdisk# hdisk42

Then follow the same steps for defining the new LUNs:
1. Run the cfgmgr command on the primary node jessica.
2. Assign the PVID on the jessica node.
chdev -l hdisk42 -a pv=yes
3. Run the pairsplit command on the replicated LUNs (see the command sketch after Example 14-29).
4. Run the cfgmgr command on each of the remaining three nodes.
5. Verify that the PVID shows up on each node by using the lspv command.
6. Run the pairresync command on the replicated LUNs.

7. Shut down the horcm2 instance on each node:
/HORCM/usr/bin/horcmshutdown.sh 2
8. Edit the /etc/horcm2.conf file on each node as appropriate for each site:
– The krod and maddi nodes on the Miami site added the following new line:
htcdg01 htcd03 35764 01:1F
– The jessica and bina nodes on the Austin site added the following new line:
htcdg01 htcd03 45306 01:14
9. Restart horcm2 instance on each node:
/HORCM/usr/bin/horcmstart.sh 2
10.Map the devices and device group on any node:
lsdev -Cc disk|grep hdisk|/HORCM/usr/bin/raidscan -IH2 -find inst
We ran this command on the jessica node.
11.Verify that the htcdg01 device group pairs now show the new pair, which consists of
hdisk42 on each system, as shown in Example 14-29.

Example 14-29 New LUN pair in the htcdg01 device group


root@jessica: pairdisplay -fd -g htcdg01 -IH2 -CLI
Group PairVol L/R Device_File Seq# LDEV# P/S Status Fence Seq# P-LDEV# M
htcdg01 htcd01 L hdisk38 45306 272 P-VOL PAIR NEVER 35764 268 -
htcdg01 htcd01 R hdisk38 35764 268 S-VOL PAIR NEVER - 272 -
htcdg01 htcd02 L hdisk39 45306 273 P-VOL PAIR NEVER 35764 269 -
htcdg01 htcd02 R hdisk39 35764 269 S-VOL PAIR NEVER - 273 -
htcdg01 htcd03 L hdisk42 45306 276 P-VOL PAIR NEVER 35764 287 -
htcdg01 htcd03 R hdisk42 35764 287 S-VOL PAIR NEVER - 276 -
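
As referenced in step 3 and step 6, the split and resynchronization of the device group are
done with the Hitachi CCI commands. The following sketch assumes HORCM instance 2, as
used throughout this chapter:

# Step 3: split the pairs so that the secondary LUNs can be read on the remote nodes
pairsplit -g htcdg01 -IH2
# Step 6: re-establish the pairs after the PVID is visible on all nodes
pairresync -g htcdg01 -IH2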

You are now ready to use C-SPOC to add the new disk into the volume group:

Important: You cannot use C-SPOC for the following LVM operations to configure nodes at
the remote site that contain the target volume:
 Creating a volume group
 Operations that require nodes at the target site to write to the target volumes
For example, changing the file system size, changing the mount point, or adding LVM
mirrors cause an error message in C-SPOC. However, nodes on the same site as the
source volumes can successfully perform these tasks. The changes are then
propagated to the other site by using a lazy update.

For C-SPOC operations to work on all other LVM operations, perform all C-SPOC
operations with the (TrueCopy/HUR) volume pairs in the Synchronized or Consistent states
or the cluster ACTIVE on all nodes.

1. From the command line, type the smitty cl_admin command.


2. In SMIT, select the path System Management (C-SPOC)  Storage  Volume
Groups  Add a Volume to a Volume Group
3. Select the volume group truesyncvg from the pop-up menu.



4. Select hdisk42 as shown in Figure 14-21.

Set Characteristics of a Volume Group

Move cursor to desired item and press Enter.

Add a Volume to a Volume Group


Change/Show characteristics of a Volume Group
Remove a Volume from a Volume Group
+--------------------------------------------------------------------------+
| Physical Volume Names |
| |
| Move cursor to desired item and press Enter. |
| |
| 000a621aaf47ce83 ( hdisk2 on nodes bina,jessica ) |
| 000a621aaf47ce83 ( hdisk3 on nodes krod,maddi ) |
| 000cf1da43e72fc2 ( hdisk5 on nodes bina,jessica ) |
| 000cf1da43e72fc2 ( hdisk6 on nodes krod,maddi ) |
| 00cb14ce74090ef3 ( hdisk42 on all selected nodes ) |
| 00cb14ceb0f5bd25 ( hdisk4 on nodes bina,jessica ) |
| 00cb14ceb0f5bd25 ( hdisk14 on nodes krod,maddi ) |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 14-21 Selecting a disk to add to the volume group

5. Verify the menu information, as shown in Figure 14-22, and press Enter.

Add a Volume to a Volume Group

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
VOLUME GROUP name truesyncvg
Resource Group Name emlecRG
Node List bina,jessica,krod,mad>
Reference node bina
VOLUME names hdisk42
Figure 14-22 Adding a volume to a volume group

The krod node does not need the volume group because it is not a member of the resource
group. However, we started with all four nodes seeing all volume groups and decided to leave
the configuration that way. This way we have additional flexibility later if we need to change
the cluster configuration to allow the krod node to take over as a last resort.

Upon completion of the C-SPOC operation, all four nodes now have the new disk as a
member of the volume group as shown in Example 14-30.

Example 14-30 New disk added to the volume group on all nodes
root@jessica: lspv |grep truesyncvg
hdisk38 00cb14ce564c3f44 truesyncvg active
hdisk39 00cb14ce564c40fb truesyncvg active
hdisk42 00cb14ce74090ef3 truesyncvg active

root@bina: lspv |grep truesyncvg


hdisk38 00cb14ce564c3f44 truesyncvg
hdisk39 00cb14ce564c40fb truesyncvg
hdisk42 00cb14ce74090ef3 truesyncvg

root@krod: lspv |grep truesyncvg


hdisk38 00cb14ce564c3f44 truesyncvg
hdisk39 00cb14ce564c40fb truesyncvg
hdisk42 00cb14ce74090ef3 truesyncvg

root@maddi: lspv |grep truesyncvg


hdisk38 00cb14ce564c3f44 truesyncvg
hdisk39 00cb14ce564c40fb truesyncvg

hdisk42 00cb14ce74090ef3 truesyncvg

We do not need to synchronize the cluster because all of these changes are made to an
existing volume group. However, you might want to run the cl_verify_tc_config command to
verify the resources replicated correctly.

14.6.2 Adding a new logical volume


To perform this task, again you use C-SPOC, which updates the local nodes within the site.
For the remote site, when a failover occurs, the lazy update process updates the volume
group information as needed. This process also adds a bit of extra time to the failover time.

To add a new logical volume:


1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC)  Storage  Logical
Volumes  Add a Logical Volume.
3. Select the truesyncvg volume group from the pop-up menu.



4. Choose the newly added disk hdisk42 as shown in Figure 14-23.

Logical Volumes

Move cursor to desired item and press Enter.

List All Logical Volumes by Volume Group


Add a Logical Volume
Show Characteristics of a Logical Volume
Set Characteristics of a Logical Volume
+--------------------------------------------------------------------------+
| Physical Volume Names |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| Auto-select |
| jessica hdisk38 |
| jessica hdisk39 |
| jessica hdisk42 |
| |
| F1=Help F2=Refresh F3=Cancel |
| F7=Select F8=Image F10=Exit |
F1| Enter=Do /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 14-23 Selecting a disk for new logical volume creation

5. Complete the information in the final menu and press Enter.


We added a new logical volume, named micah, which consists of 50 logical partitions
(LPs), and selected a type of raw. We accepted the default values for all other fields, as
shown in Figure 14-24.

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[TOP] [Entry Fields]


Resource Group Name emlecRG
VOLUME GROUP name truesyncvg
Node List bina,jessica,krod,mad>
Reference node jessica
* Number of LOGICAL PARTITIONS [50] #
PHYSICAL VOLUME names hdisk42
Logical volume NAME [micah]
Logical volume TYPE [raw] +
POSITION on physical volume outer_middle +
RANGE of physical volumes minimum +
MAXIMUM NUMBER of PHYSICAL VOLUMES [] #
to use for allocation
Number of COPIES of each logical 1 +
Figure 14-24 Defining a new logical volume

6. Upon completion of the C-SPOC operation, verify that the new logical volume was created
locally on the jessica node, as shown in Example 14-31.

Example 14-31 Newly created logical volume


root@jessica: lsvg -l truesyncvg
truesyncvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
oreolv jfs2 125 125 1 closed/syncd /oreofs
majorlv jfs2 125 125 1 closed/syncd /majorfs
truefsloglv jfs2log 1 1 1 closed/syncd N/A
micah raw 50 50 1 closed/syncd N/A

14.6.3 Increasing the size of an existing file system


To perform this task, again you use C-SPOC, which updates the local nodes within the site.
For the remote site, when a failover occurs, the lazy update process updates the volume
group information as needed. This process also adds a bit of extra time to the failover time.

To increase the size of an existing file system, follow these steps:


1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC)  Storage  File Systems 
Change / Show Characteristics of a File System.
3. Select the oreofs file system from the pop-up menu.
4. Complete the information in the final menu as desired and press Enter.
In this scenario, we roughly tripled the size of the file system from 500 MB (125 LPs), as
shown in Example 14-31, to 1536 MB as shown in Figure 14-25.

Change/Show Characteristics of a Enhanced Journaled File System

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[TOP] [Entry Fields]


Volume group name truesyncvg
Resource Group Name emlecRG
* Node Names krod,maddi,bina,jessi>

* File system name /oreofs


NEW mount point [/oreofs] /
SIZE of file system
Unit Size M +
Number of units [1536] #
Mount GROUP []
Mount AUTOMATICALLY at system restart? no +
PERMISSIONS read/write +
Mount OPTIONS []
Figure 14-25 Changing the file system size



5. Upon completion of the C-SPOC operation, verify the new file system size locally on the
jessica node as shown in Example 14-32.

Example 14-32 Newly increased file system size


root@jessica: lsvg -l truesyncvg
truesyncvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
oreolv jfs2 384 384 1 closed/syncd /oreofs
majorlv jfs2 125 125 1 closed/syncd /majorfs
truefsloglv jfs2log 1 1 1 closed/syncd N/A
micah raw 50 50 1 closed/syncd N/A

You do not need to synchronize the cluster because all of these changes are made to an
existing volume group. However, you might want to make sure that the replicated resources
verify correctly. Use the cl_verify_tc_config command first to isolate the replicated
resources specifically.

Testing failover after making the LVM changes


To confirm that the cluster still fails over correctly after these LVM changes, repeat the steps
from 14.5.2, “Rolling site failure of the Austin site” on page 457. The new logical volume micah
and the additional space on /oreofs show up on each node. However, there is a noticeable
difference in the total time of the site failover because a lazy update is performed to apply the
volume group changes.

14.6.4 Adding a LUN pair to a new volume group


The steps for adding a new LUN pair are the same as the steps in 14.6.1, “Adding LUN pairs to
an existing volume group” on page 463. The difference is that you create a new volume
group, which must then be added to a resource group. For completeness,
the initial steps are documented here along with an overview of the new LUNs to be used:
1. Run the cfgmgr command on the primary node jessica.
2. Assign the PVID on the jessica node:
chdev -l hdisk43 -a pv=yes
3. Run the pairsplit command on the replicated LUNs.
4. Run the cfgmgr command on each of the remaining three nodes.
5. Verify that the PVID shows up on each node by using the lspv command.
6. Run the pairresync command on the replicated LUNs.
7. Shut down the horcm2 instance on each node:
/HORCM/usr/bin/horcmshutdown.sh 2
8. Edit the /etc/horcm2.conf file on each node as appropriate for each site:
– On the Miami site, the krod and maddi nodes added the following new line:
htcdg01 htcd04 35764 00:0A
– On the Austin site, the jessica and bina nodes added the following new line:
htcdg01 htcd04 45306 00:20
9. Restart the horcm2 instance on each node:
/HORCM/usr/bin/horcmstart.sh 2

10.Map the devices and device group on any node. We ran the raidscan command on the
jessica node. See Table 14-3 for additional configuration details.
lsdev -Cc disk|grep hdisk|/HORCM/usr/bin/raidscan -IH2 -find inst
Table 14-3 Details on the Austin and Miami LUNs
Austin - Hitachi USPV - 45306 Miami - Hitachi USPVM - 35764

Port CL1-E Port CL-1B

CU 00 CU 00

LUN 000F LUN 0021

LDEV 00:20 LDEV 00:0A

jessica hdisk# hdisk43 krod hdisk# hdisk43

bina hdisk# hdisk43 maddi hdisk# hdisk43

11.Verify that the htcdg01 device group pairs now show the new pair, which consists of
hdisk43 on each system, as shown in Example 14-33.

Example 14-33 New LUN pair added to the htcdg01 device group


root@jessica: pairdisplay -fd -g htcdg01 -IH2 -CLI
Group PairVol L/R Device_File Seq# LDEV# P/S Status Fence Seq# P-LDEV# M
htcdg01 htcd01 L hdisk38 45306 272 P-VOL PAIR NEVER 35764 268 -
htcdg01 htcd01 R hdisk38 35764 268 S-VOL PAIR NEVER - 272 -
htcdg01 htcd02 L hdisk39 45306 273 P-VOL PAIR NEVER 35764 269 -
htcdg01 htcd02 R hdisk39 35764 269 S-VOL PAIR NEVER - 273 -
htcdg01 htcd04 L hdisk43 45306 32 P-VOL PAIR NEVER 35764 10 -
htcdg01 htcd04 R hdisk43 35764 10 S-VOL PAIR NEVER - 32 -

You are now ready to use C-SPOC to create a volume group:


1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC)  Storage  Volume
Groups  Create a Volume Group.



3. In the Node Names panel, select the specific nodes. We chose all four as shown in
Figure 14-26.

Volume Groups

Move cursor to desired item and press Enter.

List All Volume Groups


Create a Volume Group
Create a Volume Group with Data Path Devices

+--------------------------------------------------------------------------+
| Node Names |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| > bina |
| > jessica |
| > krod |
| > maddi |
| |
| F1=Help F2=Refresh F3=Cancel |
| F7=Select F8=Image F10=Exit |
F1| Enter=Do /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 14-26 Selecting a volume group node

4. In the Physical Volume Names panel (Figure 14-27), select hdisk43.

Volume Groups

Move cursor to desired item and press Enter.

List All Volume Groups


+--------------------------------------------------------------------------+
| Physical Volume Names |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| 000a621aaf47ce83 ( hdisk2 on nodes bina,jessica ) |
| 000a621aaf47ce83 ( hdisk3 on nodes krod,maddi ) |
| 000cf1da43e72fc2 ( hdisk5 on nodes bina,jessica ) |
| 000cf1da43e72fc2 ( hdisk6 on nodes krod,maddi ) |
| 00cb14ce75bab41a ( hdisk43 on all selected nodes ) |
| 00cb14ceb0f5bd25 ( hdisk4 on nodes bina,jessica ) |
| 00cb14ceb0f5bd25 ( hdisk14 on nodes krod,maddi ) |
| |
| F1=Help F2=Refresh F3=Cancel |
| F7=Select F8=Image F10=Exit |
F1| Enter=Do /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 14-27 Selecting an hdisk for a new volume group



5. In the Volume Group Type panel, select the volume group type. We chose Scalable as
shown in Figure 14-28.

Volume Groups

Move cursor to desired item and press Enter.

List All Volume Groups


Create a Volume Group
Create a Volume Group with Data Path Devices

Set Characteristics of a Volume Group


+--------------------------------------------------------------------------+
| Volume Group Type |
| |
| Move cursor to desired item and press Enter. |
| |
| Legacy |
| Original |
| Big |
| Scalable |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 14-28 Selecting the volume group type for a new volume group

6. In the Create a Scalable Volume Group panel, select the proper resource group. We chose
emlecRG as shown in Figure 14-29.

Create a Scalable Volume Group

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[TOP] [Entry Fields]


Node Names bina,jessica,krod,mad>
Resource Group Name [emlecRG] +
PVID 00cb14ce75bab41a
VOLUME GROUP name [truetarahvg]
Physical partition SIZE in megabytes 4 +
Volume group MAJOR NUMBER [58] #
Enable Cross-Site LVM Mirroring Verification false +
Enable Fast Disk Takeover or Concurrent Access no +
Volume Group Type Scalable
Maximum Physical Partitions in units of 1024 32 +
Maximum Number of Logical Volumes 256 +
Figure 14-29 Create a volume group final C-SPOC SMIT menu

7. Choose a volume group name. We chose truetarahvg. Press Enter.

8. Verify that the volume group is successfully created, which we do on all four nodes as
shown in Example 14-34.

Example 14-34 Newly created volume group on all nodes


root@jessica: lspv |grep truetarahvg
hdisk43 00cb14ce75bab41a truetarahvg

root@bina: lspv |grep truetarahvg


hdisk43 00cb14ce75bab41a truetarahvg

root@krod: lspv |grep truetarahvg


hdisk43 00cb14ce75bab41a truetarahvg

root@maddi: lspv |grep truetarahvg


hdisk43 00cb14ce75bab41a truetarahvg

When creating the volume group, the volume group is automatically added to the resource
group as shown in Example 14-35. However, we do not have to change the resource
group any further, because the new disk and device are added to the same device group
and TrueCopy/HUR replicated resource.

Example 14-35 Newly added volume group also added to the resource group
Resource Group Name emlecRG
Participating Node Name(s) jessica bina maddi
Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node
Fallback Policy Never Fallback
Site Relationship Prefer Primary Site
Node Priority
Service IP Label service_1
Volume Groups truesyncvg truetarahvg
Hitachi TrueCopy Replicated Resources truelee

9. Repeat the steps in 14.6.2, “Adding a new logical volume” on page 466, to create a new
logical volume, named tarahlv on the newly created volume group truetarahvg.
Example 14-36 shows the new logical volume.

Example 14-36 New logical volume on newly added volume group


root@jessica: lsvg -l truetarahvg
truetarahvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
tarahlv raw 25 25 1 closed/syncd N/A

10.Manually run the cl_verify_tc_config command to verify that the new addition of the
replicated resources is complete.



Important: During our testing, we encountered a defect after the second volume group
was added to the resource group. The cl_verify_tc_config command produced the
following error messages:
cl_verify_tc_config: ERROR - Disk hdisk38 included in Device Group htcdg01 does
not match any hdisk in Volume Group truetarahvg.
cl_verify_tc_config: ERROR - Disk hdisk39 included in Device Group htcdg01 does
not match any hdisk in Volume Group truetarahvg.
cl_verify_tc_config: ERROR - Disk hdisk42 included in Device Group htcdg01 does
not match any hdisk in Volume Group truetarahvg.
Errors found verifying the HACMP TRUECOPY/HUR configuration. Status=3

These results incorrectly imply a one-to-one relationship between the device
group/replicated resource and the volume group, which is not intended. To work around
this problem, ensure that the cluster is down, perform a forced synchronization, and then
start the cluster while ignoring the verification errors. Performing a forced synchronization
and then starting the cluster while ignoring errors is not normally recommended. Contact
IBM support to see whether a fix is available.

Synchronize the resource group change to include the new volume group that you just added.
Usually you can perform this task within a running cluster. However, because of the defect
mentioned in the previous Important box, we had to have the cluster down to synchronize it.
To perform this task, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Verification and
Synchronization.
3. In the HACMP Verification and Synchronization display (Figure 14-30), for Force
synchronization if verification fails, select Yes.

HACMP Verification and Synchronization

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Verify, Synchronize or Both [Both] +
* Automatically correct errors found during [No] +
verification?

* Force synchronization if verification fails? [Yes] +


* Verify changes only? [No] +
* Logging [Standard] +

F1=Help F2=Refresh F3=Cancel F4=List


F5=Reset F6=Command F7=Edit F8=Image
Figure 14-30 Extended Verification and Synchronization SMIT menu

4. Verify the information is correct, and press Enter. Upon completion, the cluster
configuration is in sync and can now be tested.
5. Repeat the steps for a rolling site failure as explained in 14.5.2, “Rolling site failure of
the Austin site” on page 457. In this scenario, the tests are successful.

Testing failover after adding a new volume group
To confirm that the cluster still fails over correctly after adding the new volume group, repeat
the steps of a rolling site failure as explained in 14.5.2, “Rolling site failure of the Austin site” on
page 457. The new volume group truetarahvg and the new logical volume tarahlv are displayed
on each node. However, there is a noticeable difference in the total time of the site failover
because a lazy update is performed to apply the volume group changes.



A

Appendix A. CAA cluster commands


This appendix provides a list of the Cluster Aware AIX (CAA) administration commands, and
examples of how to use them. The information about these commands has been gathered
from the new AIX man pages and placed in this appendix for your reference. This list is not an
exhaustive list of all new commands, but focuses on commands that you might come across
during the administration of your PowerHA cluster.

This appendix includes the following topics:


 The lscluster command
 The mkcluster command
 The rmcluster command
 The chcluster command
 The clusterconf command



The lscluster command
The lscluster command lists the cluster configuration information.

Syntax
lscluster -i [ -n ] | -s | -m | -d | -c

Description
The lscluster command shows the attributes that are associated with the cluster and the
cluster configuration.

Flags
-i Lists the cluster configuration interfaces on the local node.
-n Allows the cluster name to be queried for all interfaces (applicable only with the -i
flag).
-s Lists the cluster network statistics on the local node.
-m Lists the cluster node configuration information.
-d Lists the cluster storage interfaces.
-c Lists the cluster configuration.

Examples
 To list the cluster configuration for all nodes, enter the following command:
lscluster -m
 To list the cluster statistics for the local node, enter the following command:
lscluster -s
 To list the interface information for the local node, enter the following command:
lscluster -i
 To list the interface information for the cluster, enter the following command:
lscluster -i -n mycluster
 To list the storage interface information for the cluster, enter the following command:
lscluster -d
 To list the cluster configuration, enter the following command:
lscluster -c

The mkcluster command


The mkcluster command creates a cluster.

Syntax
mkcluster [ -n clustername ] [ -m node[,...] ] -r reposdev [-d shareddisk [,...]]
[-s multaddr_local ] [-v ]



Description
The mkcluster command creates a cluster. Each node that is added to the cluster must have
common storage area network (SAN) storage devices that are configured and zoned
appropriately. The SAN storage devices are used for the cluster repository disk and for any
clustered shared disks. (The shared disks that are added to a cluster configuration share the
same name across all the nodes in the cluster.)

A multicast address is used for cluster communications between the nodes in the cluster.
Therefore, if any network considerations must be reviewed before creating a cluster, consult
your network systems administrator.
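
For example, after the cluster is created, you can confirm the multicast address that was
generated or specified by listing the cluster configuration with the lscluster command that is
described earlier in this appendix:

lscluster -c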

Flags
-n clustername Sets the name of the local cluster being created. If no name is
specified when you first run the mkcluster command, a default of
SIRCOL_hostname is used, where hostname is the name
(gethostname()) of the local host.
-m node[,...] Lists the comma-separated resolvable host names or IP addresses for
nodes that are members of the cluster. The local host must be
included in the list. If the -m option is not used, the local host is implied,
causing a one-node local cluster to be created.
-r reposdev Specifies the name, such as hdisk10, of the SAN-shared storage
device that is used as the central repository for the cluster
configuration data. This device must be accessible from all nodes.
This device is required to be a minimum of 1 GB in size and backed by
a redundant and highly available SAN configuration. This flag is
required when you first run the mkcluster command within a Storage
Interconnected Resource Collection (SIRCOL), and cannot be used
thereafter.
-d shareddisk[,...] Specifies a comma-separated list of SAN-shared storage devices,
such as hdisk12,hdisk34, to be incorporated into the cluster
configuration.
These devices are renamed with a cldisk prefix. The same name is
assigned to this device on all cluster nodes from which the device is
accessible. Specified devices must not be open when the mkcluster
command is executed. This flag is used only when you first run the
mkcluster command.
-s multaddr_local Sets the multicast address of the local cluster that is being created.
This address is used for internal communication within the local
cluster. If the -s option is not specified when you first run the
mkcluster command within a SIRCOL, a multicast address is
automatically generated. This flag is used only when you first run the
mkcluster command within a SIRCOL.
-v Specifies the verbose mode.

Examples
 To create a cluster of one node and use the default values, enter the following command:
mkcluster -r hdisk1
The output is a cluster named SIRCOL_myhostname with a single node in the cluster. The
multicast address is automatically generated, and no shared disks are created for this
cluster. The repository device is set up on hdisk1, and this disk cannot be used by the



node for any other purpose. The repository device is now dedicated to being the cluster
repository disk.
 To create a multinode cluster, enter the following command:
mkcluster -n mycluster -m nodeA,nodeB,nodeC -r hdisk1 -d
hdisk10,hdisk11,hdisk12
The result is a three-node cluster with the specified name that uses the default values, and
the multicast address is automatically generated.
Three disks are created as shared clustered disks for this cluster, and these disks share
the same name across all the nodes in this cluster. You can run the lspv command to see
the new names after the cluster is created. The repository device is set up on hdisk1 and
cannot be used by any of the nodes for any other purpose. The repository device is now
dedicated to being the cluster repository disk. A volume group is created for the cluster
repository disk. These logical volumes are used exclusively by the clustering subsystem.

The rmcluster command


The rmcluster command removes the cluster configuration.

Syntax
rmcluster -n name [-f] [-v]

Description
The rmcluster command removes the cluster configuration. The repository disk and all SAN
Volume Controller (SVC) shared disks are released, and the SAN shared disks are
re-assigned to a generic hdisk name. The generic hdisk name cannot be the same name that
was initially used to add the disk to the cluster.

Flags
-n name Specifies the name of the cluster to be removed.
-f Forces certain errors to be ignored.
-v Specifies verbose mode.

Example
To remove the cluster configuration, enter the following command:
rmcluster -n mycluster

The chcluster command


The chcluster command is used to change the cluster configuration.

Syntax
chcluster [ -n name ] [{ -d | -m } [+|-] name [,....]] ..... [ -q ][ -f ][ -v ]

Description
The chcluster command changes the cluster configuration. With this command, SAN shared
disks and nodes can be added and removed from the cluster configuration.



Flags
-d [+|-]shareddisk[,...]
Specifies a comma-separated list of shared storage-device names to
be added to or removed from a cluster configuration. The new shared
disks are renamed with a cldisk prefix. The same name is assigned to
this device on all cluster nodes from which the device can be
accessed. Deleted devices are re-assigned a generic hdisk name.
This newly reassigned hdisk name might not be the same as it was
before it was added to the cluster configuration. The shared disks
must not be open when the chcluster command is executed.
-m [+|-]node[,...] Specifies a comma-separated list of node names to be added or
removed from the cluster configuration.
-n name Specifies the name of the cluster to be changed. If omitted, the default
cluster is used.
-q The quick mode option, which performs the changes on the local node
only. If this option is used, the other nodes in the cluster configuration
are asynchronously contacted and the changes are performed.
-f The force option, which causes certain errors to be ignored.
-v Verbose mode

Examples
 To add shared disks to the cluster configuration, enter the following command:
chcluster -n mycluster -d +hdisk20,+hdisk21
 To remove shared disks from the cluster configuration, enter the following command:
chcluster -n mycluster -d -hdisk20,-hdisk21
 To add nodes to the cluster configuration, enter the following command:
chcluster -n mycluster -m +nodeD,+nodeE
 To remove nodes from the cluster configuration, enter the following command:
chcluster -n mycluster -m -nodeD,-nodeE

The clusterconf command


The clusterconf command is a service utility for administration of a cluster configuration.

Syntax
clusterconf [ -u [-f ] | -s | -r hdiskN ] [-v ]

Description
The clusterconf command allows administration of the cluster configuration. A node in a
cluster configuration might indicate a status of DOWN (viewable by issuing the lscluster -m
command). Alternatively, a node in a cluster might not be displayed in the cluster configuration,
and you know the node is part of the cluster configuration (viewable from another node in the
cluster by using the lscluster -m command). In these cases, the following flags allow the
node to search and read the repository disk and take self-correcting actions.

Do not use the clusterconf command option to remove a cluster configuration. Instead, use
the rmcluster command for normal removal of the cluster configuration.



Flags
If no flags are specified, the clusterconf command performs a refresh operation by retrieving
the cluster repository configuration and performing the necessary actions. The following
actions might occur:
 A cluster node joins a cluster of which the node is a member and for some reason was
disconnected from the cluster (either from network or SAN problems)
 A cluster node might perform a resync with the cluster repository configuration (again from
some problems in the network or SAN)
 A cluster node might leave the cluster configuration if the node was removed from the
cluster repository configuration.

The clusterconf command is a normal cluster service and is handled automatically during
normal operation. The following flags are possible for this command:
-r hdiskN Has the cluster subsystem read the repository device if you know where the
repository disk is (run the lspv command and look for the cluster volume group).
It causes the node to join the cluster if the node is configured in the repository disk.
-s Performs an exhaustive search for a cluster repository disk on all configured
hdisk devices. It stops when a cluster repository disk is found. This option
searches all disks that are looking for the signature of a repository device. If a
disk is found with the signature identifying it as the cluster repository, the search
is stopped. If the node finds itself in the cluster configuration on the disk, the
node joins the cluster. If the storage network is dirty and multiple repositories
are in the storage network (not supported), it stops at the first repository disk. If
the node is not in that repository configuration, it does not join the cluster.
Use the -v flag to see which disk was found. Then use the other options on the
clusterconf command to clean up the storage network until the desired results
are achieved.
-u Performs the unconfigure operation for the local node. If the node is in the
cluster repository configuration on the shared disk to which the other nodes
have access, the other nodes in the cluster request this node to rejoin the
cluster. The -u option is used when cleanup must be performed on the local
node, for example, when the node was removed from the cluster configuration
but was down or inaccessible from the network while the normal removal
operation (such as the chcluster -m -nodeA command) was run. The
unconfigure operation performs the updates that clean up the environment on
the local node.
-f The force option, which performs the unconfigure operation and ignores errors.
-v Verbose mode.

Examples
 To clean up the local node, the following command cleans up the nodes environment:
clusterconf -fu
 To recover the cluster configuration and start cluster services, enter the following
command:
clusterconf -r hdisk1
 To search for the cluster repository device and join the cluster, enter the following
command:
clusterconf -s
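
A typical recovery sequence combines these steps. In the following sketch, hdisk1 is only an
example; substitute the repository disk that lspv reports:

lscluster -m            # check which nodes are displayed and their state
lspv                    # identify the repository disk by its cluster volume group
clusterconf -r hdisk1   # rejoin the cluster by reading the repository on that disk

Running clusterconf with no flags performs the refresh operation that is described at the
beginning of the Flags section.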




Appendix B. PowerHA SMIT tree


This appendix includes the PowerHA v7.1 SMIT tree. Depending on the version of PowerHA
that you have installed, you might notice some differences.

Note the following explanation to help you understand how to read the tree:
 The number of right-pointing double quotation marks (») indicates the number of screens
that you have to go down in the PowerHA SMIT tree. For example, » » » means that you
must page down three screens.
 The double en dashes (--) are used as a separator between the SMIT text and the SMIT
fast path.
 The parentheses (()) indicate the fast path.

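For example, you can go directly to any menu or dialog in the tree by passing its fast path to
the smitty command. To open the Verify and Synchronize Cluster Configuration dialog, enter:

smitty cm_ver_and_sync
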
» Cluster Nodes and Networks -- (cm_cluster_nodes_networks)


» » Initial Cluster Setup (Typical) -- (cm_setup_menu)
» » » Setup a Cluster, Nodes and Networks -- (cm_setup_cluster_nodes_networks)
» » » Define Repository Disk and Cluster IP Address -- (cm_define_repos_ip_addr)
» » » What are a repository disk and cluster IP address ? -- (cm_whatis_repos_ip_addr)
» » Manage the Cluster -- (cm_manage_cluster)
» » » PowerHA SystemMirror Configuration -- (cm_show_cluster_top)
» » » Remove the Cluster Definition -- (cm_remove_cluster)
» » » Snapshot Configuration -- (cm_cfg_snap_menu)
» » » » Create a Snapshot of the Cluster Configuration -- (cm_add_snap.dialog)
» » » » Change/Show a Snapshot of the Cluster Configuration -- (cm_show_snap.select)
» » » » Remove a Snapshot of the Cluster Configuration -- (cm_rm_snap.select)
» » » » Restore the Cluster Configuration From a Snapshot -- (cm_apply_snap.select)
» » » » Configure a Custom Snapshot Method -- (clsnapshot_custom_menu)
» » » » » Add a Custom Snapshot Method -- (clsnapshot_custom_dialog_add)
» » » » » Change/Show a Custom Snapshot Method -- (clsnapshot_custom_dialog_cha.select)
» » » » » Remove a Custom Snapshot Method -- (clsnapshot_custom_dialog_rem.select)
» » Manage Nodes -- (cm_manage_nodes)
» » » Show Topology Information by Node -- (cllsnode_menu)
» » » » Show All Nodes -- (cllsnode.dialog)
» » » » Select a Node to Show -- (cllsnode_select)
» » » Add a Node -- (cm_add_node)
» » » Change/Show a Node -- (cm_change_show_node)
» » » Remove Nodes -- (cm_remove_node)
» » » Configure Persistent Node IP Label/Addresses -- (cm_persistent_addresses)



» » » » Add a Persistent Node IP Label/Address --
(cm_add_a_persistent_node_ip_label_address_select)
» » » » Change/Show a Persistent Node IP Label/Address --
(cm_change_show_a_persistent_node_ip_label_address_select)
» » » » Remove a Persistent Node IP Label/Address --
(cm_delete_a_persistent_node_ip_label_address_select)
» » » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync)
» » Manage Networks and Network Interfaces -- (cm_manage_networks_interfaces)
» » » Networks -- (cm_manage_networks_menu)
» » » » Add a Network -- (cm_add_network)
» » » » Change/Show a Network -- (cm_change_show_network)
» » » » Remove a Network -- (cm_remove_network)
» » » Network Interfaces -- (cm_manage_interfaces_menu)
» » » » Add a Network Interface -- (cm_add_interfaces)
» » » » Change/Show a Network Interface -- (cm_change_show_interfaces)
» » » » Remove a Network Interface -- (cm_remove_interfaces)
» » » Show Topology Information by Network -- (cllsnw_menu)
» » » » Show All Networks -- (cllsnw.dialog)
» » » » Select a Network to Show -- (cllsnw_select)
» » » Show Topology Information by Network Interface -- (cllsif_menu)
» » » » Show All Network Interfaces -- (cllsif.dialog)
» » » » Select a Network Interface to Show -- (cllsif_select)
» » » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync)
» » Discover Network Interfaces and Disks -- (cm_discover_nw_interfaces_and_disks)
» » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync)

» Cluster Applications and Resources -- (cm_apps_resources)


» » Make Applications Highly Available (Use Smart Assists) -- (clsa)
» » Resources -- (cm_resources_menu)
» » » Configure User Applications (Scripts and Monitors) -- (cm_user_apps)
» » » » Application Controller Scripts -- (cm_app_scripts)
» » » » » Add Application Controller Scripts -- (cm_add_app_scripts)
» » » » » Change/Show Application Controller Scripts -- (cm_change_show_app_scripts)
» » » » » Remove Application Controller Scripts -- (cm_remove_app_scripts)
» » » » » What is an "Application Controller" anyway ? -- (cm_app_controller_help)
» » » » Application Monitors -- (cm_appmon)
» » » » » Configure Process Application Monitors -- (cm_cfg_process_appmon)
» » » » » » Add a Process Application Monitor -- (cm_add_process_appmon)
» » » » » » Change/Show Process Application Monitor -- (cm_change_show_process_appmon)
» » » » » » Remove a Process Application Monitor -- (cm_remove_process_appmon)
» » » » » Configure Custom Application Monitors -- (cm_cfg_custom_appmon)
» » » » » » Add a Custom Application Monitor -- (cm_add_custom_appmon)
» » » » » » Change/Show Custom Application Monitor -- (cm_change_show_custom_appmon)
» » » » » » Remove a Custom Application Monitor -- (cm_remove_custom_appmon)
» » » » Configure Application for Dynamic LPAR and CoD Resources -- (cm_cfg_appondemand)
» » » » » Configure Communication Path to HMC -- (cm_cfg_apphmc)
» » » » » » Add HMC IP addresses for a node -- (cladd_apphmc.dialog)
» » » » » » Change/Show HMC IP addresses for a node -- (clch_apphmc.select)
» » » » » » Remove HMC IP addresses for a node -- (clrm_apphmc.select)
» » » » » Configure Dynamic LPAR and CoD Resources for Applications -- (cm_cfg_appdlpar)
» » » » » » Add Dynamic LPAR and CoD Resources for Applications -- (cm_add_appdlpar)
» » » » » » Change/Show Dynamic LPAR and CoD Resources for Applications --
(cm_change_show_appdlpar)
» » » » » » Remove Dynamic LPAR and CoD Resources for Applications -- (cm_remove_appdlpar)
» » » » Show Cluster Applications -- (cldisp.dialog)
» » » Configure Service IP Labels/Addresses -- (cm_service_ip)
» » » » Add a Service IP Label/Address -- (cm_add_a_service_ip_label_address.select_net)
» » » » Change/Show a Service IP Label/Address -- (cm_change_service_ip.select)
» » » » Remove Service IP Label(s)/Address(es) -- (cm_delete_service_ip.select)
» » » » Configure Service IP Label/Address Distribution Preferences --



(cm_change_show_service_ip_distribution_preference_select)
» » » Configure Tape Resources -- (cm_cfg_tape)
» » » » Add a Tape Resource -- (cm_add_tape)
» » » » Change/Show a Tape Resource -- (cm_change_tape)
» » » » Remove a Tape Resource -- (cm_remove_tape)
» » » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync)
» » Resource Groups -- (cm_resource_groups)
» » » Add a Resource Group -- (cm_add_resource_group)
» » » Change/Show Nodes and Policies for a Resource Group --
(cm_change_show_rg_nodes_policies)
» » » Change/Show Resources and Attributes for a Resource Group --
(cm_change_show_rg_resources)
» » » Remove a Resource Group -- (cm_remove_resource_group)
» » » Configure Resource Group Run-Time Policies --
(cm_config_resource_group_run-time_policies_menu_dmn)
» » » » Configure Dependencies between Resource Groups -- (cm_rg_dependencies_menu)
» » » » » Configure Parent/Child Dependency -- (cm_rg_dependencies)
» » » » » » Add Parent/Child Dependency between Resource Groups --
(cm_rg_dependencies add.select)
» » » » » » Change/Show Parent/Child Dependency between Resource Groups --
(cm_rg_dependencies ch.select)
» » » » » » Remove Parent/Child Dependency between Resource Groups --
(cm_rg_dependencies rm.select)
» » » » » » Display All Parent/Child Resource Group Dependencies --
(cm_rg_dependencies display.select)
» » » » » Configure Start After Resource Group Dependency --
(cm_rg_dependencies_startafter_main_menu)
» » » » » » Add Start After Resource Group Dependency -- (cm_rg_dependencies add.select startafter)
» » » » » » Change/Show Start After Resource Group Dependency --
(cm_rg_dependencies ch.select startafter)
» » » » » » Remove Start After Resource Group Dependency --
(cm_rg_dependencies rm.select startafter)
» » » » » » Display Start After Resource Group Dependencies --
(cm_rg_dependencies display.select startafter)
» » » » » Configure Stop After Resource Group Dependency --
(cm_rg_dependencies_stopafter_main_menu)
» » » » » » Add Stop After Resource Group Dependency --
(cm_rg_dependencies add.select stopafter)
» » » » » » Change/Show Stop After Resource Group Dependency --
(cm_rg_dependencies ch.select stopafter)
» » » » » » Remove Stop After Resource Group Dependency --
(cm_rg_dependencies rm.select stopafter)
» » » » » » Display Stop After Resource Group Dependencies --
(cm_rg_dependencies display.select stopafter)
» » » » » Configure Online on the Same Node Dependency -- (cm_rg_osn_dependencies)
» » » » » » Add Online on the Same Node Dependency Between Resource Groups --
(cm_rg_osn_dependencies add.dialog)
» » » » » » Change/Show Online on the Same Node Dependency Between Resource Groups --
(cm_rg_osn_dependencies ch.select)
» » » » » » Remove Online on the Same Node Dependency Between Resource --
(cm_rg_osn_dependencies rm.select)
» » » » » Configure Online on Different Nodes Dependency -- (cm_rg_odn_dependencies.dialog)
» » » » Configure Resource Group Processing Ordering -- (cm_processing_order)
» » » » Configure PowerHA SystemMirror Workload Manager Parameters -- (cm_cfg_wlm_runtime)
» » » » Configure Delayed Fallback Timer Policies -- (cm_timer_menu)
» » » » » Add a Delayed Fallback Timer Policy -- (cm_timer_add.select)
» » » » » Change/Show a Delayed Fallback Timer Policy -- (cm_timer_update.select)
» » » » » Remove a Delayed Fallback Timer Policy -- (cm_timer_remove.select)
» » » » Configure Settling Time for Resource Groups -- (cm_settling_timer_menu)
» » » Show All Resources by Node or Resource Group --



(cm_show_all_resources_by_node_or_resource_group_menu_dmn)
» » » » Show Resource Information by Node -- (cllsres.select)
» » » » Show Resource Information by Resource Group -- (clshowres.select)
» » » » Show Current State of Applications and Resource Groups --
(cm_show_current_state_application_resource_group_menu_dwn)
» » » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync)
» » » What is a "Resource Group" anyway ? -- (cm_resource_group_help)
» » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync)

» System Management (C-SPOC) -- (cm_system_management_cspoc_menu_dmn)


» » Storage -- (cl_lvm)
» » » Volume Groups -- (cl_vg)
» » » » List All Volume Groups -- (cl_lsvgA)
» » » » Create a Volume Group -- (cl_createvg)
» » » » Create a Volume Group with Data Path Devices -- (cl_createvpathvg)
» » » » Set Characteristics of a Volume Group -- (cl_vgsc)
» » » » » Add a Volume to a Volume Group -- (cl_extendvg)
» » » » » Change/Show characteristics of a Volume Group -- (cl_chshsvg)
» » » » » Remove a Volume from a Volume Group -- (cl_reducevg)
» » » » » Enable/Disable a Volume Group for Cross-Site LVM Mirroring Verification --
(hacmp_sm_lv_svg_sc_ed)
» » » » Enable a Volume Group for Fast Disk Takeover or Concurrent Access -- (cl_vgforfdto)
» » » » Import a Volume Group -- (cl_importvg)
» » » » Mirror a Volume Group -- (cl_mirrorvg)
» » » » Unmirror a Volume Group -- (cl_unmirrorvg)
» » » » Manage Critical Volume Groups -- (cl_manage_critical_vgs)
» » » » » Mark a Volume Group as Critical -- (cl_mark_critical_vg.select)
» » » » » Show all Critical volume groups -- (cl_show_critical_vgs)
» » » » » Mark a Volume Group as non-Critical -- (cl_mark_noncritical_vg.select)
» » » » » Configure failure actions for Critical Volume Groups -- (cl_set_critical_vg_response)
» » » » Synchronize LVM Mirrors -- (cl_syncvg)
» » » » » Synchronize by Volume Group -- (cl_syncvg_vg)
» » » » » Synchronize by Logical Volume -- (cl_syncvg_lv)
» » » » Synchronize a Volume Group Definition -- (cl_updatevg)
» » » Logical Volumes -- (cl_lv)
» » » » List All Logical Volumes by Volume Group -- (cl_lslv0)
» » » » Add a Logical Volume -- (cl_mklv)
» » » » Show Characteristics of a Logical Volume -- (cl_lslv)
» » » » Set Characteristics of a Logical Volume -- (cl_lvsc)
» » » » » Rename a Logical Volume -- (cl_renamelv)
» » » » » Increase the Size of a Logical Volume -- (cl_extendlv)
» » » » » Add a Copy to a Logical Volume -- (cl_mklvcopy)
» » » » » Remove a Copy from a Logical Volume -- (cl_rmlvcopy)
» » » » Change a Logical Volume -- (cl_chlv1)
» » » » Remove a Logical Volume -- (cl_rmlv1)
» » » File Systems -- (cl_fs)
» » » » List All File Systems by Volume Group -- (cl_lsfs)
» » » » Add a File System -- (cl_mkfs)
» » » » Change / Show Characteristics of a File System -- (cl_chfs)
» » » » Remove a File System -- (cl_rmfs)
» » » Physical Volumes -- (cl_disk_man)
» » » » Add a Disk to the Cluster -- (cl_disk_man add nodes)
» » » » Remove a Disk From the Cluster -- (cl_disk_man rem nodes)
» » » » Cluster Disk Replacement -- (cl_disk_man.replace)
» » » » Cluster Data Path Device Management -- (cl_dpath_mgt)
» » » » » Display Data Path Device Configuration -- (cl_dpls_cfg.select)
» » » » » Display Data Path Device Status -- (cl_dp_stat.select)
» » » » » Display Data Path Device Adapter Status -- (cl_dpdadapter_stat.select)
» » » » » Define and Configure all Data Path Devices -- (cl_dpdefcfg_all.select)
» » » » » Add Paths to Available Data Path Devices -- (cl_dpaddpaths.select)



» » » » » Configure a Defined Data Path Device -- (cl_dpconfdef.select)
» » » » » Remove a Data Path Device -- (cl_dprmvp.select)
» » » » » Convert ESS hdisk Device Volume Group to an SDD VPATH Device --
(cl_dphd2vp.select)
» » » » » Convert SDD VPATH Device Volume Group to an ESS hdisk Device --
(cl_dpvp2hd.select)
» » » » Configure Disk/Site Locations for Cross-Site LVM Mirroring -- (hacmp_sm_pv_xsm_ds)
» » » » » Add Disk/Site Definition for Cross-Site LVM Mirroring -- (hacmp_sm_pv_xsm_ds_ad)
» » » » » Change/Show Disk/Site Definition for Cross-Site LVM Mirroring -- (hacmp_sm_pv_xsm_ds_cs)
» » » » » Remove Disk/Site Definition for Cross-Site LVM Mirroring -- (hacmp_sm_pv_xsm_ds_rm)
» » PowerHA SystemMirror Services -- (cl_cm_startstop_menu)
» » » Start Cluster Services -- (clstart)
» » » Stop Cluster Services -- (clstop)
» » » Show Cluster Services -- (clshowsrv.dialog)
» » Communication Interfaces --
(cm_hacmp_communication_interface_management_menu_dmn)
» » » Configure Communication Interfaces/Devices to the Operating System on a Node --
(cm_config_comm_dev_node.select)
» » » Update PowerHA SystemMirror Communication Interface with AIX Settings --
(cm_update_hacmp_interface_with_aix_settings)
» » » Swap IP Addresses between Communication Interfaces -- (cl_swap_adapter)
» » » PCI Hot Plug Replace a Network Interface Card -- (cl_pcihp)
» » Resource Groups and Applications --
(cm_hacmp_resource_group_and_application_management_menu)
» » » Show the Current State of Applications and Resource Groups --
(cm_show_current_state_application_resource_group_menu_dwn)
» » » Bring a Resource Group Online -- (cl_resgrp_start.select)
» » » Bring a Resource Group Offline -- (cl_resgrp_stop.select)
» » » Move Resource Groups to Another Node -- (cl_resgrp_move_node.select)
» » » Suspend/Resume Application Monitoring -- (cm_suspend_resume_menu)
» » » » Suspend Application Monitoring -- (cm_suspend_appmon.select)
» » » » Resume Application Monitoring -- (cm_resume_appmon.select)
» » » Application Availability Analysis -- (cl_app_AAA.dialog)
» » PowerHA SystemMirror Logs -- (cm_hacmp_log_viewing_and_management_menu_dmn)
» » » View/Save/Delete PowerHA SystemMirror Event Summaries -- (cm_dsp_evs)
» » » » View Event Summaries -- (cm_show_evs)
» » » » Save Event Summaries to a file -- (dspevs.dialog)
» » » » Delete Event Summary History -- (cm_del_evs)
» » » View Detailed PowerHA SystemMirror Log Files -- (cm_log_menu)
» » » » Scan the PowerHA SystemMirror for AIX Scripts log -- (cm_scan_scripts_log_select)
» » » » Watch the PowerHA SystemMirror for AIX Scripts log -- (cm_watch_scripts_log.dialog)
» » » » Scan the PowerHA SystemMirror for AIX System log -- (cm_scan_syslog.dialog)
» » » » Watch the PowerHA SystemMirror for AIX System log -- (cm_watch_syslog.dialog)
» » » » Scan the C-SPOC System Log File -- (cl_scan_syslog.dialog)
» » » » Watch the C-SPOC System Log File -- (cl_watch_syslog.dialog)
» » » Change/Show PowerHA SystemMirror Log File Parameters -- (cm_run_time.select)
» » » Change/Show Cluster Manager Log File Parameters -- (cluster_manager_log_param)
» » » Change/Show a Cluster Log Directory -- (clusterlog_redir.select)
» » » Change All Cluster Logs Directory -- (clusterlog_redirall_cha)
» » » Collect Cluster log files for Problem Reporting -- (cm_clsnap_dialog)
» » File Collections -- (cm_filecollection_menu)
» » » Manage File Collections -- (cm_filecollection_mgt)
» » » » Add a File Collection -- (cm_filecollection_add)
» » » » Change/Show a File Collection -- (cm_filecollection_ch)
» » » » Remove a File Collection -- (cm_filecollection_rm)
» » » » Change/Show Automatic Update Time -- (cm_filecollection_time)
» » » Manage File in File Collections -- (cm_filesinfilecollection_mgt)
» » » » Add Files to a File Collection -- (cm_filesinfilecollection_add)
» » » » Remove Files from a File Collection -- (cm_filesfromfilecollection_selectfc)
» » » Propagate Files in File Collections -- (cm_filecollection_prop)



» » Security and Users -- (cl_usergroup)
» » » PowerHA SystemMirror Cluster Security -- (cm_config_security)
» » » » Configure Connection Authentication Mode -- (cm_config_security.connection)
» » » » Configure Message Authentication Mode and Key Management --
(cm_config_security.message)
» » » » » Configure Message Authentication Mode -- (cm_config_security.message_dialog)
» » » » » Generate/Distribute a Key -- (cm_config_security.message_key_dialog)
» » » » » Enable/Disable Automatic Key Distribution -- (cm_config_security.keydist_message_dialog)
» » » » » Activate the new key on all PowerHA SystemMirror cluster node --
(cm_config_security.keyrefr_message_dialog)
» » » Users in an PowerHA SystemMirror cluster -- (cl_users)
» » » » Add a User to the Cluster -- (cl_mkuser)
» » » » Change / Show Characteristics of a User in the Cluster -- (cl_chuser)
» » » » Remove a User from the Cluster -- (cl_rmuser)
» » » » List Users in the Cluster -- (cl_lsuser.hdr)
» » » Groups in an PowerHA SystemMirror cluster -- (cl_groups)
» » » » List All Groups in the Cluster -- (cl_lsgroup.hdr)
» » » » Add a Group to the Cluster -- (cl_mkgroup)
» » » » Change / Show Characteristics of a Group in the Cluster -- (cl_chgroup)
» » » » Remove a Group from the Cluster -- (cl_rmgroup)
» » » Passwords in an PowerHA SystemMirror cluster -- (cl_passwd)
» » » » Change a User's Password in the Cluster -- (cl_chpasswd)
» » » » Change Current Users Password -- (cl_chuserpasswd)
» » » » Manage List of Users Allowed to Change Password -- (cl_manageusers)
» » » » List Users Allowed to Change Password -- (cl_listmanageusers)
» » » » Modify System Password Utility -- (cl_modpasswdutil)
» » Open a SMIT Session on a Node -- (cm_open_a_smit_session_select)

» Problem Determination Tools -- (cm_problem_determination_tools_menu_dmn)


» » PowerHA SystemMirror Verification -- (cm_hacmp_verification_menu_dmn)
» » » Verify Cluster Configuration -- (clverify.dialog)
» » » Configure Custom Verification Method -- (clverify_custom_menu)
» » » » Add a Custom Verification Method -- (clverify_custom_dialog_add)
» » » » Change/Show a Custom Verification Method -- (clverify_custom_dialog_cha.select)
» » » » Remove a Custom Verification Method -- (clverify_custom_dialog_rem.select)
» » » Automatic Cluster Configuration Monitoring -- (clautover.dialog)
» » View Current State -- (cm_view_current_state_menu_dmn)
» » PowerHA SystemMirror Log Viewing and Management --
(cm_hacmp_log_viewing_and_management_menu_dmn)
» » » View/Save/Delete PowerHA SystemMirror Event Summaries -- (cm_dsp_evs)
» » » » View Event Summaries -- (cm_show_evs)
» » » » Save Event Summaries to a file -- (dspevs.dialog)
» » » » Delete Event Summary History -- (cm_del_evs)
» » » View Detailed PowerHA SystemMirror Log Files -- (cm_log_menu)
» » » » Scan the PowerHA SystemMirror for AIX Scripts log -- (cm_scan_scripts_log_select)
» » » » Watch the PowerHA SystemMirror for AIX Scripts log -- (cm_watch_scripts_log.dialog)
» » » » Scan the PowerHA SystemMirror for AIX System log -- (cm_scan_syslog.dialog)
» » » » Watch the PowerHA SystemMirror for AIX System log -- (cm_watch_syslog.dialog)
» » » » Scan the C-SPOC System Log File -- (cl_scan_syslog.dialog)
» » » » Watch the C-SPOC System Log File -- (cl_watch_syslog.dialog)
» » » Change/Show PowerHA SystemMirror Log File Parameters -- (cm_run_time.select)
» » » Change/Show Cluster Manager Log File Parameters -- (cluster_manager_log_param)
» » » Change/Show a Cluster Log Directory -- (clusterlog_redir.select)
» » » Change All Cluster Logs Directory -- (clusterlog_redirall_cha)
» » » Collect Cluster log files for Problem Reporting -- (cm_clsnap_dialog)
» » Recover From PowerHA SystemMirror Script Failure -- (clrecover.dialog.select)
» » Restore PowerHA SystemMirror Configuration Database from Active Configuration --
(cm_copy_acd_2dcd.dialog)
» » Release Locks Set By Dynamic Reconfiguration -- (cldarelock.dialog)
» » Cluster Test Tool -- (hacmp_testtool_menu)



» » » Execute Automated Test Procedure -- (hacmp_testtool_auto_extended)
» » » Execute Custom Test Procedure -- (hacmp_testtool_custom)
» » PowerHA SystemMirror Trace Facility -- (cm_trace_menu)
» » » Enable/Disable Tracing of PowerHA SystemMirror for AIX daemons -- (tracessys)
» » » » Start Trace -- (tracessyson)
» » » » Stop Trace -- (tracessysoff)
» » » Start/Stop/Report Tracing of PowerHA SystemMirror for AIX Service -- (trace)
» » » » START Trace -- (trcstart)
» » » » STOP Trace -- (trcstop)
» » » » Generate a Trace Report -- (trcrpt)
» » » » Manage Event Groups -- (grpmenu)
» » » » » List all Event Groups -- (lsgrp)
» » » » » Add an Event Group -- (addgrp)
» » » » » Change/Show an Event Group -- (chgrp)
» » » » » Remove Event Groups -- (delgrp.hdr)
» » » » Manage Trace -- (mngtrace)
» » » » » Change/Show Default Values -- (cngtrace)
» » » » » Reset Original Default Values -- (rstdflts)
» » PowerHA SystemMirror Error Notification -- (cm_EN_menu)
» » » Configure Automatic Error Notification -- (cm_AEN_menu)
» » » » List Error Notify Methods for Cluster Resources -- (cm_aen_list.dialog)
» » » » Add Error Notify Methods for Cluster Resources -- (cm_aen_add.dialog)
» » » » Remove Error Notify Methods for Cluster Resources -- (cm_aen_delete.dialog)
» » » Add a Notify Method -- (cm_add_notifymeth.dialog)
» » » Change/Show a Notify Method -- (cm_change_notifymeth_select)
» » » Remove a Notify Method -- (cm_del_notifymeth_select)
» » » Emulate Error Log Entry -- (show_err_emulate.select)
» » Stop RSCT Service -- (cm_manage_rsct_stop.dialog)
» » AIX Tracing for Cluster Resources -- (cm_trc_menu)
» » » Enable AIX Tracing for Cluster Resources -- (cm_trc_enable.select)
» » » Disable AIX Tracing for Cluster Resources -- (cm_trc_disable.dialog)
» » » Manage Command Groups for AIX Tracing for Cluster Resources -- (cm_trc_man_cmdgrp_menu)
» » » » List Command Groups for AIX Tracing for Cluster Resources -- (cm_trc_ls_cmdgrp.dialog)
» » » » Add a Command Group for AIX Tracing for Cluster Resources -- (cm_trc_add_cmdgrp.select)
» » » » Change / Show a Command Group for AIX Tracing for Cluster Resou --
(cm_trc_ch_cmdgrp.select)
» » » » Remove Command Groups for AIX Tracing for Cluster Resources -- (cm_trc_rm_cmdgrp.dialog)
» » Open a SMIT Session on a Node -- (cm_open_a_smit_session_select)

» Custom Cluster Configuration -- (cm_custom_menu)


» » Cluster Nodes and Networks -- (cm_custom_cluster_nodes_networks)
» » » Initial Cluster Setup (Custom) -- (cm_custom_setup_menu)
» » » » Cluster -- (cm_custom_setup_cluster_menu)
» » » » » Add/Change/Show a Cluster -- (cm_add_change_show_cluster)
» » » » » Remove the Cluster Definition -- (cm_remove_cluster)
» » » » Nodes -- (cm_custom_setup_nodes_menu)
» » » » » Add a Node -- (cm_add_node)
» » » » » Change/Show a Node -- (cm_change_show_node)
» » » » » Remove a Node -- (cm_remove_node)
» » » » Networks -- (cm_manage_networks_menu)
» » » » » Add a Network -- (cm_add_network)
» » » » » Change/Show a Network -- (cm_change_show_network)
» » » » » Remove a Network -- (cm_remove_network)
» » » » Network Interfaces -- (cm_manage_interfaces_menu)
» » » » » Add a Network Interface -- (cm_add_interfaces)
» » » » » Change/Show a Network Interface -- (cm_change_show_interfaces)
» » » » » Remove a Network Interface -- (cm_remove_interfaces)
» » » » Define Repository Disk and Cluster IP Address -- (cm_define_repos_ip_addr)
» » » Manage the Cluster -- (cm_custom_mgt_menu)
» » » » Cluster Startup Settings -- (cm_startup_options)



» » » » Reset Cluster Tunables -- (cm_reset_cluster_tunables)
» » » Verify and Synchronize Cluster Configuration (Advanced) -- (cm_adv_ver_and_sync)
» » Resources -- (cm_custom_apps_resources)
» » » Custom Disk Methods -- (cldisktype_custom_menu)
» » » » Add Custom Disk Methods -- (cldisktype_custom_dialog_add)
» » » » Change/Show Custom Disk Methods -- (cldisktype_custom_dialog_cha.select)
» » » » Remove Custom Disk Methods -- (cldisktype_custom_dialog_rem.select)
» » » Custom Volume Group Methods -- (cm_config_custom_volume_methods_menu_dmn)
» » » » Add Custom Volume Group Methods -- (cm_dialog_add_custom_volume_methods)
» » » » Change/Show Custom Volume Group Methods --
(cm_selector_change_custom_volume_methods)
» » » » Remove Custom Volume Group Methods -- (cm_dialog_delete_custom_volume_methods)
» » » Custom File System Methods -- (cm_config_custom_filesystem_methods_menu_dmn)
» » » » Add Custom File System Methods -- (cm_dialog_add_custom_filesystem_methods)
» » » » Change/Show Custom File System Methods --
(cm_selector_change_custom_filesystem_methods)
» » » » Remove Custom File System Methods -- (cm_dialog_delete_custom_filesystem_methods)
» » » Configure User Defined Resources and Types -- (cm_cludrestype_main_menu)
» » » » Configure User Defined Resource Types -- (cm_cludrestype_sub_menu)
» » » » » Add a User Defined Resource Type -- (cm_cludrestype_add)
» » » » » Change/Show a User Defined Resource Type -- (cm_cludrestype_change)
» » » » » Remove a User Defined Resource Type -- (cm_cludrestype_remove)
» » » » Configure User Defined Resources -- (cm_cludres_sub_menu)
» » » » » Add a User Defined Resource -- (cm_cludres_add)
» » » » » Change/Show a User Defined Resource -- (cm_cludres_change)
» » » » » Remove a User Defined Resource -- (cm_cludres_remove)
» » » » » Change/Show User Defined Resource Monitor -- (cm_cludres_chmonitor)
» » » » Import User Defined Resource Types and Resources Definition from XML file --
(cm_cludrestype_importxml)
» » » Customize Resource Recovery -- (_cm_change_show_resource_action_select)
» » » Verify and Synchronize Cluster Configuration (Advanced) -- (cm_adv_ver_and_sync)
» » Events -- (cm_events)
» » » Cluster Events -- (cm_cluster_events)
» » » » Configure Pre/Post-Event Commands -- (cm_defevent_menu)
» » » » » Add a Custom Cluster Event -- (cladd_event.dialog)
» » » » » Change/Show a Custom Cluster Event -- (clchsh_event.select)
» » » » » Remove a Custom Cluster Event -- (clrm_event.select)
» » » » Change/Show Pre-Defined Events -- (clcsclev.select)
» » » » User-Defined Events -- (clude_custom_menu)
» » » » » Add Custom User-Defined Events -- (clude_custom_dialog_add)
» » » » » Change/Show Custom User-Defined Events -- (clude_custom_dialog_cha.select)
» » » » » Remove Custom User-Defined Events -- (clude_custom_dialog_rem.select)
» » » » Remote Notification Methods -- (cm_def_cus_pager_menu)
» » » » » Configure a Node/Port Pair -- (define_node_port)
» » » » » Remove a Node/Port Pair -- (remove_node_port)
» » » » » Add a Custom Remote Notification Method -- (cladd_pager_notify.dialog)
» » » » » Change/Show a Custom Remote Notification Method -- (clch_pager_notify)
» » » » » Remove a Custom Remote Notification Method -- (cldel_pager_notify)
» » » » » Send a Test Remote Notification -- (cltest_pager_notify)
» » » » Change/Show Time Until Warning -- (cm_time_before_warning)
» » » System Events -- (cm_system_events)
» » » » Change/Show Event Response -- (cm_change_show_sys_event)
» » » Verify and Synchronize Cluster Configuration (Advanced) -- (cm_adv_ver_and_sync)
» » Verify and Synchronize Cluster Configuration (Advanced) -- (cm_adv_ver_and_sync)

» Can't find what you are looking for ? -- (cm_tree)

» Not sure where to start ? -- (cm_getting_started)




Appendix C. PowerHA supported hardware


Historically, newer versions of PowerHA inherited support from previous versions, unless
specific support was removed by the product. Over time, it has become uncommon to remove
support for old hardware. If the hardware was supported in the past and it can run a version of
AIX that is supported by the current version of PowerHA, the hardware is supported.

Because PowerHA 7.1 is not supported on any AIX level before AIX 6.1 TL6, if the hardware is
not supported on AIX 6.1 TL6, then by definition PowerHA 7.1 does not support it either. Also, if
the hardware manufacturer has not made a statement of support for AIX 7.1, that combination
is not supported until such a statement is made, even though the tables in this appendix might
show that PowerHA supports it.

This appendix contains information about IBM Power Systems, IBM storage, adapters, and
AIX levels supported by current versions of High-Availability Cluster Multi-Processing
(HACMP) 5.4.1 through PowerHA 7.1. It focuses on hardware support from around the last
five years and consists mainly of IBM POWER5 systems and later. At the time of writing, the
information was current and complete.

All POWER5 and later systems are supported on AIX 7.1 and HACMP 5.4.1 and later. AIX 7.1
support has the following specific requirements for HACMP and PowerHA:
 HACMP 5.4.1, SP10
 PowerHA 5.5, SP7
 PowerHA 6.1, SP3
 PowerHA 7.1

Full software support details are in the official support flash. The information in this appendix
is available and maintained in the “PowerHA hardware support matrix” at:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105638

Most of the devices in the online documentation are linked to their corresponding support flash.

This appendix includes the following topics:


 IBM Power Systems
 IBM storage
 Adapters



IBM Power Systems
The following sections provide details about the IBM Power System servers and the levels of
PowerHA and AIX supported.

IBM POWER5 systems


Table C-1 lists the software versions for PowerHA with AIX supported on IBM POWER5
System p models.

Table C-1 POWER5 System p model support for HACMP and PowerHA
System p HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
models

7037-A50 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9110-510 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9110-51A AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9111-285 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9111-520 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9113-550 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9115-505 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9116-561+ AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9117-570 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9118-575 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9119-590 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9119-595 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9131-52A AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9133-55A AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1



Table C-2 lists the software versions for PowerHA with AIX supported on IBM POWER5
System i® models.

Table C-2 POWER5 System i model support for HACMP and PowerHA
System i models HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1

9406-520 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9406-550 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9406-570 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9406-590 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9406-595 AIX 5.3 TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

IBM POWER6 systems


Table C-3 lists the software versions for PowerHA with AIX supported on POWER6 System p
models.

Table C-3 POWER6 System p support for PowerHA and AIX


System p HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
models

8203-E4A AIX 5.3 TL7 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

8203-E8A AIX 5.3 TL7 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

8234-EMA AIX 5.3 TL8 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP5 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9117-MMA AIX 5.3 TL6 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9119-FHA AIX 5.3 TL8 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 SP1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

9125-F2A AIX 5.3 TL8 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 SP1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

Built-in serial ports: Built-in serial ports in POWER6 servers are not available for
PowerHA use. Instead, use disk heartbeating. However, note that the built-in Ethernet
(IVE) adapters are supported for PowerHA use.



IBM POWER7 Systems
Table C-4 lists the software versions for HACMP and PowerHA with AIX supported on IBM
POWER7 System p models.

Table C-4 POWER7 System p support for HACMP and PowerHA


System p HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
models

8202-E4B/720 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1

8205-E6B/740 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1

8231-E2B/710 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1

8231-E2B/730 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1

8233-E8B/750 AIX 5.3 TL11 SP1 AIX 5.3 TL11 SP1 AIX 5.3 TL11 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL4 SP3 AIX 6.1 TL4 SP3 AIX 7.1

9117-MMB/770 AIX 5.3 TL11 SP1 AIX 5.3 TL11 AIX 5.3 TL11 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL4 SP3 AIX 6.1 TL4 SP3 AIX 7.1

9119-FHB/795 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1

9179-FHB/780 AIX 5.3 TL11 SP1 AIX 5.3 TL11 AIX 5.3 TL11 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL4 SP3 AIX 6.1 TL4 SP3 AIX 7.1

Built-in serial ports: Built-in serial ports in POWER7 Servers are not available for
PowerHA use. Instead, use disk heartbeating. However, note that the built-in Ethernet
(IVE) adapters are supported for PowerHA use.

IBM POWER Blade servers


Table C-5 lists the software versions for HACMP and PowerHA with AIX supported on IBM
POWER Blade servers.

Table C-5 IBM POWER Blade support for HACMP and PowerHA
System p HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
models

7778-23X/JS23 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2

7778-43X/JS43 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2

7998-60X/JS12 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP AIX 6.1 TL2 SP1 AIX 7.1

7998-61X/JS22 HACMP SP2 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL6 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1



System p HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1
models

8406-70Y/PS700 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1

8406-71Y/PS701 AIX 5.3 TL11 SP1 AIX 5.3 TL12 AIX 5.3 TL12 AIX 6.1 TL6
PS702 AIX 6.1 TL4 SP2 AIX 6.1 TL5 AIX 6.1 TL5 AIX 7.1

8844-31U/JS21 AIX 5.3. TL4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
8844-51U/JS21 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1

Blade support includes support for IVM and IVE on both POWER6 and POWER7 blades. The
following adapter cards are supported in the POWER6 and POWER7 blades:
 8240 Emulex 8Gb FC Expansion Card (CIOv)
 8241 QLogic 4Gb FC Expansion Card (CIOv)
 8242 QLogic 8Gb Fibre Channel Expansion Card (CIOv)
 8246 SAS Connectivity Card (CIOv)
 8251 Emulex 4Gb FC Expansion Card (CFFv)
 8252 QLogic combo Ethernet and 4 Gb Fibre Channel Expansion Card (CFFh)
 8271 QLogic Ethernet/8Gb FC Expansion Card (CFFh)

IBM storage
It is common to use multipathing drivers with storage. If you use MPIO, SDD, SDDPCM, or any
combination of them on PowerHA-controlled storage, you must use enhanced concurrent
volume groups (ECVGs). This requirement also applies to vSCSI and NPIV devices.

Fibre Channel storage


This section provides information about support for Fibre Channel (FC) attached storage.

DS storage units
Table C-6 lists the DS storage unit support for HACMP and PowerHA with AIX.

Table C-6 DS storage unit support for HACMP and PowerHA


Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1

DS3400 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL8 AIX TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2

DS3500 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL8 AIX TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2

DS4100 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

DS4200 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

DS4300 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1



Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1

DS4400 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

DS4500 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

DS4700 AIX 5.3 TL5 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

DS4800 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

DS5020 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2

DS6000 AIX 5.3 TL5 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
DS6800 AIX 6.1 AIX TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

DS5100 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2

DS5300 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 6.1 TL2 SP1 AIX 7.1
AIX 6.1 TL0 SP2

DS8000 AIX 5.3 TL5 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
931,932,9B2 AIX 6.1 AIX TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

DS8700 HACMP SP2 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL8 AIX TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2

IBM XIV
Table C-7 lists the software versions for HACMP and PowerHA with AIX supported on XIV
storage. PowerHA requires XIV microcode level 10.0.1 or later.

Table C-7 IBM XIV support for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1

XIV HACMP SP4 AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL6
2810-A14 AIX 5.3 TL7 SP6 AIX TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL0 SP2



SAN Volume Controller
Table C-8 shows the software versions for HACMP and PowerHA with AIX supported on the
SAN Volume Controller (SVC). SVC software levels are supported up through SVC v5.1. The
levels shown in the table are the absolute minimum requirements for v5.1.

Table C-8 SVC supported models for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1

2145-4F2 HACMP SP8 PowerHA SP6 PowerHA SP1 AIX 6.1 TL6
AIX 5.3 TL9 AIX 5.3 TL9 AIX 5.3 TL9 AIX 7.1
AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3

2145-8F2 HACMP SP8 PowerHA SP8 PowerHA SP1 AIX 6.1 TL6
AIX 5.3 TL9 AIX 5.3 TL9 AIX 5.3 TL9 AIX 7.1
AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3

Network-attached storage
Table C-9 shows the software versions for PowerHA and AIX supported on network-attached
storage (NAS).

Table C-9 NAS supported models for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1

N3700 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

N5200 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

N5200 (G20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

N5300 HACMP SP3 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL0 SP2

N5500 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

N5500 (G20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

N5600 HACMP SP3 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL0 SP2

N6040 AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

N6060 AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

N6070 AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

N7600 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

N7600 (G20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1



Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1

N7700 (A21) AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

N7700 (G21) AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

N7800 (A20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

N7800 (G20) AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

N7900 (A21) AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

N7900 (G21) AIX 5.3 TL7 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 TL0 SP2 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

Serial-attached SCSI
Table C-10 lists the software versions for PowerHA and AIX supported on the serial-attached
SCSI (SAS) model.

Table C-10 SAS supported model for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1

5886 EXP12S HACMP SP5 HACMP SP2 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3

SCSI
Table C-11 shows the software versions for PowerHA and AIX supported on the SCSI model.

Table C-11 SCSI supported model for HACMP and PowerHA with AIX
Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1

7031-D24 AIX 5.3 TL4 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL6
AIX 6.1 AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3 AIX 7.1

Adapters
This following sections contain information about the supported adapters for PowerHA.

Fibre Channel adapters


The following FC adapters are supported:
 #1905 4 Gb Single Port Fibre Channel PCI-X 2.0 DDR Adapter
 #1910 4 Gb Dual Port Fibre Channel PCI-X 2.0 DDR Adapter
 #1957 2 Gigabit Fibre Channel PCI-X Adapter
 #1977 2 Gigabit Fibre Channel PCI-X Adapter



 #5273 LP 8 Gb PCI-Express Dual Port Fibre Channel Adapter*
 #5276 LP 4 Gb PCI-Express Fibre Channel Adapter
 #5716 2 Gigabit Fibre Channel PCI-X Adapter
 #5735 8 Gb PCI-Express Dual Port Fibre Channel Adapter*
 #5758 4 Gb Single Port Fibre Channel PCI-X 2.0 DDR Adapter
 #5759 4 Gb Dual Port Fibre Channel PCI-X 2.0 DDR Adapter
 #5773 Gigabit PCI Express Fibre Channel Adapter
 #5774 Gigabit PCI Express Fibre Channel Adapter
 #6228 1-and 2-Gigabit Fibre Channel Adapter for 64-bit PCI Bus
 #6239 2 Gigabit FC PCI-X Adapter

#5273/#5735 PCI-Express Dual Port Fibre Channel Adapter: The 5273/5735 minimum
requirements are PowerHA 5.4.1 SP2 or 5.5 SP1.

SAS
The following SAS adapters are supported:
 #5278 LP 2x4port PCI-Express SAS Adapter 3 Gb
 #5901 PCI-Express SAS Adapter
 #5902 PCI-X DDR Dual –x4 Port SAS RAID Adapter
 #5903 PCI-Express SAS Adapters
 #5912 PCI-X DDR External Dual – x4 Port SAS Adapter

Table C-12 lists the SAS software support requirements.

Table C-12 SAS software support for HACMP and PowerHA with AIX
HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1

HACMP SP5 HACMP SP2 AIX 5.3 TL9 AIX 6.1 TL6
AIX 5.3 TL9 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 7.1
AIX 6.1 TL2 SP3 AIX 6.1 TL2 SP3

Ethernet
The following Ethernet adapters are supported with PowerHA:
 #1954 4-Port 10/100/100 Base-TX PCI-X Adapter
 #1959 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
 #1978 IBM Gigabit Ethernet-SX PCI-X Adapter
 #1979 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
 #1981 IBM 10 Gigabit Ethernet-SR PCI-X Adapter
 #1982 IBM 10 Gigabit Ethernet-LR PCI-X Adapter
 #1983 IBM 2-port 10/100/1000 Base-TX Ethernet PCI-X
 #1984 IBM Dual Port Gigabit Ethernet-SX PCI-X Adapter
 #1990 IBM 2-port 10/100/1000 Base-TX Ethernet PCI-X
 #4961 IBM Universal 4-Port 10/100 Ethernet Adapter
 #4962 IBM 10/100 Mbps Ethernet PCI Adapter II
 #5271 LP 4-Port Ethernet 10/100/1000 Base-TX PCI-X Adapter
 #5274 LP 2-Port Gigabit Ethernet-SX PCI Express
 #5700 IBM Gigabit Ethernet-SX PCI-X Adapter
 #5701 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
 #5706 IBM 2-Port 10/100/1000 Base-TX Ethernet PCI-X Adapter
 #5707 IBM 2-Port Gigabit Ethernet-SX PCI-X Adapter
 #5717 IBM 4-Port Ethernet 10/100/1000 Base-TX PCI-X Adapter



 #5718 IBM 10 Gigabit -SR/-LR Ethernet PCI-x adapters
 #5719 IBM 10 Gigabit -SR/-LR Ethernet PCI-x adapters
 #5721 IBM 10 Gigabit Ethernet-SR PCI-X 2.0 Adapter
 #5722 IBM 10 Gigabit Ethernet-LR PCI-X 2.0 Adapter
 #5740 4-Port 10/100/100 Base-TX PCI-X Adapter
 #5767 Adapter 2-Port 10/100/1000 Base-TX Ethernet PCI Express
 #5768 Adapter 2-Port Gigabit Ethernet-SX PCI Express

InfiniBand
The following InfiniBand adapters are supported with PowerHA:
 #1809 IBM GX Dual-port 4x IB HCA
 #1810 IBM GX Dual-port 4x IB HCA
 #1811 IBM GX Dual-port 4x IB HCA
 #1812 IBM GX Dual-port 4x IB HCA
 #1820 IBM GX Dual-port 12x IB HCA

SCSI and iSCSI


The following SCSI and iSCSI adapters are supported with PowerHA:
 #1912 IBM PCI-X DDR Dual Channel Ultra320 LVD SCSI Adapter
 #1913 PCI-X DDR Dual Channel Ultra320 SCSI RAID Adapter
 #1975 PCI-X Dual Channel Ultra320 SCSI RAID Adapter
 #1986 1 Gigabit-TX iSCSI TOE PCI-X adapter (copper connector)
 #1987 1 Gigabit-SX iSCSI TOE PCI-X adapter (optical connector)
 #5703 PCI-X Dual Channel Ultra320 SCSI RAID Adapter
 #5710 PCI-X Dual Channel Ultra320 SCSI Adapter
 #5711 PCI-X Dual Channel Ultra320 SCSI RAID Blind Swap Adapter
 #5712 PCI-X Dual Channel Ultra320 SCSI Adapter
 #5713 1 Gigabit-TX iSCSI TOE PCI-X adapter (copper connector)
 #5714 1 Gigabit-SX iSCSI TOE PCI-X adapter (optical connector)
 #5736 IBM PCI-X DDR Dual Channel Ultra320 SCSI Adapter
 #5737 PCI-X DDR Dual Channel Ultra320 SCSI RAID Adapter

PCI bus adapters


PowerHA 7.1 no longer supports RS-232 connections. Therefore, the following adapters are
supported up through PowerHA 6.1 only:
 #2943 8-Port Asynchronous EIA-232/RS-422, PCI bus adapter
 #2944 128-Port Asynchronous Controller, PCI bus adapter
 #5277 IBM LP 4-Port Async EIA-232 PCIe Adapter
 #5723 2-Port Asynchronous EIA-232/RS-422, PCI bus adapter
 #5785 IBM 4-Port Async EIA-232 PCIe adapter

The 5785 adapter is only supported by PowerHA 5.5 and 6.1.




Appendix D. The clmgr man page


At the time of writing, no documentation was available about the clmgr command except for the
related man pages. To make it easier for those of you who do not have the product installed
and want more details about the clmgr command, a copy of the man pages is provided as
follows:
clmgr command
************

Purpose
=======
clmgr: Provides a consistent, reliable interface for performing IBM PowerHA
SystemMirror cluster operations via a terminal or script. All clmgr
operations are logged in the "clutils.log" file, including the
command that was executed, its start/stop time, and what user
initiated the command.

The basic format for using clmgr is consistently as follows:

clmgr <ACTION> <CLASS> [<NAME>] [<ATTRIBUTES...>]

This consistency helps make clmgr easier to learn and use. Further
help is also available at each part of clmgr's command line. For
example, just executing "clmgr" by itself will result in a list of
the available ACTIONs supported by clmgr. Executing "clmgr ACTION"
with no CLASS provided will result in a list of all the available
CLASSes for the specified ACTION. Executing "clmgr ACTION CLASS"
with no NAME or ATTRIBUTES provided is slightly different, though,
since for some ACTION+CLASS combinations, that may be a valid
command format. So to get help in this scenario, it is necessary
to explicitly request it by appending the "-h" flag. So executing
"clmgr ACTION CLASS -h" will result in a listing of all known
attributes for that ACTION+CLASS combination being displayed.
That is where clmgr's ability to help ends, however; it can not
help with each individual attribute. If there is a question about
what a particular attribute is for, or when to use it, the product



documentation will need to be consulted.
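
For example, the exploration pattern that is described above looks as follows at a
terminal (the cluster class is used only for illustration):

clmgr                   # lists the available ACTIONs
clmgr query             # lists the CLASSes that the query ACTION supports
clmgr add cluster -h    # lists the attributes for the "add cluster" combination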

Synopsis
========
clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>]
[-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}] <ACTION> <CLASS> [<NAME>]
[-h | <ATTR#1>=<VALUE#1> <ATTR#2>=<VALUE#2> <ATTR#n>=<VALUE#n>]

ACTION={add|modify|delete|query|online|offline|...}
CLASS={cluster|site|node|network|resource_group|...}

clmgr {-h|-?} [-v]


clmgr [-v] help

ACTION a verb describing the operation to be performed

The following four ACTIONs are available on almost all the


supported CLASSes (there are a few exceptions):

add (Aliases: a)
query (Aliases: q, ls, get)
modify (Aliases: mod, ch, set)
delete (Aliases: de, rm, er)

The remaining ACTIONS are typically only supported on a small


subset of the supported CLASSes:

Cluster, Sites, Node, Resource Group:


online (Aliases: on, start)
offline (Aliases: off, stop)

Resource Group, Service IP, Persistent IP:


move (Aliases: mv)

Cluster, Log, Node, Snapshot:


manage (Aliases: mg)

Cluster, File Collection:


sync (Aliases: sy)

Cluster, Method:
verify (Aliases: ve)

Log, Report, Snapshot:


view (Aliases: vi)

NOTE: ACTION is *not* case-sensitive.


NOTE: all ACTIONs provide a shorter alias, such as "rm" in
place of "delete". These aliases are provided for
convenience/ease-of-use at a terminal, and are not
recommended for use in scripts.

CLASS the type of object upon which the ACTION will be performed.
The complete list of supported CLASSes is:



cluster (Aliases: cl)
site (Aliases: si)
node (Aliases: no)
interface (Aliases: in, if)
network (Aliases: ne, nw)
resource_group (Aliases: rg)
service_ip (Aliases: se)
persistent_ip (Aliases: pe)
application_controller (Aliases: ac, app)
application_monitor (Aliases: am, mon)
tape (Aliases: tp)
dependency (Aliases: de)
file_collection (Aliases: fi, fc)
snapshot (Aliases: sn, ss)
resource (Aliases: rs)
resource_type (Aliases: rt)
method (Aliases: me)
volume_group (Aliases: vg)
logical_volume (Aliases: lv)
file_system (Aliases: fs)
physical_volume (Aliases: pv)

NOTE: CLASS is *not* case-sensitive.


NOTE: all CLASSes provide a shorter alias, such as "fc" in
place of "file_collection". These aliases are provided
for convenience/ease-of-use at a terminal, and are not
recommended for use in scripts.

NAME the specific object, of type "CLASS", upon which the ACTION
is to be performed.

ATTR=VALUE optional, attribute/value pairs that are specific to the


ACTION+CLASS combination. These may be used to do specify
configuration settings, or adjust particular operations.

When used with the "query" action, ATTR=VALUE specifications


may be used to perform attribute-based searching/filtering.
When used for this purpose, simple wildcards may be used.
For example, "*" matches zero or more of any character, "?"
matches zero or one of any character.

NOTE: an ATTR may not always need to be fully typed. Only the
number of leading characters required to uniquely identify
the attribute from amongst the set of attributes available
for the specified operation need to be provided. So instead
of "FC_SYNC_INTERVAL", for the "add/modify cluster"
operation, "FC" could be used, and would have the same
result.

-a valid only with the "query", "add", and "modify" ACTIONs,


requests that only the specified attribute(s) be displayed.

NOTE: the specified order of these attributes is *not*


guaranteed to be preserved in the resulting output.



-c valid only with the "query", "add", and "modify" ACTIONs,
requests all data to be displayed in colon-delimited format.

-D disables the dependency mechanism in clmgr that will attempt to


create any requisite resources if they are not already defined
within the cluster.

-f requests an override of any interactive prompts, forcing the


current operation to be attempted (if forcing the operation
is a possibility).

-h requests that any available help information be displayed.


An attempt is made to provide context-sensitive assistance.

-l activates trace logging for serviceability:

low: logs function entry/exit
med: adds function entry parameters, as well as function
return values
high: adds tracing of every line of execution, only omitting
routine, "utility" functions
max: adds the routine/utility functions. Also adds a time/date
stamp to the function entry/exit messages.

All trace data is written into the "clutils.log" file.
This option is typically only of interest when troubleshooting.

-S valid only with the "query" ACTION and "-c" option,
requests that all column headers be suppressed.

-T a transaction ID to be applied to all logged output, to help
group one or more activities into a single body of output that
can be extracted from the log for analysis.
This option is typically only of interest when troubleshooting.

-v requests maximum verbosity in the output.

NOTE: when used with the "query" action and no specific
object name, queries all instances of the specified
class. For example, "clmgr -v query node" will query
and display *all* nodes and their attributes. When
used with the "add" or "modify" operations, the
final, resulting attributes after the operation is
complete will be displayed (only if the operation
was successful).

-x valid only with the "query", "add", and "modify" ACTIONs,
requests all data to be displayed in simple XML format.

Operations
==========
CLUSTER:
clmgr add cluster \
[ <cluster_label> ] \
REPOSITORY=<hdisk#> \
SHARED_DISKS=<hdisk#>[,<hdisk#>,...] \
[ NODES=<host>[,<host#2>,<host#n>,...] ] \
[ CLUSTER_IP=<IP_Address> ] \
[ FC_SYNC_INTERVAL=## ] \
[ RG_SETTLING_TIME=## ] \
[ MAX_EVENT_TIME=### ] \
[ MAX_RG_PROCESSING_TIME=### ] \
[ SITE_POLICY_FAILURE_ACTION={fallover|notify} ] \
[ SITE_POLICY_NOTIFY_METHOD="<FULL_PATH_TO_FILE>" ]
[ DAILY_VERIFICATION={Enabled|Disabled} ] \
[ VERIFICATION_NODE={Default|<node>} ] \
[ VERIFICATION_HOUR=<00..23> ] \
[ VERIFICATION_DEBUGGING={Enabled|Disabled} ]
clmgr modify cluster \
[ NEWNAME=<new_cluster_label> ] \
[ SHARED_DISKS=<disk>[,<disk#2>,<disk#n>,...] ] \
[ NODES=<host>[,<host#2>,<host#n>,...] ] \
[ CLUSTER_IP=<IP_Address> ] \
[ FC_SYNC_INTERVAL=## ] \
[ RG_SETTLING_TIME=## ] \
[ MAX_EVENT_TIME=### ] \
[ MAX_RG_PROCESSING_TIME=### ] \
[ SITE_POLICY_FAILURE_ACTION={fallover|notify} ] \
[ SITE_POLICY_NOTIFY_METHOD="<FULL_PATH_TO_FILE>" ]
[ DAILY_VERIFICATION={Enabled|Disabled} ] \
[ VERIFICATION_NODE={Default|<node>} ] \
[ VERIFICATION_HOUR=<00..23> ] \
[ VERIFICATION_DEBUGGING={Enabled|Disabled} ]
clmgr query cluster
clmgr delete cluster [ NODES={ALL|<node>[,<node#2>,<node#n>,...}] ]
clmgr recover cluster

NOTE: the "delete" action defaults to only deleting


the cluster on the local node.

clmgr sync cluster \


[ VERIFY={yes|no} ] \
[ CHANGES_ONLY={no|yes} ] \
[ DEFAULT_TESTS={yes|no} ] \
[ METHODS=<method#1>[,<method#n>,...] ] \
[ FIX={no|yes} ] \
[ LOGGING={standard|verbose} ] \
[ LOGFILE=<PATH_TO_LOG_FILE> ] \
[ MAX_ERRORS=## ] \
[ FORCE={no|yes} ]
NOTE: all options are verification parameters, so they
are only valid when "VERIFY" is set to "yes".

clmgr manage cluster {discover|reset|unlock}

clmgr manage cluster security \
LEVEL={Disable|Low|Med|High}
clmgr manage cluster security \
ALGORITHM={DES|3DES|AES} \
[ GRACE_PERIOD=<SECONDS> ] \
[ REFRESH=<SECONDS> ]
clmgr manage cluster security \
MECHANISM={OpenSSL|SelfSigned|SSH} \
[ CERTIFICATE=<PATH_TO_FILE> ] \
[ PRIVATE_KEY=<PATH_TO_FILE> ]

NOTE: "GRACE_PERIOD" defaults to 21600 seconds (6 hours).


NOTE: "REFRESH" defaults to 86400 seconds (24 hours).

clmgr verify cluster \


[ CHANGES_ONLY={no|yes} ] \
[ DEFAULT_TESTS={yes|no} ] \
[ METHODS=<method#1>[,<method#n>,...] ] \
[ FIX={no|yes} ] \
[ LOGGING={standard|verbose} ] \
[ LOGFILE=<PATH_TO_LOG_FILE> ] \
[ MAX_ERRORS=## ]
[ SYNC={no|yes} ] \
[ FORCE={no|yes} ]
NOTE: the "FORCE" option should only be used when "SYNC" is set
to "yes".

clmgr offline cluster \
[ WHEN={now|restart|both} ] \
[ MANAGE={offline|move|unmanage} ] \
[ BROADCAST={true|false} ] \
[ TIMEOUT=<seconds_to_wait_for_completion> ]
clmgr online cluster \
[ WHEN={now|restart|both} ] \
[ MANAGE={auto|manual} ] \
[ BROADCAST={false|true} ] \
[ CLINFO={false|true|consistent} ] \
[ FORCE={false|true} ] \
[ FIX={no|yes|interactively} ]
[ TIMEOUT=<seconds_to_wait_for_completion> ]

NOTE: the "RG_SETTLING_TIME" attribute only affects resource groups


with a startup policy of "Online On First Available Node".
NOTE: an alias for "cluster" is "cl".

SITE:
clmgr add site <sitename> \
[ NODES=<node>[,<node#2>,<node#n>,...] ]
clmgr modify site <sitename> \
[ NEWNAME=<new_site_label> ] \
[ {ADD|REPLACE}={ALL|<node>[,<node#2>,<node#n>,...}] ]
At least one modification option must be specified.
ADD attempts to append the specified nodes to the site.
REPLACE attempts to replace the sites current nodes with
the specified nodes.
clmgr query site [ <sitename>[,<sitename#2>,<sitename#n>,...] ]
clmgr delete site {<sitename>[,<sitename#2>,<sitename#n>,...] | ALL}
clmgr recover site <sitename>
clmgr offline site <sitename> \
[ WHEN={now|restart|both} ] \
[ MANAGE={offline|move|unmanage} ] \
[ BROADCAST={true|false} ] \
[ TIMEOUT=<seconds_to_wait_for_completion> ]
clmgr online site <sitename> \
[ WHEN={now|restart|both} ] \
[ MANAGE={auto|manual} ] \
[ BROADCAST={false|true} ] \
[ CLINFO={false|true|consistent} ] \
[ FORCE={false|true} ] \
[ FIX={no|yes|interactively} ]
[ TIMEOUT=<seconds_to_wait_for_completion> ]

NOTE: an alias for "site" is "si".

NODE:
clmgr add node <node> \
[ COMMPATH=<ip_address_or_network-resolvable_name> ] \
[ RUN_DISCOVERY={true|false} ] \
[ PERSISTENT_IP=<IP> NETWORK=<network>
{NETMASK=<255.255.255.0 | PREFIX=1..128} ] \
[ START_ON_BOOT={false|true} ] \
[ BROADCAST_ON_START={true|false} ] \
[ CLINFO_ON_START={false|true|consistent} ] \
[ VERIFY_ON_START={true|false} ]
clmgr modify node <node> \
[ NEWNAME=<new_node_label> ] \
[ COMMPATH=<new_commpath> ] \
[ PERSISTENT_IP=<IP> NETWORK=<network>
{NETMASK=<255.255.255.0 | PREFIX=1..128} ] \
[ START_ON_BOOT={false|true} ] \
[ BROADCAST_ON_START={true|false} ] \
[ CLINFO_ON_START={false|true|consistent} ] \
[ VERIFY_ON_START={true|false} ]
clmgr query node [ {<node>|LOCAL}[,<node#2>,<node#n>,...] ]
clmgr delete node {<node>[,<node#2>,<node#n>,...] | ALL}
clmgr manage node undo_changes
clmgr recover node <node>[,<node#2>,<node#n>,...]
clmgr online node <node>[,<node#2>,<node#n>,...] \
[ WHEN={now|restart|both} ] \
[ MANAGE={auto|manual} ] \
[ BROADCAST={false|true} ] \
[ CLINFO={false|true|consistent} ] \
[ FORCE={false|true} ] \
[ FIX={no|yes|interactively} ]
[ TIMEOUT=<seconds_to_wait_for_completion> ]
clmgr offline node <node>[,<node#2>,<node#n>,...] \
[ WHEN={now|restart|both} ] \
[ MANAGE={offline|move|unmanage} ] \
[ BROADCAST={true|false} ] \
[ TIMEOUT=<seconds_to_wait_for_completion> ]

NOTE: the "TIMEOUT" attribute defaults to 120 seconds.


NOTE: an alias for "node" is "no".

NETWORK:

clmgr add network <network> \
[ TYPE={ether|XD_data|XD_ip|infiniband} ] \
[ {NETMASK=<255.255.255.0 | PREFIX=1..128} ] \
[ IPALIASING={true|false} ]
clmgr modify network <network> \
[ NEWNAME=<new_network_label> ] \
[ TYPE={ether|XD_data|XD_ip|infiniband} ] \
[ {NETMASK=<###.###.###.###> | PREFIX=1..128} ] \
[ ENABLE_IPAT_ALIASING={true|false} ] \
[ PUBLIC={true|false} ] \
[ RESOURCE_DIST_PREF={AC|C|CPL|ACPL} ]
clmgr query network [ <network>[,<network#2>,<network#n>,...] ]
clmgr delete network {<network>[,<network#2>,<network#n>,...] | ALL}

NOTE: the TYPE defaults to "ether" if not specified.
NOTE: when adding, the default is to construct an IPv4
network using a netmask of "255.255.255.0". To
create an IPv6 network, specify a valid prefix.
NOTE: AC == Anti-Collocation
C == Collocation
CPL == Collocation with Persistent Label
ACPL == Anti-Collocation with Persistent Label
NOTE: aliases for "network" are "ne" and "nw".

INTERFACE:
clmgr add interface <interface> \
NETWORK=<network> \
[ NODE=<node> ] \
[ TYPE={ether|infiniband} ] \
[ INTERFACE=<network_interface> ]
clmgr modify interface <interface> \
NETWORK=<network>
clmgr query interface [ <interface>[,<if#2>,<if#n>,...] ]
clmgr delete interface {<interface>[,<if#2>,<if#n>,...] | ALL}

NOTE: the "interface" may be either an IP address or label


NOTE: the "NODE" attribute defaults to the local node name.
NOTE: the "TYPE" attribute defaults to "ether"
NOTE: the "<network_interface>" might look like "en1", "en2", ...
NOTE: aliases for "interface" are "in" and "if".

RESOURCE GROUP:
clmgr add resource_group <resource_group> \
NODES=nodeA1,nodeA2,... \
[ SECONDARYNODES=nodeB2,nodeB1,... ] \
[ STARTUP={OHN|OFAN|OAAN|OUDP} ] \
[ FALLOVER={FNPN|FUDNP|BO} ] \
[ FALLBACK={NFB|FBHPN} ] \
[ NODE_PRIORITY_POLICY={default|mem|cpu| \
disk|least|most} ] \
[ NODE_PRIORITY_POLICY_SCRIPT=</path/to/script> ] \
[ NODE_PRIORITY_POLICY_TIMEOUT=### ] \
[ SITE_POLICY={ignore|primary|either|both} ] \
[ SERVICE_LABEL=service_ip#1[,service_ip#2,...] ] \
[ APPLICATIONS=appctlr#1[,appctlr#2,...] ] \
[ SHARED_TAPE_RESOURCES=<TAPE>[,<TAPE#2>,...] ] \
[ VOLUME_GROUP=<VG>[,<VG#2>,...] ] \
[ FORCED_VARYON={true|false} ] \
[ VG_AUTO_IMPORT={true|false} ] \
[ FILESYSTEM=/file_system#1[,/file_system#2,...] ] \
[ DISK=<hdisk>[,<hdisk#2>,...] ] \
[ FS_BEFORE_IPADDR={true|false} ] \
[ WPAR_NAME="wpar_name" ] \
[ EXPORT_FILESYSTEM=/expfs#1[,/expfs#2,...] ] \
[ EXPORT_FILESYSTEM_V4=/expfs#1[,/expfs#2,...] ] \
[ STABLE_STORAGE_PATH="/fs3" ] \
[ NFS_NETWORK="nfs_network" ] \
[ MOUNT_FILESYSTEM=/nfs_fs1;/expfs1,/nfs_fs2;,... ]

STARTUP:
OHN ----- Online Home Node (default value)
OFAN ---- Online on First Available Node
OAAN ---- Online on All Available Nodes (concurrent)
OUDP ---- Online Using Node Distribution Policy

FALLOVER:
FNPN ---- Fallover to Next Priority Node (default value)
FUDNP --- Fallover Using Dynamic Node Priority
BO ------ Bring Offline (On Error Node Only)

FALLBACK:
NFB ----- Never Fallback
FBHPN --- Fallback to Higher Priority Node (default value)

NODE_PRIORITY_POLICY:
NOTE: this policy may only be established if the FALLOVER
policy has been set to "FUDNP".
default - next node in the NODES list
mem ----- node with most available memory
disk ---- node with least disk activity
cpu ----- node with most available CPU cycles
least --- node where the dynamic node priority script
returns the lowest value
most ---- node where the dynamic node priority script
returns the highest value

SITE_POLICY:
ignore -- Ignore
primary - Prefer Primary Site
either -- Online On Either Site
both ---- Online On Both Sites

NOTE: "SECONDARYNODES" and "SITE_POLICY" only apply when sites are


configured within the cluster.
NOTE: "appctlr" is an abbreviation for "application_controller".

clmgr modify resource_group <resource_group> \
[ NEWNAME=<new_resource_group_label> ] \
[ NODES=nodeA1[,nodeA2,...] ] \
[ SECONDARYNODES=nodeB2[,nodeB1,...] ] \
[ STARTUP={OHN|OFAN|OAAN|OUDP} ] \
[ FALLOVER={FNPN|FUDNP|BO} ] \
[ FALLBACK={NFB|FBHPN} ] \
[ NODE_PRIORITY_POLICY={default|mem|cpu| \
disk|least|most} ] \
[ SITE_POLICY={ignore|primary|either|both} ] \
[ SERVICE_LABEL=service_ip#1[,service_ip#2,...] ] \
[ APPLICATIONS=appctlr#1[,appctlr#2,...] ] \
[ VOLUME_GROUP=volume_group#1[,volume_group#2,...]] \
[ FORCED_VARYON={true|false} ] \
[ VG_AUTO_IMPORT={true|false} ] \
[ FILESYSTEM=/file_system#1[,/file_system#2,...] ] \
[ DISK=hdisk#1[,hdisk#2,...] ] \
[ FS_BEFORE_IPADDR={true|false} ] \
[ WPAR_NAME="wpar_name" ] \
[ EXPORT_FILESYSTEM=/expfs#1[,/expfs#2,...] ] \
[ EXPORT_FILESYSTEM_V4=/expfs#1[,/expfs#2,...] ] \
[ STABLE_STORAGE_PATH="/fs3" ] \
[ NFS_NETWORK="nfs_network" ] \
[ MOUNT_FILESYSTEM=/nfs_fs1;/expfs1,/nfs_fs2;,... ]

NOTE: "SECONDARYNODES" and "SITE_POLICY" only apply when sites are


configured within the cluster.
NOTE: "appctlr" is an abbreviation for "application_controller".

clmgr query resource_group [ <resource_group>[,<rg#2>,<rg#n>,...] ]
clmgr delete resource_group {<resource_group>[,<rg#2>,<rg#n>,...] |
ALL}
clmgr online resource_group <resource_group>[,<rg#2>,<rg#n>,...] \
[ NODES=<node>[,<node#2>,...] ]
clmgr offline resource_group <resource_group>[,<rg#2>,<rg#n>,...] \
[ NODES=<node>[,<node#2>,...] ]
clmgr move resource_group <resource_group>[,<rg#2>,<rg#n>,...] \
{SITE|NODE}=<node_or_site_label> \
[ STATE={online|offline} ] \
[ SECONDARY={false|true} ]

NOTE: the "SITE" and "SECONDARY" attributes are only applicable


when sites are configured in the cluster.
NOTE: the "SECONDARY" attribute defaults to "false".
NOTE: the resource group STATE remains unchanged if "STATE" is
not explicitly specified.
NOTE: an alias for "resource_group" is "rg".

FALLBACK TIMER:
clmgr add fallback_timer <timer> \
[ YEAR=<####> ] \
[ MONTH=<{1..12 | Jan..Dec}> ] \
[ DAY_OF_MONTH=<{1..31}> ] \
[ DAY_OF_WEEK=<{0..6 | Sun..Sat}> ] \
HOUR=<{0..23}> \
MINUTE=<{0..59}>
clmgr modify fallback_timer <timer> \
[ YEAR=<{####}> ] \
[ MONTH=<{1..12 | Jan..Dec}> ] \
[DAY_OF_MONTH=<{1..31}> ] \
[DAY_OF_WEEK=<{0..6 | Sun..Sat}> ] \
[HOUR=<{0..23}> ] \
[MINUTE=<{0..59}> ] \
[REPEATS=<{0,1,2,3,4 |
Never,Daily,Weekly,Monthly,Yearly}> ]
clmgr query fallback_timer [<timer>[,<timer#2>,<timer#n>,...] ]
clmgr delete fallback_timer {<timer>[,<timer#2>,<timer#n>,...] |\
ALL}

NOTE: aliases for "fallback_timer" are "fa" and "timer".

PERSISTENT IP/LABEL:
clmgr add persistent_ip <persistent_IP> \
NETWORK=<network> \
[ NODE=<node> ]
clmgr modify persistent_ip <persistent_label> \
[ NEWNAME=<new_persistent_label> ] \
[ NETWORK=<new_network> ] \
[ PREFIX=<new_prefix_length> ]
clmgr query persistent_ip [ <persistent_IP>[,<pIP#2>,<pIP#n>,...] ]
clmgr delete persistent_ip {<persistent_IP>[,<pIP#2>,<pIP#n>,...] |
ALL}
clmgr move persistent_ip <persistent_IP> \
INTERFACE=<new_interface>

NOTE: an alias for "persistent_ip" is "pe".

SERVICE IP/LABEL:
clmgr add service_ip <service_ip> \
NETWORK=<network> \
[ {NETMASK=<255.255.255.0 | PREFIX=1..128} ] \
[ HWADDR=<new_hardware_address> ] \
[ SITE=<new_site> ]
clmgr modify service_ip <service_ip> \
[ NEWNAME=<new_service_ip> ] \
[ NETWORK=<new_network> ] \
[ {NETMASK=<###.###.###.###> | PREFIX=1..128} ] \
[ HWADDR=<new_hardware_address> ] \
[ SITE=<new_site> ]
clmgr query service_ip [ <service_ip>[,<service_ip#2>,...] ]
clmgr delete service_ip {<service_ip>[,<service_ip#2>,...] | ALL}
clmgr move service_ip <service_ip> \
INTERFACE=<new_interface>

NOTE: if the "NETMASK/PREFIX" attributes are not specified,


the netmask/prefix value for the underlying network
is used.
NOTE: an alias for "service_ip" is "se".

APPLICATION CONTROLLER:
clmgr add application_controller <application_controller> \
STARTSCRIPT="/path/to/start/script" \
STOPSCRIPT ="/path/to/stop/script"
[ MONITORS=<monitor>[,<monitor#2>,<monitor#n>,...] ]

clmgr modify application_controller <application_controller> \
[ NEWNAME=<new_application_controller_label> ] \
[ STARTSCRIPT="/path/to/start/script" ] \
[ STOPSCRIPT ="/path/to/stop/script" ]
[ MONITORS=<monitor>[,<monitor#2>,<monitor#n>,...] ]
clmgr query application_controller [ <appctlr>[,<appctlr#2>,...] ]
clmgr delete application_controller {<appctlr>[,<appctlr#2>,...] | \
ALL}

NOTE: "appctlr" is an abbreviation for "application_controller".

NOTE: aliases for "application_controller" are "ac" and "app".

APPLICATION MONITOR:
clmgr add application_monitor <monitor> \
TYPE={Process|Custom} \
APPLICATIONS=<appctlr#1>[,<appctlr#2>,<appctlr#n>,...] \
MODE={continuous|startup|both} \
[ STABILIZATION="1 .. 3600" ] \
[ RESTARTCOUNT="0 .. 100" ] \
[ FAILUREACTION={notify|fallover} ] \

Process Arguments:
PROCESSES="pmon1,dbmon,..." \
OWNER="<processes_owner_name>" \
[ INSTANCECOUNT="1 .. 1024" ] \
[ RESTARTINTERVAL="1 .. 3600" ] \
[ NOTIFYMETHOD="</script/to/notify>" ] \
[ CLEANUPMETHOD="</script/to/cleanup>" ] \
[ RESTARTMETHOD="</script/to/restart>" ]

Custom Arguments:
MONITORMETHOD="/script/to/monitor" \
[ MONITORINTERVAL="1 .. 1024" ] \
[ HUNGSIGNAL="1 .. 63" ] \
[ RESTARTINTERVAL="1 .. 3600" ] \
[ NOTIFYMETHOD="</script/to/notify>" ] \
[ CLEANUPMETHOD="</script/to/cleanup>" ] \
[ RESTARTMETHOD="</script/to/restart>" ]

NOTE: "STABILIZATION" defaults to 180


NOTE: "RESTARTCOUNT" defaults to 3

clmgr modify application_monitor <monitor> \
[ NEWNAME=<new_monitor_label> ] \
[ See the "add" action, above, for a list
of supported modification attributes. ]
clmgr query application_monitor [ <monitor>[,<monitor#2>,...] ]
clmgr delete application_monitor {<monitor>[,<monitor#2>,...] | ALL}

NOTE: "appctlr" is an abbreviation for "application_controller".

NOTE: aliases for "application_monitor" are "am" and "mon".

DEPENDENCY:

# Temporal Dependency (parent ==> child)
clmgr add dependency \
PARENT=<rg#1> \
CHILD="<rg#2>[,<rg#2>,<rg#n>...]"
clmgr modify dependency <parent_child_dependency> \
[ TYPE=PARENT_CHILD ] \
[ PARENT=<rg#1> ] \
[ CHILD="<rg#2>[,<rg#2>,<rg#n>...]" ]

# Temporal Dependency (start/stop after)
clmgr add dependency \
{STOP|START}="<rg#2>[,<rg#2>,<rg#n>...]" \
AFTER=<rg#1>
clmgr modify dependency \
[ TYPE={STOP_AFTER|START_AFTER} ] \
[ {STOP|START}="<rg#2>[,<rg#2>,<rg#n>...]" ] \
[ AFTER=<rg#1> ]

# Location Dependency (colocation)
clmgr add dependency \
SAME={NODE|SITE} \
GROUPS="<rg1>,<rg2>[,<rg#n>...]"
clmgr modify dependency <colocation_dependency> \
[ TYPE=SAME_{NODE|SITE} ] \
GROUPS="<rg1>,<rg2>[,<rg#n>...]"

# Location Dependency (anti-colocation)
clmgr add dependency \
HIGH="<rg1>,<rg2>,..." \
INTERMEDIATE="<rg3>,<rg4>,..." \
LOW="<rg5>,<rg6>,..."
clmgr modify dependency <anti-colocation_dependency> \
[ TYPE=DIFFERENT_NODES ] \
[ HIGH="<rg1>,<rg2>,..." ] \
[ INTERMEDIATE="<rg3>,<rg4>,..." ] \
[ LOW="<rg5>,<rg6>,..." ]

# Acquisition/Release Order
clmgr add dependency \
TYPE={ACQUIRE|RELEASE} \
{ SERIAL="{<rg1>,<rg2>,...|ALL}" |
PARALLEL="{<rg1>,<rg2>,...|ALL}" }
clmgr modify dependency \
TYPE={ACQUIRE|RELEASE} \
{ SERIAL="{<rg1>,<rg2>,...|ALL}" |
PARALLEL="{<rg1>,<rg2>,...|ALL}" }

clmgr query dependency [ <dependency> ]
clmgr delete dependency {<dependency> | ALL} \
[ TYPE={PARENT_CHILD|STOP_AFTER|START_AFTER| \
SAME_NODE|SAME_SITE|DIFFERENT_NODES} ]
clmgr delete dependency RG=<RESOURCE_GROUP>

NOTE: an alias for "dependency" is "de".

TAPE:
clmgr add tape <tape> \
DEVICE=<tape_device_name> \
[ DESCRIPTION=<tape_device_description> ] \
[ START="</script/to/start/tape/device>" ] \
[ START_SYNCHRONOUSLY={no|yes} ] \
[ STOP="</script/to/stop/tape/device>" ] \
[ STOP_SYNCHRONOUSLY={no|yes} ]
clmgr modify tape <tape> \
[ NEWNAME=<new_tape_label> ] \
[ DEVICE=<tape_device_name> ] \
[ DESCRIPTION=<tape_device_description> ] \
[ START="</script/to/start/tape/device>" ] \
[ START_SYNCHRONOUSLY={no|yes} ] \
[ STOP="</script/to/stop/tape/device>" ] \
[ STOP_SYNCHRONOUSLY={no|yes} ]
clmgr query tape [ <tape>[,<tape#2>,<tape#n>,...] ]
clmgr delete tape {<tape> | ALL}

NOTE: an alias for "tape" is "tp".

FILE COLLECTION:
clmgr add file_collection <file_collection> \
FILES="/path/to/file1,/path/to/file2,..." \
[ SYNC_WITH_CLUSTER={no|yes} ] \
[ SYNC_WHEN_CHANGED={no|yes} ] \
[ DESCRIPTION="<file_collection_description>" ]
clmgr modify file_collection <file_collection> \
[ NEWNAME="<new_file_collection_label>" ] \
[ ADD="/path/to/file1,/path/to/file2,..." ] \
[ DELETE={"/path/to/file1,/path/to/file2,..."|ALL} ] \
[ REPLACE={"/path/to/file1,/path/to/file2,..."|""} ] \
[ SYNC_WITH_CLUSTER={no|yes} ] \
[ SYNC_WHEN_CHANGED={no|yes} ] \
[ DESCRIPTION="<file_collection_description>" ]
clmgr query file_collection [ <file_collection>[,<fc#2>,<fc#n>,...]]
clmgr delete file_collection {<file_collection>[,<fc#2>,<fc#n>,...]|
ALL}
clmgr sync file_collection <file_collection>

NOTE: the "REPLACE attribute replaces all existing


files with the specified set
NOTE: aliases for "file_collection" are "fc" and "fi".

SNAPSHOT:
clmgr add snapshot <snapshot> \
DESCRIPTION="<snapshot_description>" \
[ METHODS="method1,method2,..." ] \
[ SAVE_LOGS={false|true} ]
clmgr modify snapshot <snapshot> \
[ NEWNAME="<new_snapshot_label>" ] \
[ DESCRIPTION="<snapshot_description>" ]
clmgr query snapshot [ <snapshot>[,<snapshot#2>,<snapshot#n>,...] ]
clmgr view snapshot <snapshot> \
[ TAIL=<number_of_trailing_lines> ] \
[ HEAD=<number_of_leading_lines> ] \
[ FILTER=<pattern>[,<pattern#2>,<pattern#n>,...] ] \
[ DELIMITER=<alternate_pattern_delimiter> ] \
[ CASE={insensitive|no|off|false} ]
clmgr delete snapshot {<snapshot>[,<snapshot#2>,<snapshot#n>,...] |
ALL}
clmgr manage snapshot restore <snapshot> \
[ CONFIGURE={yes|no} ] \
[ FORCE={no|yes} ]

NOTE: the "view" action displays the contents of the ".info"


file for the snapshot, if that file exists.
NOTE: CONFIGURE defaults to "yes"; FORCE defaults to "no".
NOTE: an alias for "snapshot" is "sn".

METHOD:
clmgr add method <method_label> \
TYPE={snapshot|verify} \
FILE=<executable_file> \
[ DESCRIPTION=<description> ]
clmgr modify method <method_label> \
TYPE={snapshot|verify} \
[ NEWNAME=<new_method_label> ] \
[ DESCRIPTION=<new_description> ] \
[ FILE=<new_executable_file> ]

clmgr add method <method_label> \
TYPE=notify \
CONTACT=<number_to_dial_or_email_address> \
EVENT=<event>[,<event#2>,<event#n>,...] \
[ NODES=<node>[,<node#2>,<node#n>,...] ] \
[ FILE=<message_file> ] \
[ DESCRIPTION=<description> ] \
[ RETRY=<retry_count> ] \
[ TIMEOUT=<timeout> ]
NOTE: "NODES" defaults to the local node.

clmgr modify method <method_label> \
TYPE=notify \
[ NEWNAME=<new_method_label> ] \
[ DESCRIPTION=<description> ] \
[ FILE=<message_file> ] \
[ CONTACT=<number_to_dial_or_email_address> ] \
[ EVENT=<cluster_event_label> ] \
[ NODES=<node>[,<node#2>,<node#n>,...] ] \
[ RETRY=<retry_count> ] \
[ TIMEOUT=<timeout> ]

clmgr query method [ <method>[,<method#2>,<method#n>,...] ] \
[ TYPE={notify|snapshot|verify} ]
clmgr delete method {<method>[,<method#2>,<method#n>,...] | ALL} \
[ TYPE={notify|snapshot|verify} ]
clmgr verify method <method>

NOTE: the "verify" action can only be applied to "notify" methods.

Appendix D. The clmgr man page 515


If more than one method exploits the same event, and that
event is specified, then both methods will be invoked.

NOTE: an alias for "method" is "me".

LOG:
clmgr modify logs ALL DIRECTORY="<new_logs_directory>"
clmgr modify log {<log>|ALL} \
[ DIRECTORY="{<new_log_directory>"|DEFAULT} ]
[ FORMATTING={none|standard|low|high} ] \
[ TRACE_LEVEL={low|high} ]
[ REMOTE_FS={true|false} ]
clmgr query log [ <log>[,<log#2>,<log#n>,...] ]
clmgr view log [ {<log>|EVENTS} ] \
[ TAIL=<number_of_trailing_lines> ] \
[ HEAD=<number_of_leading_lines> ] \
[ FILTER=<pattern>[,<pattern#2>,<pattern#n>,...] ] \
[ DELIMITER=<alternate_pattern_delimiter> ] \
[ CASE={insensitive|no|off|false} ]
clmgr manage logs collect \
[ DIRECTORY="<directory_for_collection>" ] \
[ NODES=<node>[,<node#2>,<node#n>,...] ] \
[ RSCT_LOGS={yes|no} ] \

NOTE: when "DEFAULT: is specified for the "DIRECTORY" attribute,


then the original, default IBM PowerHA SystemMirror directory
value is restored.
NOTE: the "FORMATTING" attribute only applies to the "hacmp.out"
log, and is ignored for all other logs.
NOTE: the "FORMATTING" and "TRACE_LEVEL" attributes only apply
to the "hacmp.out" and "clstrmgr.debug" logs, and are
ignored for all other logs.
NOTE: when "ALL" is specified in place of a log name, then the
provided DIRECTORY and REMOTE_FS modifications are applied
to all the logs.
NOTE: when "EVENTS" is specified in place of a log name,
then an events summary report is displayed.
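
For example, cluster log files might be gathered from all nodes into a
single directory for review (the directory and node names are illustrative):

clmgr manage logs collect DIRECTORY=/tmp/ha_logs NODES=nodeA,nodeB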

VOLUME GROUP:
clmgr query volume_group

LOGICAL VOLUME:
clmgr query logical_volume

FILE_SYSTEM:
clmgr query file_system

PHYSICAL VOLUME:
clmgr query physical_volume \
[ <disk>[,<disk#2>,<disk#n>,...] ] \
[ NODES=<node>,<node#2>[,<node#n>,...] ] \
[ ALL={no|yes} ]

NOTE: "node" may be either a node name, or a network-


resolvable name (i.e. hostname or IP address).

516 IBM PowerHA SystemMirror 7.1 for AIX


NOTE: "disk" may be either a device name (e.g. "hdisk0")
or a PVID (e.g. "00c3a28ed9aa3512").
NOTE: an alias for "physical_volume" is "pv".

REPORT:
clmgr view report [<report>] \
[ FILE=<PATH_TO_NEW_FILE> ] \
[ TYPE={text|html} ]

clmgr view report {nodeinfo|rginfo|lvinfo|
fsinfo|vginfo|dependencies} \
[ TARGETS=<target>[,<target#2>,<target#n>,...] ] \
[ FILE=<PATH_TO_NEW_FILE> ] \
[ TYPE={text|html} ]

clmgr view report availability \
[ TARGETS=<appctlr>[,<appctlr#2>,<appctlr#n>,...] ] \
[ FILE=<PATH_TO_NEW_FILE> ] \
[ TYPE={text|html} ] \
[ BEGIN_TIME="YYYY:MM:DD" ] \
[ END_TIME="YYYY:MM:DD" ]

NOTE: the currently supported reports are "basic", "cluster",
"status", "topology", "applications", "availability",
"events", "nodeinfo", "rginfo", "networks", "vginfo",
"lvinfo", "fsinfo", and "dependencies".
Some of these reports provide overlapping information, but
each also provides its own, unique information, as well.
NOTE: "appctlr" is an abbreviation for "application_controller".
NOTE: "MM" must be 1 - 12. "DD" must be 1 - 31.
NOTE: if no "BEGIN_TIME" is provided, then a report will be
generated for the last 30 days prior to "END_TIME".
NOTE: if no "END_TIME" is provided, then the current time will
be the default.
NOTE: an alias for "report" is "re".

Usage Examples
==============
clmgr query cluster

* For output that is more easily consumed by other programs, alternative
output formats, such as colon-delimited or XML, may be helpful:
clmgr -c query cluster
clmgr -x query node nodeA

* Most multi-value lists can be specified in either a comma-delimited manner,
or via quoted strings:
clmgr -a cluster_id,version query cluster
clmgr -a "cluster_id version" query cluster

* Combinations of option flags can be used to good effect. For example, to
retrieve a single value for a single attribute:
clmgr -cSa "version" query cluster

* Attribute-based searching can help filter out unwanted data, ensuring that
only the desired results are returned:
clmgr -v -a "name" q rg nodes="*nodeB*"
clmgr query file_collection files="*rhosts*"

* Application availability reports can help measure application uptime
requirements:
clmgr view report availability

clmgr add cluster tryme nodes=nodeA,nodeB
clmgr add application_controller manage_httpd \
start_script=/usr/local/bin/scripts/start_ihs.sh \
stop_script=/usr/local/bin/scripts/stop_ihs.sh
clmgr add application_monitor monitor_httpd \
type=process \
applications=manage_httpd \
processes=httpd \
owner=root \
mode=continuous \
stabilization=300 \
restartcount=3 \
failureaction=notify \
notifymethod="/usr/local/bin/scripts/ihs_notification.sh" \
cleanupmethod="/usr/local/bin/scripts/web_app_cleanup.sh" \
restartmethod="/usr/local/bin/scripts/start_ihs.sh"
clmgr add resource_group ihs_rg \
nodes=nodeA,nodeB \
startup=OFAN \
fallover=FNPN \
fallback=NFB \
node_priority_policy=mem \
applications=manage_httpd

clmgr view log hacmp.out FILTER=Event:

Suggested Reading
=================
IBM PowerHA SystemMirror for AIX Troubleshooting Guide
IBM PowerHA SystemMirror for AIX Planning Guide
IBM PowerHA SystemMirror for AIX Installation Guide

Prerequisite Information
========================
IBM PowerHA SystemMirror for AIX Concepts and Facilities Guide

Related Information
===================
IBM PowerHA SystemMirror for AIX Administration Guide

Related publications

The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.

IBM Redbooks
The following IBM Redbooks publications provide additional information about the topics in
this document. Note that some publications might be available in softcopy only.
- Best Practices for DB2 on AIX 6.1 for POWER Systems, SG24-7821
- DS8000 Performance Monitoring and Tuning, SG24-7146
- IBM AIX Version 7.1 Differences Guide, SG24-7910
- IBM System Storage DS8700 Architecture and Implementation, SG24-8786
- Implementing IBM Systems Director 6.1, SG24-7694
- Personal Communications Version 4.3 for Windows 95, 98 and NT, SG24-4689
- PowerHA for AIX Cookbook, SG24-7739

You can search for, view, download, or order these documents and other Redbooks,
Redpapers, Web Docs, drafts, and additional materials at the following website:
ibm.com/redbooks

Other publications
These publications are also relevant as further information sources:
- Cluster Management, SC23-6779
- PowerHA SystemMirror for IBM Systems Director, SC23-6763
- PowerHA SystemMirror Version 7.1 for AIX Standard Edition Administration Guide, SC23-6750
- PowerHA SystemMirror Version 7.1 for AIX Standard Edition Concepts and Facilities Guide, SC23-6751
- PowerHA SystemMirror Version 7.1 for AIX Standard Edition Installation Guide, SC23-6755
- PowerHA SystemMirror Version 7.1 for AIX Standard Edition Master Glossary, SC23-6757
- PowerHA SystemMirror Version 7.1 for AIX Standard Edition Planning Guide, SC23-6758-01
- PowerHA SystemMirror Version 7.1 for AIX Standard Edition Programming Client Applications, SC23-6759
- PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist Developer’s Guide, SC23-6753
- PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist for DB2 User’s Guide, SC23-6752

- PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist for Oracle User’s Guide, SC23-6760
- PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist for WebSphere User’s Guide, SC23-6762
- PowerHA SystemMirror Version 7.1 for AIX Standard Edition Troubleshooting Guide, SC23-6761

Online resources
These websites are also relevant as further information sources:
- IBM PowerHA SystemMirror for AIX
  http://www.ibm.com/systems/power/software/availability/aix/index.html
- PowerHA hardware support matrix
  http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105638
- IBM PowerHA High Availability wiki
  http://www.ibm.com/developerworks/wikis/display/WikiPtype/High%20Availability
- Implementation Services for Power Systems for PowerHA for AIX
  http://www.ibm.com/services/us/index.wss/offering/its/a1000032
- IBM training classes for PowerHA SystemMirror for AIX
  http://www.ibm.com/training

Help from IBM

IBM Support and downloads
ibm.com/support

IBM Global Services
ibm.com/services

Index
Application Availability and Configuration reports 358
Symbols application configuration 86
/etc/cluster/rhosts file 73 application controller
collection monitoring 202 configuration 91
populating 183 versus application server 67
rolling migration 186 application list 353
snapshot migration 168 application monitoring 368
/etc/filesystems file 318 Application Name field tip 143
/etc/hosts file, collection monitoring 202 application server
/etc/inittab file, cluster monitoring 206 clmgr command 120
/etc/services file 139 versus application controller 67
/etc/syslogd.conf file, cluster monitoring 206 application startup, testing with Startup Monitoring config-
/tmp/clmigcheck/clmigcheck.log 161 ured 298
/usr/es/sbin/cluster/utilities/ file 234 architecture
/var/adm/ras/syslog.caa log file 229 changes for RSCT 3.1 3
/var/hacmp/log/clutils.log file, clmgr debugging 131 IBM Systems Director 22
/var/log/clcomd/clcomd.log file 313 PowerHA SystemMirror 1
#5273/#5735 PCI-Express Dual Port Fibre Channel Autonomic Health Advisor File System (AHAFS) 11
Adapter 499 files used in RSCT 12

A B
-a option bootstrap repository 225
clmgr command 109 built-in serial ports 493–494
wildcards 110
ADAPTER_DOWN event 12
adapters C
Ethernet 499 -c flag 209
fibre channel 498 CAA (Cluster Aware AIX) 7, 225
for the repository disk 49 /etc/filesystems file 318
InfiniBand 500 AIX 6.1 and 7.1 3
PCI bus 500 central repository 9
SAS 499 changed PVID of repository disk 322
SCSI and iSCSI 500 chcluster command 480
adaptive failover 35, 102 clcomdES 24
Add Network tab 344 cluster after the node restarts 317
adding on new volume group 416 cluster commands 477
Additional Properties tab 257 cluster creation 154, 318
agent password 328 cluster environment management 11
AHAFS (Autonomic Health Advisor File System) 11 cluster services not active message 323
files used in RSCT 12 clusterconf command 481
AIX collecting debug information for IBM support 231
commands and log files 216 commands and log files 224
disk and dev_group association 443 communication 156
importing volume groups 383 daemons 8
installation of IBM Systems Director 327 disk fencing 37
volume group configuration 381 file sets 7, 179
AIX 6.1 67 initial cluster status 82
AIX 6.1 TL6 152 log files for troubleshooting 306
for migration 47 lscluster command 478
upgrading to 153 mkcluster command 478
AIX 7.1 support of PowerHA 6.1 193 previously used repository disk 316
AIX BOS components removal of volume group when rmcluster does not
installation 59 320
prerequisites for 44 repository disk 9
AIX_CONTROLLED interface state 18 repository disk replacement 317

rmcluster command 480 return of only one value 110
RSCT changes 8 service address 118
services, adding a shared disk 173 simple XML format 130
subsystem group active 208 syntax 113
subsystem guide 208 usage examples 106
subsystems 202 using help 111
support in RSCT v3.1 3 -v option 110
switch from Group Services 156 clmgr debugging, /var/hacmp/log/clutils.log file 131
troubleshooting 316 clmgr man page 501
volume group clmgr online cluster start_cluster command 129
already in use 320 clmgr query cluster command 108
previously used 320 clmgr query command 107, 122
zone 211 clmgr sync cluster command 124
Can’t find what you are looking for ? 66 clmgr utility 65, 241
Capture Snapshot window 345 cluster configuration 104
CCI instance 424 PowerHA log files 307
central cluster repository-based communication (DP- query action 242
COM) interface 15 view action 243
states 15 clmgr verify cluster command 124
central repository 9 clmgr view log command 307
chcluster command 480 clmigcheck command 153
description 480 menu options 164
examples 481 process overview 158
flags 481 profile 157
syntax 480 program 157, 167
checking the configuration 164 running 159
clcomd instances 157 running on one node 168
clcomd subsystem 157 clmigcheck script 308
clcomdES daemon 157 clmigcheck.txt file 160
clcomdES subsystem 157 clmigcleanup process 155
clconfd subsystem 8 clRGinfo command 284
clconvert_snapshot command 168 clshowres command 393
cldump utility 233, 312 clstat utility 231, 312
clevmgrdES subsystem 31 interactive mode 232
CLI -o flag 233
cluster creation 340 cltopinfo command 82, 118, 309
cluster creation with SystemMirror plug-in 339 cluster
command help 341 adding 385
examples of command usage for resource group man- adding networks 387
agement 360 adding sites 386
clinfo command 127 checking in rolling migration 191
clmgr add cluster command 114, 118 configuration 385
clmgr add resource_group command 111 configuration synchronization 454
clmgr add resource_group -h command 111 creation
clmgr command 131, 501 CAA 318
-a option 109 with CLI 340
actions 104 event flow when a node joins 39
alternative output formats 130 IP address 67
application server 120 menu 253
cluster definition synchronization 124 multicast IP address, configuration 73
cluster start 127 name 114
colon-delimited format 130 removal of 103
configuring a PowerHA cluster 112 restarting 183
displaying log file content 132 starting 403
enhanced search capability 109 status 205
error messages 106 topology, custom configuration 78
log file 130 Cluster Applications and Resources 27
new cluster configuration 113 Cluster Aware AIX (CAA) 7, 225
object classes 105 /etc/filesystems file 318
resource group 120 AIX 6.1 and 7.1 3

central repository 9 undoing local changes 370
chcluster command 480 verification and synchronization 360
clcomdES 24 CLI 363
cluster after node restarts 317 GUI 360
cluster commands 477 Cluster Configuration Report 366
cluster creation 154, 318 cluster creation 333–334
cluster environment management 11 common storage 337
clusterconf command 481 host names in FQDN format 75
collecting debug information for IBM support 231 SystemMirror plug-in CLI 339
commands and log files 224 SystemMirror plug-in wizard 334
communication 156 cluster event management 11
daemons 8 cluster implementation
disk fencing 37 hardware requirements 44
file sets 7, 179 migration planning 46
initial cluster status 82 network 50
log files for troubleshooting 306 planning for high availability 43
lscluster command 478 PowerHA 7.1 considerations 46
mkcluster command 478 prerequisites for AIX BOS components 44
repository disk 9 prerequisites for RSCT components 44
rmcluster command 480 software requirements 44
RSCT changes 8 storage 48
services, adding a shared disk 173 supported hardware 45
subsystem group active 208 cluster interfaces listing 234
subsystem guide 208 cluster management 333
subsystems 202 functionality 343
support in RSCT v3.1 3 modification functionality 349
troubleshooting 316 storage management 345
volume group, previously used 320 SystemMirror plug-in 341
zone 211 CLI 347
cluster communication 13 SystemMirror plug-in GUI wizard 341–342
heartbeat configuration 20 Cluster Management window 343
interfaces 13 Cluster Management Wizard 342
AIX_CONTROLLED 18 cluster modification locks 369
central cluster repository-based communication cluster monitoring 201
(DPCOM) 15 /etc/inittab file 206
IP network interfaces 13 /etc/syslogd.conf file 206
RESTRICTED 18 /usr/es/sbin/cluster/utilities/ file tools 234
SAN-based communication (SFWCOM) 14 /var/adm/ras/syslog.caa log file 229
node status 18 active cluster 368
round-trip time 20 AIX commands and log files 216
cluster configuration 65 application monitoring 368
clmgr utility 65, 104 CAA commands and log files 224
defining 70 CAA debug information for IBM support 231
event failure recovery 370 CAA subsystem group active 208
node names 70 CAA subsystem guide 208
problem determination data collection 370 CAA subsystems 202
recovery from issues 369 cldump utility 233
resource groups 95 clmgr utility 241
SMIT menu 65–66 clstat utility 231
custom configuration 68, 78 Cluster Configuration Report 366
repository disk and cluster multicast IP address cluster modification locks 369
73 cluster status 205, 208, 218
resource group dependencies 96 Cluster Topology Configuration Report 367
resources and applications configuration 86 common agent subsystems 205
resources configuration 68 disk configuration 203, 207, 216
typical configuration 67, 69 Group Services 218
starting all nodes 129 information collection
SystemMirror for IBM Systems Director 133 after cluster configuration 206
SystemMirror plug-in 65 after cluster is running 216
test environment 68 before configuration 202

lscluster command for cluster information 209 common storage 337
map view 365 communication interfaces, adding 387
multicast information 205, 207, 217 communication node status 18
network configuration and routing table 219 communication path 314
network interfaces configuration 203 components, Reliable Scalable Cluster Technology
ODM classes 236 (RSCT) 2
of activities before starting a cluster 364 configuration
PowerHA groups 203 AIX disk and dev_group association 443
recovery from configuration issues 369 cluster 385
repository disk 206 adding 385
repository disk, CAA, solidDB 225 adding a node 386
routing table 204 adding communication interfaces 387
solidDB log files 230 cluster resources and resource group 388
subsystem services status 366 Hitachi TrueCopy/HUR resources 429
SystemMirror plug-in 364 PowerHA cluster 65
tcipdump, iptrace, mping utilities 220 recovery from issues 369
tools 231 troubleshooting 312
topology view 364 verification and synchronization 360
cluster node CLI 363
installation of SystemMirror agent 330 GUI 360
status and mapping 287 verification of Hitachi TrueCopy/HUR 453
Cluster Nodes and Networks 27 volume groups and file systems on primary site 381
cluster resources, configuration 388 Configure Persistent Node IP Label/Address menu 31
Cluster services are not active message 323 CPU starvation 292
Cluster Snapshot menu 31 Create Dependency function 357
Cluster Standard Configuration menu 29 creation
cluster status 208 custom resource group 351
cluster still stuck in migration condition 308 predefined resource group 353
cluster testing 259, 297 resource group 349
CPU starvation 292 verifying 355
crash in node with active resource group 289 C-SPOC
dynamic node priority 302 adding Global Mirror pair to existing volume group
Group Services failure 296 405
loss of the rootvg volume group 286, 289 creating a volume group 412
network failure 283 disaster recovery 373
network failure simulation 282 on other LVM operations 422
repository disk heartbeat channel 269 storage resources and resource groups 86
rootvg system event 286 cthags, grpsvcs 6
rootvg volume group offline 288 Custom Cluster Configuration menu 30
SAN-based heartbeat channel 260 custom configuration 68, 78
cluster topology 385 verifying and synchronizing 81
Cluster Topology Configuration Report 367
cluster topology information 235
CLUSTER_OVERRIDE environment variable 36 D
deleting the variable 37 -d flag 214
error message 37 daemons
clusterconf command 481 CAA 8
description 481 clcomd 8
examples 482 clconfd 8
flags 482 cld 8
syntax 481 failure in Group Services 12
clutils file 306 data collection, problem determination 370
Collect log files button 347 DB2
collection monitoring installation on nodes for Smart Assist 136
/etc/cluster/rhosts file 202 instance and database on shared 137
/etc/hosts file 202 debug information, collecting for IBM support 231
colon-delimited format of clmgr command 130 default value 122
command dev_group association and AIX disk configuration 443
help 341 disaster recovery
profile 157 C-SPOC operations 373
DS8700 requirements 372

Global Mirror 371 configuration verification 453
adding a cluster 385 considerations 421
adding a logical volume 407 creating volume groups and file systems on repli-
adding a node 386 cated disks 447
adding a pair to a new volume group 411 defining managed replicated resource to PowerHA
adding a pair to an existing volume group 404 451
adding communication interfaces 387 failover testing 454
adding networks 387 graceful site failover 455, 460
adding new logical volume on new volume group HORCM instances 426
416 horcm.conf files 425
adding sites 386 increasing size of existing file system 468
AIX volume group configuration 381 LVM administration of replicated pairs 463
cluster configuration 385 management 422
cluster resources and resource group configura- minimum connectivity requirements 420
tion 388 planning 420
considerations 373 replicated pair creation 432
creating new volume group 412 resource configuration 429
DS8700 requirements 372 rolling site failure 457, 461
failover testing 393 site re-integration 459, 462
FlashCopy relationship creation 379 software prerequisites 420
Global Copy relationship creation 378 Discovery Manager 249
graceful site failover 395 disk configuration 203, 207
importing new volume group to remote site 416 AIX 216
importing volume groups 383 disk heartbeat network, removing 310
installing DSCLI client software 373 DNP (dynamic node priority)
LVM administration of replicated resources 404 configuration 102
mirror group 391 script for the nodes 102
planning 372 testing 302
PPRC path creation 377 DPCOM 15
relationship configuration 377 dpcom node connection 83
resource configuration 374–375 dpcomm interface 213
resource definition 389 DPF database support 139
resources and resource group definition 391 DS storage units 495
rolling site failure 398 DS8000 Global Mirror Replicated Resources field 393
service IP definition 388 DS8700 disaster recovery requirements 372
session identifier 379 DSCLI client software 373
sessions for involved LSSs 380 duplicated events, filtering 12
site re-integration 400 dynamic node priority (DNP)
size increase of existing file system 410 adaptive failover 35
software prerequisites 372 configuration 102
source and target volumes 380 script for the nodes 102
storage agent 389 testing 302
storage system 390
symmetrical configuration 376
synchronization and verification of cluster configu- E
ration 416 ECM volume group 313
testing fallover after adding new volume group Edit Advanced Properties button 344
417 error messages
volume group and file system configuration 381 clmgr command 106
Hitachi TrueCopy/HUR 419 CLUSTER_OVERRIDE variable 37
adding logical volume 466 Ethernet 499
adding LUN pair to new volume group 469 event failure recovery 370
adding LUN pairs event flow 38
to existing volume group 463 node down processing normal with takeover 41
adding replicated resources 451 startup processing 38
to a resource group 452 when another node joins the cluster 39
AIX disk and dev_group association 443 export DISPLAY 329
asynchronous pairing 439
CCI software installation 422 F
cluster configuration synchronization 454 failback of PPRC pairs

primary site 402 disaster recovery 371
secondary site 400 DS8700 requirements 372
failbackpprc command 402 failover testing 393
failover of PPRC pairs back to primary site 401 graceful site failover 395
failover testing 393 rolling site failure 398
graceful site failover 395 site re-integration 400
Hitachi TrueCopy/HUR 454 importing new volume group to remote site 416
rolling site failure 398 importing volume groups 383
site re-integration 400 LVM administration of replicated resources 404
failoverpprc command 383 mirror group 391
fallover testing planning for disaster recovery 372
after adding new volume group 476 software prerequisites 372
after making LVM changes 469 PPRC and SPPRC file sets 372
fast path, smitty cm_apps_resources 95 relationship configuration 377
fcsX FlashCopy relationship creation 379
device busy 57 Global Copy relationship creation 378
X value 57 PPRC path creation 377
Fibre Channel adapters 495, 498 session identifier 379
DS storage units 495 sessions for involved LSSs 380
IBM XIV 496 source and target volumes 380
SAN-based communication 57 resource configuration 374
SVC 497 prerequisites 375
file collection and logs management 346 source and target volumes 375
file collection creation 346 resource definition 389
file sets 7, 61 resources and resource group definition 391
installation 58, 64 service IP definition 388
PowerHA 62 session identifier 379
PPRC and SPPRC 372 sessions for all involved LSSs 380
Smart Assist 91 size increase of existing file system 410
Smart Assist for DB2 136 source and target volumes 380
file systems 121 storage agent 389
configuration 381 storage system 390
creation with volume groups 447 symmetrical configuration 376
importing for Smart Assist for DB2 137 synchronization and verification of cluster configura-
increasing size 468 tion 416
size increase 410 testing fallover after adding new volume group 417
FILTER argument 132 gossip protocol 13
FlashCopy relationship creation 379 graceful site failover 395, 455, 460
FQDN format on host names 75 moving resource group to another site 395
Group Services 2
daemon failure 12
G failure 296
GENXD Replicated Resources field 393 information 218
Global Copy relationships 378 subsystem name
Global Mirror cthags 6
adding a cluster 385 grpsvcs 6
adding a logical volume 407 switch to CAA 156
adding a node 386 grpsvcs
adding a pair to a new volume group 411 cthags 6
adding a pair to an existing volume group 404 SRC subsystem 156
adding communication interfaces 387
adding networks 387
adding new logical volume on new volume group 416 H
adding sites 386 HACMPtopsvcs class 238
AIX volume group configuration 381 halt -q command 289
cluster configuration 385 hardware configuration
cluster resources and resource group configuration Fibre Channel adapters for SAN-based communica-
388 tion 57
considerations for disaster recovery 373 SAN zoning 54
creating new volume group 412 shared storage 55
C-SPOC operations 373 test environment 54

hardware requirements 44 IBM support, collecting CAA debug information 231
multicast IP address 45 IBM Systems Director 21
repository disk 45 advantages 21
SAN 45 agent file 58
supported hardware 45 agent password 328
heartbeat architecture 22
considerations for configuration 20 availability menu 251
testing 260 CLI (smcli interface) 257
heartbeat channel, repository disk 269 cluster configuration 133
help in clmgr command 111 cluster creation 333–334
high availability, planning a cluster implementation 43 cluster management 333
Hitachi CCI software 422 configuration 328
installation in a non-root directory 423 installation 325–326
installation in root directory 423 AIX 327
installing a newer version 424 hardware requirements 326
Hitachi TrueCopy/Hitachi Universal Replicator (Hitachi login page 247
TrueCopy/HUR) 419 root user 247
Hitachi TrueCopy/HUR smadmin group 247
adding LUN pairs to existing volume group 463 smcli utility 22
adding LUN pairs to new volume group 469 status of common agent subsystems 205
adding new logical volume 466 SystemMirror plug-in 21, 65, 329
AIX disk and dev_group association 443 systems and agents to discover 250
assigning LUNs to hosts 429 web interface 246
asynchronous pairing 439 welcome page 248
CCI instance 424 IBM XIV 496
CCI software installation 422 ifconfig en0 down command 283
cluster configuration synchronization 454 IGMP (Internet Group Management Protocol) 14
considerations 421 InfiniBand 500
creating volume groups and file systems on replicated information collection after cluster is running 216
disks 447 installation
failover testing 454 AIX BOS components 59
graceful site failover 455, 460 common agent 331
HORCM instances 426 DSCLI client software 373
horcm.conf files 425 hardware configuration 54
increasing size of existing file system 468 IBM Systems Director 325–326
management 422 hardware requirements 326
minimum connectivity requirements 420 on AIX 327
replicated pair creation 432 PowerHA file sets 58, 62
rolling site failure 457, 461 PowerHA software example 59
site re-integration 459, 462 PowerHA SystemMirror 7.1 for AIX Standard Edition
software prerequisites 420 53
HORCM 444 SystemMirror agent 332
instance 426 SystemMirror plug-in 325, 329
horcm.conf files 425 agent installation 330
host groups, assigning LUNs 429 server installation 329
host names troubleshooting 312
FQDN format 75 volume group
network planning 51 consideration 64
hostname command 168 conversion 64
installp command 133
interfaces
I excluding configured 213
-i flag 211 states 14
IBM storage 495 up, point of contact down 20
Fibre Channel adapters 495 Internet Group Management Protocol (IGMP) 14
DS storage units 495 Inter-site Management Policy 452, 460, 462
IBM XIV 496 invalid events, filtering 12
SVC 497 IP address 67, 94
NAS 497 snapshot migration 166
SAS 498 IP network interfaces 13, 118
SCSI 498

states 14 M
IPAT via aliasing subnetting requirements 51 -m flag 209
IPAT via replacement configuration 162 management interfaces 13
iptrace utility 220, 222 map view 365
iSCSI adapters 500 migration
AIX 6.1 TL6 152
L CAA cluster creation 154
LDEV hex values 433 clcomdES and clcomd subsystems 157
log files 224 considerations 152
AIX 216 planning 46
clmgr command 130 AIX 6.1 TL6 47
displaying content using clmgr command 132 PowerHA 7.1 151
PowerHA 306 premigration checking 153, 157
troubleshooting 306 process 153
logical subsystem (LSS), Global Mirror session definition protocol 155
380 snapshot 161
logical volume 416, 466 SRC subsystem changes 157
adding 407 stages 153
Logical Volume Manager (LVM) switch from Group Services to CAA 156
administration of Global Mirror replicated resources troubleshooting 308
404 clmigcheck script 308
commands over repository disk 207 cluster still stuck in migration condition 308
lppchk command 64 non-IP networks 308
lsavailpprcport command 377 upgrade to AIX 6.1 TL6 153
lscluster command 18, 82, 478 upgrade to PowerHA 7.1 154
-c flag 82, 209 mirror group 389, 391
cluster information 209 mkcluster command 478
-d flag 214 description 479
description 478 examples 479
examples 478 flags 479
-i flag 14–15, 20, 83, 211, 275 syntax 478
output 16 mksnapshot command 347
-m flag 18, 20, 82, 209, 273 mkss alias 347
-s flag 215 modification functionality 349
syntax 478 monitoring 201
zone 211 /etc/cluster/rhosts file 202
lscluster -m command 18 /etc/hosts file 202
lslpp command 59 /etc/inittab file 206
lsmap -all command 287 /etc/syslogd.conf file 206
lspv command 127 /usr/es/sbin/cluster/utilities/ file tools 234
lssi command 377 /var/adm/ras/syslog.caa log file 229
lssrc -ls clstrmgrES command 163 AIX commands and log files 216
lssrc -ls cthags command 218 CAA commands and log files 224
lssrc -ls grpsvcs command 218 CAA debug information for IBM support 231
lsvg command 10 CAA subsystem group active 208
LUN pairs CAA subsystem guide 208
adding to existing volume group 463 CAA subsystems 202
adding to new volume group 469 cldump utility 233
LUNs clmgr utility 241
assigning to hosts 429 clstat utility 231
LDEV hex values 433 cluster status 205, 208, 218
LVM (Logical Volume Manager) common agent subsystems 205
commands over repository disk 207 disk configuration 203, 207, 216
C-SPOC 422 Group Services 218
Global Mirror replicated resources 404 IBM Systems Director web interface 246
lwiplugin.bat script 330 information collection
lwiplugin.sh script 330 after cluster configuration 206
after cluster is running 216
before configuration 202
lscluster command for cluster information 209

multicast information 205, 207, 217 O
network configuration and routing table 219 object classes
network interfaces configuration 203 aliases 105
ODM classes 236 clmgr 105
PowerHA groups 203 supported 106
repository disk 206 Object Data Manager (ODM) classes 236
repository disk, CAA, solidDB 225 ODM (Object Data Manager) classes 236
routing table 204 odmget command 237
solidDB log files 230 OFFLINE DUE TO TARGET OFFLINE 33
tcpdump, iptrace, mping utilities 220 offline migration 191
tools 231 manually specifying an address 197
Move Resource Group 395, 455 planned target configuration 193
mping utility 220, 224 planning 191
mpio_get_config command 56 PowerHA 6.1 support on AIX 7.1 193
multicast address 51 procedure 195
multicast information 205, 207 process flow 194
netstat command 217 starting configuration 192
multicast IP address
configuration 73
hardware requirements 45 P
not specified 74 pausepprc command 383
multicast traffic monitoring utilities 220 PCI bus adapters 500
multipath driver 50 physical volume ID (PVID) 88
pick list 90
planning
N cluster implementation for high availability 43
NAS (network-attached storage) 497 hardware requirements 44
netstat command 217 migration 46
network configuration 219 network 50
network failure simulation 282–283 PowerHA 7.1 considerations 46
testing environment 282 software requirements 44
network planning 50 storage 48
host name and node name 51 point of contact 18
multicast address 51 down, interface up 20
network interfaces 51 point-of-contact status 82
single adapter networks 51 POWER Blade servers 494
subnetting requirements for IPAT via aliasing 51 Power Systems 492
virtual Ethernet 51 POWER5 systems 492
Network tab 256 POWER6 systems 493
network-attached storage (NAS) 497 POWER7 Systems 494
networks PowerHA 1
addition of 387 available clusters 253
interfaces 51, 118 cluster configuration 65
configuration 203 clmgr command 112
Never Fall Back (NFB) 121 clmgr utility 104
node custom configuration 68, 78
crash with an active resource group 289 resource group dependencies 96
down processing normal with takeover 41 resources and applications configuration 86
event flow when joining a cluster 39 resources configuration 68
failure 41 SMIT 66
status 18 SystemMirror for IBM Systems Director 133
node names typical configuration 67, 69
cluster configuration 70 cluster topology with smitty sysmirror 14
network planning 51 defining Hitachi TrueCopy/HUR managed replicated
NODE_DOWN event 12 resource 451
nodes groups, cluster monitoring 203
adding 386 installation
AIX 6.1 TL6 152 AIX BOS components 59, 62
starting all in a cluster 129 file sets 62
non-DPF database support 139 RSCT components 59
non-IP networks 308

SMIT tree 483 replicated disks, volume group and file system creation
supported hardware 491 447
SystemMirror replicated pairs 432
architecture foundation 1 LVM administration 463
management interfaces 13 replicated resources
SystemMirror 7.1 1 adding 451
SystemMirror 7.1 features 23 adding to a resource group 452
PowerHA 6.1 support on AIX 7.1 193 defining to PowerHA 451
PowerHA 7.1 36 LVM administration of 404
considerations 46 reports
file set installation 58 Application Availability and Configuration 358
migration to 151–153 Cluster Configuration Report 366
software installation 59 Cluster Topology Configuration 367
SystemMirror plug-in 21 repository disk 9
volume group consideration 64 changed PVID 322
PPRC cluster 225
failing back pairs to primary site 402 cluster monitoring 206
failing back pairs to secondary site 400 configuration 73
failing over pairs back to primary site 401 hardware requirements 45
file sets 372 heartbeat channel testing 269
path creation 377 LVM command support 207
Prefer Primary Site policy 452, 462 node connection 83
premigration checking 153, 157 previously used for CAA 316
previous version 1 replacement 317
primary node 120–121 snapshot migration 166
problem determination data collection 370 resource configuration, Global Mirror 374
PVID (physical volume ID) 88, 115 prerequisites 375
of repository disk 322 source and target volumes 375
symmetrical configuration 376
resource group
Q adding 392
query action 242 adding from C-SPOC 86
adding Hitachi TrueCopy/HUR replicated resources
R 452
raidscan command 463 adding resources 392
Redbooks Web site, Contact us xiv application list 353
redundant heartbeat testing 260 circular dependencies 33
refresh -s clcomd command 69 clmgr command 120
refresh -s syslogd command 306 configuration 95, 388
relationship configuration 377 crash in node 289
FlashCopy relationship creation 379 creation
Global Copy relationship creation 378 verifying 355
PPRC path creation 377 with SystemMirror plug-in GUI wizard 349
session identifier 379 custom creation 351
sessions for involved LSSs 380 definition 391
source and target volumes 380 dependencies, Start After 32
Reliable Scalable Cluster Technology (RSCT) 2 management 355
AHAFS files 12 CLI 359
architecture changes for v3.1 3 CLI command usage 360
CAA support 3 functionality 357
cluster security services 2 wizard access 355
components 2 moving to another site 395
Group Services (grpsvcs) 2 mutual-takeover dual-node implementation 68
installation 59 OFFLINE DUE TO TARGET OFFLINE 33
PowerHA 5 parent/child dependency 33
prerequisites for 44 predefined creation 353
Remote Monitoring and Control 2 removal 358
resource managers 2 status change 359
Topology Services 2 resource group dependencies
remote site, importing new volume group 416 Start After and Stop After configuration 96



Stop After 32 S
Resource Group tab 255 -s flag 215
Resource Groups menu 255 SAN
resource management 32 hardware requirements 45
adaptive failover 35 zoning 54
dynamic node priority 35 SAN fiber communication
Start After and Stop After resource group dependen- enabling 15
cies 32 unavailable 15
user-defined resource type 34 SAN Volume Controller (SVC) 497
resource managers 2 SAN-based communication
Resource Monitoring and Control (RMC) subsystem 2 channel 54
resource type node connection 83
management 100 Fibre Channel adapters 57
user-defined 100 SAN-based communication (SFWCOM) interface 14
resources state 15
adding to a resource group 392 SAN-based heartbeat channel testing 260, 263
configuration 68, 86 SAS (serial-attached SCSI) 498
RESTRICTED interface state 18 adapters 499
RMC (Resource Monitoring and Control) subsystem 2 SCSI 498
rmcluster command 316, 480 adapters 500
description 480 security keys 313
example 480 serial-attached SCSI (SAS) 498
flags 480 adapters 499
removal of volume group 320 service address 118
syntax 480 defined 94
rolling migration 177 service IP 388
/etc/cluster/rhosts file 183, 186 SFWCOM 14
checking newly migrated cluster 191 sfwcom
migrating the final node 188 interface 213
migrating the first node 179 node connection 83
migrating the second node 185 shared disk, adding to CAA services 173
planning 178 shared storage 55
procedure 178 for repository disk 48
restarting the cluster 183 shared volume group
troubleshooting 191 importing for Smart Assist for DB2 137
rolling site failure 398, 457, 461 Smart Assist for DB2 instance and database creation
root user 247 137
rootvg system event 31 simple XML format of the clmgr command 130
testing 286 single adapter networks 51
rootvg volume group SIRCOL 229
cluster node status and mapping 287 site re-integration 400, 459, 462
PowerHA logs 289 failback of PPRC pairs to primary site 402
testing offline 288 failback of PPRC pairs to secondary site 400
testing the loss of 286 failover of PPRC pairs back to primary site 401
round trip time (rtt) 20, 213 starting the cluster 403
routing table 204, 219 site relationship 460
RSCT (Reliable Scalable Cluster Technology) 2, 59 sites, addition of 386
AHAFS files 12 smadmin group 247
architecture changes for v3.1 3 Smart Assist 91
CAA support 3 new location 29
changes 8 Smart Assist for DB2 135
cluster security services 2 configuration 147
components 2 DB2 installation on both nodes 136
Group Services 2 file set installation 136
PowerHA 5 implementation with SystemMirror cluster 139
prerequisites for 44 instance and database on shared volume group 137
Remote Monitoring and Control subsystem 2 log file 149
resource managers 2 prerequisites 136
Topology Services 2 shared volume group and file systems 137
rtt (round-trip time) 20, 213 starting 141

steps before starting 139 socksimple command 263
SystemMirror configuration 139 software prerequisites 372
updating /etc/services file 139 software requirements 44
smcli command 257 solid subsystem 8
smcli lslog command 348 solidDB 225
smcli mkcluster command 341 log file names 231
smcli mkfilecollection command 348 log files 230
smcli mksnapshot command 347 SQL interface 228
smcli synccluster -h -v command 363 status 226
smcli undochanges command 363 source and target volumes
smcli utility 22 disaster recovery 375
smit bffcreate command 62 including in Global Mirror session 380
smit clstop command 163 SPPRC file sets 372
SMIT menu 65 SRC subsystem changes during migration 157
changes 66 Start After resource group dependency 32, 297
configuration 66 configuration 96
custom configuration 68, 78 standard configuration testing 298
locating available options 66 testing 297
resource group dependencies configuration 96 Startup Monitoring, testing application startup 298
resources and applications configuration 86 startup processing 38
resources configuration 68 Stop After resource group dependency 32
typical configuration 67, 69 configuration 96
SMIT panel 25 storage
Cluster Snapshot menu 31 agent 389
Cluster Standard Configuration menu 29 Fibre Channel adapters 495
Configure Persistent Node IP Label/Address menu management 345
31 NAS 497
Custom Cluster Configuration menu 30 resources, adding from C-SPOC 86
SMIT tree 25 SAS 498
smitty clstart 28 SCSI 498
smitty clstop 28 system 389–390
smitty hacmp 26 Storage Framework Communication (sfwcom) 213
smit sysmirror fast path 66 storage planning 48
SMIT tree 25, 483 adapters for the repository disk 49
smitty clstart command 28 multipath driver 50
smitty clstop command 28 shared storage for repository disk 48
smitty cm_apps_resources fast path 95 System Storage Interoperation Center 50
smitty hacmp command 26 Storage tab 256
smitty sysmirror command 26 subnetting requirements for IPAT via aliasing 51
PowerHA cluster topology 14 subsystem services status 366
snapshot supported hardware 45, 491
conversion 168 supported storage, third-party multipathing software
failure to restore 169 49–50
restoration 169 SVC (SAN Volume Controller) 497
snapshot migration 161, 164 symmetrical configuration 376
/etc/cluster/rhosts file 168 synccluster command 363
adding shared disk to CAA services 173 synchronization of cluster configuration 360, 416
AIX 6.1.6 and clmigcheck installation 163 CLI 363
checklist 176 GUI 360
clmigcheck program 167 syslog facility 306
cluster verification 175 system event, rootvg 31
conversion 168 System Mirror 7.1
procedure 163 resource management 32
process overview 162 rootvg system event 31
repository disk and multicast IP addresses 166 System Storage Interoperation Center 50
restoration 169 SystemMirror
snapshot creation 163 agent installation 332
stopping the cluster 163 cluster and Smart Assist for DB2 implementation 139
uninstalling SystemMirror 5.5 168 configuration for Smart Assist for DB2 139
SNMP, clstat and cldump utilities 312 SystemMirror 5.5, uninstalling 168



SystemMirror 7.1 loss of the rootvg volume group 286, 289
CAA disk fencing 37 network failure 283
CLUSTER_OVERRIDE environment variable 36 network failure simulation 282
deprecated features 24 repository disk heartbeat channel 269
event flow differences 38 environment 270
features 23 rootvg system event 286
installation of the Standard Edition 53 rootvg volume group offline 288
new features 24 SAN-based heartbeat channel 260, 263
planning a cluster implementation for high availability Start After resource group dependency 297
43 third-party multipathing software 49–50
SMIT panel 25 timeout value 36
supported hardware 45 top-level menu 67
SystemMirror plug-in 21 last two items 67
agent installation 330 Topology Services (topsvcs) 2
CLI for cluster creation 339 topology view 364
cluster creation and management 333 touch /tmp/syslog.out command 306
cluster management 341 troubleshooting 305
CLI 347 CAA 316
Cluster Management Wizard 342 changed PVID of repository disk 322
functionality 343 cluster after node restarts 317
GUI wizard 341 cluster creation 318
cluster monitoring 364 cluster services not active message 323
activities before starting a cluster 364 previously used repository disk 316
cluster subsystem services status 366 previously used volume group 320
Cluster tab 255 removal of volume group 320
common storage 337 repository disk replacement 317
creation volume group already in use 320
custom resource group 351 installation and configuration 312
predefined resource group 353 /var/log/clcomd/clcomd.log file and security keys
GUI wizard, resource group management 355 313
initial panel 252 clstat and cldump utilities and SNMP 312
installation 325, 329 communication path 314
verification 329 ECM volume group 313
monitoring an active cluster 368 log files 306
resource group CAA 306
creation with GUI wizard 349 clutils file 306
management, CLI 359 PowerHA 306
Resource Groups tab 254 syslog facility 306
server installation 329 migration 308
verifying creation of a resource group 355 clmigcheck script 308
wizard for cluster creation 334 cluster still stuck in migration condition 308
non-IP networks 308
rolling migration 191
T verbose logging level 307
TAIL argument 132 TrueCopy synchronous pairings 433
takeover, node down processing normal 41 TrueCopy/HUR
tcpdump utility 220–221 adding replicated resources 451
test environment 68 adding replicated resources to a resource group 452
testing configuration verification 453
application startup with Startup Monitoring configured defining managed replicated resource to PowerHA
298 451
cluster 259 disaster recovery 419
CPU starvation 292 LVM administration of replicated pairs 463
crash in node with active resource group 289 planning for management 420
dynamic node priority 302 resource configuration 429
failover 393 Two-Node Cluster Configuration Assistant 29
fallover after adding new volume group 417, 476 typical configuration 67, 69
fallover after making LVM changes 469 clcomdES versus clcomd subsystem 70
fallover on a cluster after making LVM changes 411 node names 70
Group Services failure 296 prerequisite 69
Hitachi TrueCopy/HUR 454

U
uname -L command 287
undo changes 363
undochanges command 363
undoing local changes of a configuration 370
unestablished pairs 447
Universal Replicator asynchronous pairing 439
user-defined resource type 34, 100
UUID 225

V
-v option, clmgr command 110
verbose logging level 307
verification
cluster configuration 360, 416
configuration
CLI 363
GUI 360
Hitachi TrueCopy/HUR configuration 453
verification of cluster configuration 360
VGDA, removal from disk 320
view action 243
virtual Ethernet, network planning 51
volume disk group, previous 180
volume groups 120
adding a Global Mirror pair 404, 411
adding LUN pairs 463, 469
adding new logical volume 416
already in use 320
configuration 381
consideration for installation 64
conversion during installation 64
creating 412
creation with file systems on replicated disks 447
importing in the remote site 383
importing to remote site 416
previously used 320
removal when rmcluster command does not 320
testing fallover after adding 417
Volume Groups option 86
volume, dynamically expanding 404

W
web interface 246
wildcards 110

Z
zone 211



Back cover

IBM PowerHA SystemMirror 7.1 for AIX

IBM PowerHA SystemMirror 7.1 for AIX is a major product announcement for IBM in the high
availability space for IBM Power Systems Servers. This release now has a deeper integration
between the IBM high availability solution and IBM AIX. It features integration with the
IBM Systems Director, SAP Smart Assist and cache support, the IBM System Storage DS8000
Global Mirror support, and support for Hitachi Storage.

This IBM Redbooks publication contains information about the IBM PowerHA SystemMirror 7.1
release for AIX. This release includes fundamental changes, in particular departures from how
the product has been managed in the past, which has necessitated this Redbooks publication.

This Redbooks publication highlights the latest features of PowerHA SystemMirror 7.1 and
explains how to plan for, install, and configure PowerHA with the Cluster Aware AIX component.
It also introduces you to PowerHA SystemMirror Smart Assist for DB2. This book guides you
through migration scenarios and demonstrates how to monitor, test, and troubleshoot
PowerHA 7.1. In addition, it shows how to use IBM Systems Director for PowerHA 7.1 and how to
install the IBM Systems Director Server and PowerHA SystemMirror plug-in. Plus, it explains
how to perform disaster recovery using IBM DS8700 Global Mirror and Hitachi TrueCopy and
Universal Replicator.

This publication targets all technical professionals (consultants, IT architects, support
staff, and IT specialists) who are responsible for delivering and implementing high
availability solutions for their enterprise.

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts
from IBM, Customers and Partners from around the world create timely technical information
based on realistic scenarios. Specific recommendations are provided to help you implement IT
solutions more effectively in your environment.

For more information:
ibm.com/redbooks

SG24-7845-00        ISBN 0738435120
