Managing Serviceguard
Eighteenth Edition
Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and
12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are
licensed to the U.S. Government under vendor’s standard commercial license.
The information contained herein is subject to change without notice. The only warranties for HP products and services are set
forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as
constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
MS-DOS®, Microsoft®, Windows®, Windows NT®, and Windows® XP are U.S. registered trademarks of Microsoft Corporation.
UNIX® is a registered trademark in the United States and other countries, licensed exclusively through The Open Group.
Table of Contents
Printing History ...........................................................................................................................25
Preface.......................................................................................................................................27
1 Serviceguard at a Glance.........................................................................................................29
What is Serviceguard? .......................................................................................................29
Failover..........................................................................................................................31
About Veritas CFS and CVM from Symantec...............................................................32
Using Serviceguard Manager.............................................................................................32
Monitoring Clusters with Serviceguard Manager........................................................33
Administering Clusters with Serviceguard Manager...................................................33
Configuring Clusters with Serviceguard Manager.......................................................33
Starting Serviceguard Manager.....................................................................................33
Using SAM..........................................................................................................................33
What are the Distributed Systems Administration Utilities?.............................................34
A Roadmap for Configuring Clusters and Packages ........................................................34
Redundant Power Supplies ...............................................................................................49
Larger Clusters ...................................................................................................................50
Active/Standby Model ..................................................................................................50
Point to Point Connections to Storage Devices ............................................................51
Deciding When and Where to Run and Halt Failover Packages .......................69
Failover Packages’ Switching Behavior..............................................................70
Failover Policy....................................................................................................72
Automatic Rotating Standby..............................................................................73
Failback Policy....................................................................................................75
Using Older Package Configuration Files.....................................................................78
Using the Event Monitoring Service ............................................................................79
Using the EMS HA Monitors........................................................................................79
How Packages Run.............................................................................................................80
What Makes a Package Run?.........................................................................................80
Before the Control Script Starts.....................................................................................82
During Run Script Execution........................................................................................82
Normal and Abnormal Exits from the Run Script........................................................84
Service Startup with cmrunserv..................................................................................84
While Services are Running..........................................................................................85
When a Service, Subnet, or Monitored Resource Fails, or a Dependency is Not Met..............................................85
When a Package is Halted with a Command................................................................86
During Halt Script Execution........................................................................................86
Normal and Abnormal Exits from the Halt Script........................................................87
Package Control Script Error and Exit Conditions..................................................88
How the Network Manager Works ...................................................................................89
Stationary and Relocatable IP Addresses .....................................................................90
Types of IP Addresses..............................................................................................91
Adding and Deleting Relocatable IP Addresses ..........................................................91
Load Sharing ...........................................................................................................92
Monitoring LAN Interfaces and Detecting Failure: Link Level....................................92
Local Switching .......................................................................................................93
Switching Back to Primary LAN Interfaces after Local Switching..........................96
Remote Switching ...................................................................................................97
Address Resolution Messages after Switching on the Same Subnet.......................98
Monitoring LAN Interfaces and Detecting Failure: IP Level........................................98
Reasons To Use IP Monitoring.................................................................................98
How the IP Monitor Works......................................................................................99
Failure and Recovery Detection Times..................................................................101
Constraints and Limitations...................................................................................101
Reporting Link-Level and IP-Level Failures...............................................................101
Example 1: If Local Switching is Configured.........................................................102
Example 2: If There Is No Local Switching............................................................102
Automatic Port Aggregation.......................................................................................103
VLAN Configurations.................................................................................................104
What is VLAN?......................................................................................................105
Support for HP-UX VLAN.....................................................................................105
Configuration Restrictions.....................................................................................105
Additional Heartbeat Requirements......................................................................106
Volume Managers for Data Storage..................................................................................106
Types of Redundant Storage.......................................................................................106
About Device File Names (Device Special Files).........................................................106
Examples of Mirrored Storage.....................................................................................107
Examples of Storage on Disk Arrays...........................................................................110
Types of Volume Manager...........................................................................................112
HP-UX Logical Volume Manager (LVM)....................................................................112
Veritas Volume Manager (VxVM)...............................................................................112
Propagation of Disk Groups in VxVM...................................................................113
Package Startup Time with VxVM.........................................................................113
Veritas Cluster Volume Manager (CVM)....................................................................113
Cluster Startup Time with CVM............................................................................114
Propagation of Disk Groups with CVM................................................................114
Redundant Heartbeat Subnets...............................................................................114
Comparison of Volume Managers...............................................................................114
Responses to Failures .......................................................................................................116
System Reset When a Node Fails ...............................................................................116
What Happens when a Node Times Out...............................................................117
Example............................................................................................................117
Responses to Hardware Failures.................................................................................118
Responses to Package and Service Failures ................................................................119
Service Restarts ...........................................................................................................119
Network Communication Failure ..............................................................................120
LVM Planning ..................................................................................................................132
Using EMS to Monitor Volume Groups......................................................................132
LVM Worksheet ..........................................................................................................133
CVM and VxVM Planning ...............................................................................................134
CVM and VxVM Worksheet .......................................................................................134
Cluster Configuration Planning .......................................................................................135
About Cluster-wide Device Special Files (cDSFs).......................................................135
Points To Note........................................................................................................136
Where cDSFs Reside...............................................................................................136
Limitations of cDSFs.........................................................................................137
LVM Commands and cDSFs.............................................................................137
About Easy Deployment.............................................................................................137
Advantages of Easy Deployment...........................................................................138
Limitations of Easy Deployment............................................................................138
Heartbeat Subnet and Cluster Re-formation Time .....................................................138
About Hostname Address Families: IPv4-Only, IPv6-Only, and Mixed Mode..........139
What Is IPv4–only Mode?......................................................................................139
What Is IPv6-Only Mode?......................................................................................140
Rules and Restrictions for IPv6-Only Mode.....................................................140
Recommendations for IPv6-Only Mode...........................................................142
What Is Mixed Mode?............................................................................................142
Rules and Restrictions for Mixed Mode...........................................................142
Cluster Configuration Parameters ..............................................................................143
Cluster Configuration: Next Step................................................................................167
Package Configuration Planning .....................................................................................168
Logical Volume and File System Planning .................................................................168
Planning Veritas Cluster Volume Manager (CVM) and Cluster File System (CFS)....170
CVM 4.1 and later without CFS..............................................................................170
CVM 4.1 and later with CFS ..................................................................................170
About the Volume Monitor.........................................................................................172
Using the Volume Monitor.....................................................................................172
Command Syntax.............................................................................................173
Examples...........................................................................................................174
Scope of Monitoring..........................................................................................174
Planning for NFS-mounted File Systems....................................................................174
Planning for Expansion...............................................................................................176
Choosing Switching and Failover Behavior................................................................176
Parameters for Configuring EMS Resources...............................................................178
About Package Dependencies.....................................................................................179
Simple Dependencies.............................................................................................179
Rules for Simple Dependencies........................................................................180
Guidelines for Simple Dependencies................................................................183
Extended Dependencies.........................................................................................184
Rules for Exclusionary Dependencies..............................................................185
Rules for different_node and any_node Dependencies...................................186
What Happens when a Package Fails....................................................................186
For More Information.............................................................................................187
About Package Weights...............................................................................................187
Package Weights and Node Capacities..................................................................188
Configuring Weights and Capacities.....................................................................188
Simple Method.......................................................................................................188
Example 1..........................................................................................................188
Points to Keep in Mind.....................................................................................190
Comprehensive Method.........................................................................................190
Defining Capacities...........................................................................................190
Defining Weights..............................................................................................192
Rules and Guidelines.............................................................................................195
For More Information.............................................................................................195
How Package Weights Interact with Package Priorities and Dependencies.........196
Example 1..........................................................................................................196
Example 2..........................................................................................................196
About External Scripts.................................................................................................197
Using Serviceguard Commands in an External Script..........................................199
Determining Why a Package Has Shut Down.......................................................200
last_halt_failed..................................................................................................200
About Cross-Subnet Failover......................................................................................201
Implications for Application Deployment.............................................................202
Configuring a Package to Fail Over across Subnets: Example..............................202
Configuring node_name...................................................................................203
Configuring monitored_subnet_access............................................................203
Configuring ip_subnet_node............................................................................203
Configuring a Package: Next Steps.............................................................................204
Planning for Changes in Cluster Size...............................................................................204
Using Easy Deployment Commands to Configure the Cluster.............................211
Configuring Root-Level Access...................................................................................216
Allowing Root Access to an Unconfigured Node..................................................216
Ensuring that the Root User on Another Node Is Recognized..............................217
About identd.....................................................................................................218
Configuring Name Resolution....................................................................................218
Safeguarding against Loss of Name Resolution Services......................................220
Ensuring Consistency of Kernel Configuration .........................................................222
Enabling the Network Time Protocol .........................................................................222
Tuning Network and Kernel Parameters....................................................................223
Creating Mirrors of Root Logical Volumes.................................................................224
Choosing Cluster Lock Disks......................................................................................225
Backing Up Cluster Lock Disk Information ..........................................................226
Setting Up a Lock LUN...............................................................................................226
Creating a Disk Partition on an HP Integrity System............................................227
Defining the Lock LUN..........................................................................................228
Excluding Devices from Probing.................................................................................229
Setting Up and Running the Quorum Server..............................................................230
Creating the Storage Infrastructure and Filesystems with LVM, VxVM and CVM....230
Creating a Storage Infrastructure with LVM...............................................................231
Using the EMS Disk Monitor.................................................................................232
Using Mirrored Individual Data Disks..................................................................232
Creating Volume Groups..................................................................................232
Creating Logical Volumes......................................................................................234
Setting Logical Volume Timeouts.....................................................................234
Creating File Systems.............................................................................................235
Distributing Volume Groups to Other Nodes........................................................235
Deactivating the Volume Group.......................................................................236
Distributing the Volume Group........................................................................236
Making Physical Volume Group Files Consistent.................................................238
Creating Additional Volume Groups.....................................................................238
Creating a Storage Infrastructure with VxVM............................................................239
Converting Disks from LVM to VxVM..................................................................239
Initializing Disks for VxVM...................................................................................239
Initializing Disks Previously Used by LVM...........................................................239
Creating Disk Groups............................................................................................240
Creating Volumes...................................................................................................241
Creating File Systems.............................................................................................241
Deporting Disk Groups..........................................................................................241
Re-Importing Disk Groups....................................................................................242
Clearimport at System Reboot Time......................................................................242
Configuring the Cluster ...................................................................................................242
cmquerycl Options......................................................................................................243
Speeding up the Process........................................................................................243
Specifying the Address Family for the Cluster Hostnames...................................244
Specifying the Address Family for the Heartbeat .................................................244
Specifying the Cluster Lock...................................................................................245
Generating a Network Template File.....................................................................245
Full Network Probing............................................................................................246
Specifying a Lock Disk................................................................................................246
Specifying a Lock LUN................................................................................................247
Specifying a Quorum Server.......................................................................................248
Obtaining Cross-Subnet Information..........................................................................248
Identifying Heartbeat Subnets....................................................................................251
Specifying Maximum Number of Configured Packages ...........................................251
Modifying the MEMBER_TIMEOUT Parameter.........................................................251
Controlling Access to the Cluster................................................................................251
A Note about Terminology....................................................................................252
How Access Roles Work........................................................................................252
Levels of Access......................................................................................................253
Setting up Access-Control Policies.........................................................................254
Role Conflicts....................................................................................................257
Package versus Cluster Roles.................................................................................258
Adding Volume Groups..............................................................................................258
Verifying the Cluster Configuration ...........................................................................258
Distributing the Binary Configuration File ................................................................259
Storing Volume Group and Cluster Lock Configuration Data .............................260
Creating a Storage Infrastructure with Veritas Cluster File System (CFS).................261
Preparing the Cluster and the System Multi-node Package..................................261
Creating the Disk Groups......................................................................................263
Creating the Disk Group Cluster Packages............................................................263
Creating Volumes...................................................................................................264
Creating a File System and Mount Point Package.................................................264
Creating Checkpoint and Snapshot Packages for CFS...........................................265
Mount Point Packages for Storage Checkpoints...............................................265
Mount Point Packages for Snapshot Images.....................................................267
Creating the Storage Infrastructure with Veritas Cluster Volume Manager (CVM)...268
Initializing the Veritas Volume Manager ..............................................................269
Preparing the Cluster for Use with CVM ..............................................................269
Identifying the Master Node..................................................................................270
Initializing Disks for CVM.....................................................................................270
Creating Disk Groups............................................................................................271
Mirror Detachment Policies with CVM............................................................271
Creating Volumes ..................................................................................................271
Adding Disk Groups to the Package Configuration .............................................272
Using DSAU during Configuration............................................................................272
Managing the Running Cluster........................................................................................272
Checking Cluster Operation with Serviceguard Manager..........................................272
Checking Cluster Operation with Serviceguard Commands.....................................273
Preventing Automatic Activation of LVM Volume Groups .......................................274
Setting up Autostart Features .....................................................................................274
Changing the System Message ...................................................................................275
Managing a Single-Node Cluster................................................................................275
Single-Node Operation..........................................................................................276
Disabling identd..........................................................................................................276
Deleting the Cluster Configuration ............................................................................277
ip_subnet_node ..................................................................................................299
ip_address...........................................................................................................299
service_name.......................................................................................................299
service_cmd.........................................................................................................300
service_restart.....................................................................................................301
service_fail_fast_enabled......................................................................................301
service_halt_timeout............................................................................................301
resource_name.....................................................................................................301
resource_polling_interval.....................................................................................302
resource_start......................................................................................................302
resource_up_value...............................................................................................302
concurrent_vgchange_operations..........................................................................302
enable_threaded_vgchange...................................................................................303
vgchange_cmd.....................................................................................................303
cvm_activation_cmd............................................................................................304
vxvol_cmd...........................................................................................................304
vg.......................................................................................................................304
cvm_dg...............................................................................................................305
vxvm_dg.............................................................................................................305
vxvm_dg_retry....................................................................................................305
deactivation_retry_count.....................................................................................305
kill_processes_accessing_raw_devices ..................................................................305
File system parameters.....................................................................................306
concurrent_fsck_operations..................................................................................306
concurrent_mount_and_umount_operations.........................................................306
fs_mount_retry_count.........................................................................................307
fs_umount_retry_count ......................................................................................307
fs_name..............................................................................................................307
fs_server.............................................................................................................308
fs_directory.........................................................................................................308
fs_type................................................................................................................308
fs_mount_opt......................................................................................................308
fs_umount_opt....................................................................................................309
fs_fsck_opt..........................................................................................................309
pev_....................................................................................................................309
external_pre_script..............................................................................................309
external_script....................................................................................................309
user_name...........................................................................................................310
user_host............................................................................................................310
user_role.............................................................................................................310
Additional Parameters Used Only by Legacy Packages..................................311
Generating the Package Configuration File......................................................................311
Before You Start...........................................................................................................311
cmmakepkg Examples.................................................................................................312
Next Step.....................................................................................................................313
Editing the Configuration File..........................................................................................313
Verifying and Applying the Package Configuration........................................................317
Adding the Package to the Cluster...................................................................................318
How Control Scripts Manage VxVM Disk Groups..........................................................318
Removing Nodes from Participation in a Running Cluster........................................343
Halting the Entire Cluster ...........................................................................................344
Automatically Restarting the Cluster .........................................................................344
Halting a Node or the Cluster while Keeping Packages Running..............................344
What You Can Do...................................................................................................345
Rules and Restrictions............................................................................................345
Additional Points To Note......................................................................................347
Halting a Node and Detaching its Packages..........................................................348
Halting a Detached Package..................................................................................349
Halting the Cluster and Detaching its Packages....................................................349
Example: Halting the Cluster for Maintenance on the Heartbeat Subnets............349
Managing Packages and Services ....................................................................................350
Starting a Package .......................................................................................................350
Starting a Package that Has Dependencies............................................................351
Using Serviceguard Commands to Start a Package...............................................351
Starting the Special-Purpose CVM and CFS Packages.....................................351
Halting a Package .......................................................................................................351
Halting a Package that Has Dependencies............................................................352
Using Serviceguard Commands to Halt a Package ..............................................352
Moving a Failover Package .........................................................................................352
Using Serviceguard Commands to Move a Running Failover Package................352
Changing Package Switching Behavior ......................................................................353
Changing Package Switching with Serviceguard Commands..............................353
Maintaining a Package: Maintenance Mode...............................................................353
Characteristics of a Package Running in Maintenance Mode or Partial-Startup Maintenance Mode...............................354
Rules for a Package in Maintenance Mode or Partial-Startup Maintenance Mode..................................................355
Dependency Rules for a Package in Maintenance Mode or Partial-Startup Maintenance Mode.......................................356
Performing Maintenance Using Maintenance Mode.............................................356
Procedure..........................................................................................................357
Performing Maintenance Using Partial-Startup Maintenance Mode....................357
Procedure..........................................................................................................357
Excluding Modules in Partial-Startup Maintenance Mode..............................358
Reconfiguring a Cluster....................................................................................................359
Previewing the Effect of Cluster Changes...................................................................361
What You Can Preview..........................................................................................361
Using Preview mode for Commands and in Serviceguard Manager....................361
Using cmeval..........................................................................................................362
Updating the Cluster Lock Configuration..................................................................364
Updating the Cluster Lock Disk Configuration Online.........................................364
Updating the Cluster Lock LUN Configuration Online........................................364
Reconfiguring a Halted Cluster ..................................................................................365
Reconfiguring a Running Cluster................................................................................365
Adding Nodes to the Cluster While the Cluster is Running ................................365
Removing Nodes from the Cluster while the Cluster Is Running ........................366
Changing the Cluster Networking Configuration while the Cluster Is Running...................................................367
What You Can Do.............................................................................................367
What You Must Keep in Mind..........................................................................368
Example: Adding a Heartbeat LAN.................................................................369
Example: Deleting a Subnet Used by a Package...............................................371
Removing a LAN or VLAN Interface from a Node.........................................372
Changing the LVM Configuration while the Cluster is Running .........................372
Changing the VxVM or CVM Storage Configuration ...........................................373
Changing MAX_CONFIGURED_PACKAGES.............................................................374
Configuring a Legacy Package.........................................................................................375
Creating the Legacy Package Configuration ..............................................................375
Configuring a Package in Stages............................................................................376
Editing the Package Configuration File.................................................................376
Creating the Package Control Script...........................................................................378
Customizing the Package Control Script ..............................................................379
Adding Customer Defined Functions to the Package Control Script ...................380
Adding Serviceguard Commands in Customer Defined Functions ...............381
Support for Additional Products...........................................................................381
Verifying the Package Configuration..........................................................................382
Distributing the Configuration....................................................................................382
Distributing the Configuration And Control Script with Serviceguard Manager..................................................382
Copying Package Control Scripts with HP-UX commands...................................383
Distributing the Binary Cluster Configuration File with HP-UX Commands .....383
Configuring Cross-Subnet Failover.............................................................................383
Configuring node_name........................................................................................384
Configuring monitored_subnet_access..................................................................384
Creating Subnet-Specific Package Control Scripts.................................................385
Control-script entries for nodeA and nodeB....................................................385
Control-script entries for nodeC and nodeD....................................................385
Reconfiguring a Package...................................................................................................385
Migrating a Legacy Package to a Modular Package....................................................386
Reconfiguring a Package on a Running Cluster .........................................................386
Renaming or Replacing an External Script Used by a Running Package..............387
Reconfiguring a Package on a Halted Cluster ............................................................388
Adding a Package to a Running Cluster.....................................................................388
Deleting a Package from a Running Cluster ..............................................................388
Resetting the Service Restart Counter.........................................................................390
Allowable Package States During Reconfiguration ....................................................390
Changes that Will Trigger Warnings......................................................................396
Responding to Cluster Events ..........................................................................................397
Single-Node Operation ....................................................................................................397
Disabling Serviceguard.....................................................................................................398
Removing Serviceguard from a System...........................................................................398
Networking and Security Configuration Errors.........................................................414
Cluster Re-formations Caused by Temporary Conditions..........................................414
Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low.............415
System Administration Errors ....................................................................................416
Package Control Script Hangs or Failures ............................................................416
Problems with Cluster File System (CFS)....................................................................418
Problems with VxVM Disk Groups.............................................................................419
Force Import and Deport After Node Failure........................................................419
Package Movement Errors ..........................................................................................420
Node and Network Failures .......................................................................................420
Troubleshooting the Quorum Server...........................................................................421
Authorization File Problems..................................................................................421
Timeout Problems..................................................................................................421
Messages................................................................................................................421
Bind to Relocatable IP Addresses ...............................................................................433
Call bind() before connect() ...................................................................................434
Give Each Application its Own Volume Group .........................................................434
Use Multiple Destinations for SNA Applications ......................................................435
Avoid File Locking ......................................................................................................435
Using a Relocatable Address as the Source Address for an Application that is Bound to INADDR_ANY.............................435
Restoring Client Connections ..........................................................................................437
Handling Application Failures ........................................................................................439
Create Applications to be Failure Tolerant .................................................................439
Be Able to Monitor Applications ................................................................................439
Minimizing Planned Downtime ......................................................................................440
Reducing Time Needed for Application Upgrades and Patches ...............................440
Provide for Rolling Upgrades ...............................................................................440
Do Not Change the Data Layout Between Releases .............................................441
Providing Online Application Reconfiguration .........................................................441
Documenting Maintenance Operations .....................................................................441
Before You Start...........................................................................................................454
Running the Rolling Upgrade Using DRD..................................................................454
Example of a Rolling Upgrade .........................................................................................455
Step 1. ..........................................................................................................................456
Step 2. ..........................................................................................................................456
Step 3. ..........................................................................................................................457
Step 4. ..........................................................................................................................457
Step 5. ..........................................................................................................................458
Guidelines for Non-Rolling Upgrade...............................................................................459
Migrating Cluster Lock PV Device File Names...........................................................459
Other Considerations..................................................................................................459
Performing a Non-Rolling Upgrade.................................................................................459
Limitations of Non-Rolling Upgrades.........................................................................459
Steps for Non-Rolling Upgrades.................................................................................459
Performing a Non-Rolling Upgrade Using DRD.............................................................460
Limitations of Non-Rolling Upgrades using DRD......................................................460
Steps for a Non-Rolling Upgrade Using DRD............................................................460
Guidelines for Migrating a Cluster with Cold Install.......................................................461
Checklist for Migration................................................................................................461
IPv4 Compatible IPv6 Addresses...........................................................................477
IPv4 Mapped IPv6 Address...................................................................................477
Aggregatable Global Unicast Addresses...............................................................477
Link-Local Addresses.............................................................................................478
Site-Local Addresses..............................................................................................478
Multicast Addresses...............................................................................................478
Network Configuration Restrictions................................................................................479
IPv6 Relocatable Address and Duplicate Address Detection Feature.............................480
Local Primary/Standby LAN Patterns..............................................................................481
Example Configurations...................................................................................................481
Index........................................................................................................................................493
List of Figures
1-1 Typical Cluster Configuration ....................................................................................29
1-2 Typical Cluster After Failover ....................................................................................31
1-3 Tasks in Configuring a Serviceguard Cluster ............................................................35
2-1 Redundant LANs .......................................................................................................40
2-2 Mirrored Disks Connected for High Availability ......................................................47
2-3 Cluster with High Availability Disk Array ................................................................48
2-4 Cluster with Fibre Channel Switched Disk Array......................................................49
2-5 Eight-Node Active/Standby Cluster ..........................................................................51
2-6 Eight-Node Cluster with XP or EMC Disk Array ......................................................52
3-1 Serviceguard Software Components...........................................................................54
3-2 Lock Disk or Lock LUN Operation.............................................................................64
3-3 Quorum Server Operation..........................................................................................66
3-4 Package Moving During Failover...............................................................................69
3-5 Before Package Switching...........................................................................................71
3-6 After Package Switching.............................................................................................72
3-7 Rotating Standby Configuration before Failover........................................................73
3-8 Rotating Standby Configuration after Failover..........................................................74
3-9 CONFIGURED_NODE Policy Packages after Failover...................................................75
3-10 Automatic Failback Configuration before Failover....................................................76
3-11 Automatic Failback Configuration After Failover......................................................77
3-12 Automatic Failback Configuration After Restart of Node 1.......................................78
3-13 Legacy Package Time Line Showing Important Events..............................................81
3-14 Package Time Line (Legacy Package).........................................................................83
3-15 Legacy Package Time Line for Halt Script Execution.................................................87
3-16 Cluster Before Local Network Switching ...................................................................94
3-17 Cluster After Local Network Switching .....................................................................95
3-18 Local Switching After Cable Failure ..........................................................................96
3-19 Aggregated Networking Ports..................................................................................104
3-20 Physical Disks Within Shared Storage Units.............................................................108
3-21 Mirrored Physical Disks............................................................................................109
3-22 Multiple Devices Configured in Volume Groups.....................................................109
3-23 Physical Disks Combined into LUNs........................................................................110
3-24 Multiple Paths to LUNs.............................................................................................111
3-25 Multiple Paths in Volume Groups.............................................................................111
4-1 Sample Cluster Configuration ..................................................................................123
5-1 Access Roles..............................................................................................................253
D-1 Running Cluster Before Rolling Upgrade ................................................................456
D-2 Running Cluster with Packages Moved to Node 2 ..................................................456
D-3 Node 1 Upgraded to new HP-UX version................................................................457
D-4 Node 1 Rejoining the Cluster ...................................................................................457
D-5 Running Cluster with Packages Moved to Node 1 ..................................................458
D-6 Running Cluster After Upgrades .............................................................................458
H-1 System Management Homepage with Serviceguard Manager................................488
H-2 Cluster by Type.........................................................................................................489
List of Tables
1 Printing History..........................................................................................................25
3-1 Package Configuration Data.......................................................................................73
3-2 Node Lists in Sample Cluster......................................................................................76
3-3 Error Conditions and Package Movement for Failover Packages..............................88
3-4 Pros and Cons of Volume Managers with Serviceguard..........................................115
4-1 SCSI Addressing in Cluster Configuration ..............................................................126
4-2 Package Failover Behavior .......................................................................................177
6-1 Base Modules.............................................................................................................284
6-2 Optional Modules......................................................................................................285
7-1 Verifying Cluster Components..................................................................................338
7-2 Types of Changes to the Cluster Configuration .......................................................359
7-3 Types of Changes to Packages ..................................................................................391
G-1 IPv6 Address Types...................................................................................................475
G-2 ...................................................................................................................................476
G-3 ...................................................................................................................................477
G-4 ...................................................................................................................................477
G-5 ...................................................................................................................................477
G-6 ...................................................................................................................................478
G-7 ...................................................................................................................................478
G-8 ...................................................................................................................................478
I-1 Minimum and Maximum Values of Cluster Configuration Parameters..................491
I-2 Minimum and Maximum Values of Package Configuration Parameters.................491
Printing History
Table 1 Printing History
Printing Date Part Number Edition
The last printing date and part number indicate the current edition, which applies to
Serviceguard version A.11.20. See the latest edition of the Release Notes for a summary
of changes in that release.
Preface
This eighteenth edition of the manual applies to Serviceguard Version A.11.20. Earlier
versions are available at www.hp.com/go/hpux-serviceguard-docs —> HP
Serviceguard.
This guide describes how to configure Serviceguard to run on HP 9000 or HP Integrity
servers under the HP-UX operating system. The contents are as follows:
• “Serviceguard at a Glance” (page 29), describes a Serviceguard cluster and provides
a roadmap for using this guide.
• “Understanding Serviceguard Hardware Configurations” (page 37) provides a
general view of the hardware configurations used by Serviceguard.
• “Understanding Serviceguard Software Components” (page 53) describes the
software components of Serviceguard and shows how they function within the
HP-UX operating system.
• “Planning and Documenting an HA Cluster ” (page 121) steps through the planning
process and provides a set of worksheets for organizing information about the
cluster.
• “Building an HA Cluster Configuration” (page 205) describes the creation of the
cluster configuration.
• “Configuring Packages and Their Services ” (page 279) describes the creation of
high availability packages and the control scripts associated with them.
• “Cluster and Package Maintenance” (page 321) presents the basic cluster
administration tasks.
• “Troubleshooting Your Cluster” (page 399) explains cluster testing and
troubleshooting strategies.
• “Enterprise Cluster Master Toolkit ” (page 423) describes a group of tools to simplify
the integration of popular applications with Serviceguard.
• “Designing Highly Available Cluster Applications ” (page 425) gives guidelines for
creating cluster-aware applications that provide optimal performance in a
Serviceguard environment.
• “Integrating HA Applications with Serviceguard” (page 443) presents suggestions
for integrating your existing applications with Serviceguard.
• “Software Upgrades ” (page 447) shows how to move from one Serviceguard or
HP-UX release to another without bringing down your applications.
• “Blank Planning Worksheets” (page 463) contains a set of empty worksheets for
preparing a Serviceguard configuration.
• “Migrating from LVM to VxVM Data Storage ” (page 471) describes how to
convert from LVM data storage to VxVM data storage.
• “IPv6 Network Support” (page 475) describes the IPv6 addressing scheme and the
primary/standby interface configurations supported.
• Appendix H (page 485) describes the Serviceguard Manager GUI.
• “Maximum and Minimum Values for Parameters” (page 491) provides a reference
to the supported ranges for Serviceguard parameters.
Related Publications
For information about the current version of Serviceguard, and about older versions,
see the Serviceguard documents posted at www.hp.com/go/
hpux-serviceguard-docs —> HP Serviceguard.
The following documents, which can all be found at www.hp.com/go/
hpux-serviceguard-docs, are particularly useful.
• The latest edition of the HP Serviceguard Version A.11.20 Release Notes
• HP Serviceguard Quorum Server Version A.04.00 Release Notes
• Serviceguard Extension for RAC Version A.11.20 Release Notes
• Using Serviceguard Extension for RAC
• Understanding and Designing Serviceguard Disaster Tolerant Architectures
• Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
• Enterprise Cluster Master Toolkit Version Release Notes
• Serviceguard/SGeRAC/SMS/Serviceguard Mgr Plug-in Compatibility and Feature Matrix.
• Securing Serviceguard and other Serviceguard white papers.
For HP-UX system administration information, see the documents at www.hp.com/
go/hpux-core-docs.
For information on the Distributed Systems Administration Utilities (DSAU), see the
latest version of the Distributed Systems Administration Utilities Release Notes and the
Distributed Systems Administration Utilities User’s Guide at www.hp.com/go/
hpux-core-docs: go to the HP-UX 11i v3 collection and scroll down to Getting
started.
For information about the Event Monitoring Service, see the following documents at
www.hp.com/go/hpux-ha-monitoring-docs:
• Using the Event Monitoring Service
• Using High Availability Monitors
1 Serviceguard at a Glance
This chapter introduces Serviceguard on HP-UX, and shows where to find information
in this book. It covers the following:
• What is Serviceguard?
• Using Serviceguard Manager (page 32)
• A Roadmap for Configuring Clusters and Packages (page 34)
If you are ready to start setting up Serviceguard clusters, skip ahead to Chapter 4:
“Planning and Documenting an HA Cluster ” (page 121). Specific steps for setup are
given in Chapter 5: “Building an HA Cluster Configuration” (page 205).
Figure 1-1 shows a typical Serviceguard cluster with two nodes.
What is Serviceguard?
Serviceguard allows you to create high availability clusters of HP 9000 or HP Integrity
servers (or a mixture of both; see the release notes for your version for details and
restrictions).
A high availability computer system allows application services to continue in spite
of a hardware or software failure. Highly available systems protect users from software
failures as well as from failure of a system processing unit (SPU), disk, or local area
network (LAN) component. In the event that one component fails, the redundant
component takes over. Serviceguard and other high availability subsystems coordinate
the transfer between components.
A Serviceguard cluster is a networked grouping of HP 9000 or HP Integrity servers
(or both), known as nodes, having sufficient redundancy of software and hardware
that a single point of failure will not significantly disrupt service.
A package groups application services (individual HP-UX processes) together. There
are failover packages, system multi-node packages, and multi-node packages:
• The typical high availability package is a failover package. It usually is configured
to run on several nodes in the cluster, and runs on one at a time. If a service, node,
network, or other package resource fails on the node where it is running,
Serviceguard can automatically transfer control of the package to another cluster
node, allowing services to remain available with minimal interruption.
• There are also packages that run on several cluster nodes at once, and do not fail
over. These are called system multi-node packages and multi-node packages.
Examples are the packages HP supplies for use with the Veritas Cluster Volume
Manager and Veritas Cluster File System from Symantec (on HP-UX releases that
support them; see “About Veritas CFS and CVM from Symantec” (page 32)).
A system multi-node package must run on all nodes that are active in the cluster.
If it fails on one active node, that node halts. System multi-node packages are
supported only for HP-supplied applications.
A multi-node package can be configured to run on one or more cluster nodes. It
is considered UP as long as it is running on any of its configured nodes.
In Figure 1-1, node 1 (one of two SPUs) is running failover package A, and node 2 is
running package B. Each package has a separate group of disks associated with it,
containing data needed by the package's applications, and a mirror copy of the data.
Note that both nodes are physically connected to both groups of mirrored disks. In this
example, however, only one node at a time may access the data for a given group of
disks. In the figure, node 1 is shown with exclusive access to the top two disks (solid
line), and node 2 is shown as connected without access to the top disks (dotted line).
Similarly, node 2 is shown with exclusive access to the bottom two disks (solid line),
and node 1 is shown as connected without access to the bottom disks (dotted line).
Mirror copies of data provide redundancy in case of disk failures. In addition, a total
of four data buses are shown for the disks that are connected to node 1 and node 2.
This configuration provides the maximum redundancy and also gives optimal I/O
performance, since each package is using different buses.
Note that the network hardware is cabled to provide redundant LAN interfaces on
each node. Serviceguard uses TCP/IP network services for reliable communication
among nodes in the cluster, including the transmission of heartbeat messages, signals
from each functioning node which are central to the operation of the cluster. TCP/IP
services also are used for other types of inter-node communication. (The heartbeat is
explained in more detail in the chapter “Understanding Serviceguard Software.”)
Failover
Any host system running in a Serviceguard cluster is called an active node. Under
normal conditions, a fully operating Serviceguard cluster monitors the health of the
cluster's components on all its active nodes.
Most Serviceguard packages are failover packages. When you configure a failover
package, you specify which active node will be the primary node where the package
will start, and one or more other nodes, called adoptive nodes, that can also run the
package.
Figure 1-2 shows what happens in a failover situation.
After this transfer, the failover package typically remains on the adoptive node as long
as the adoptive node continues running. If you wish, however, you can configure the
package to return to its primary node as soon as the primary node comes back online.
Alternatively, you may manually transfer control of the package back to the primary
node at the appropriate time.
Figure 1-2 does not show the power connections to the cluster, but these are important
as well. In order to remove all single points of failure from the cluster, you should
provide as many separate power circuits as needed to prevent a single point of failure
of your nodes, disks and disk mirrors. Each power circuit should be protected by an
uninterruptible power source. For more details, refer to the section on “Power Supply
Planning” in Chapter 4, “Planning and Documenting an HA Cluster.”
Serviceguard is designed to work in conjunction with other high availability products,
such as:
• Mirrordisk/UX or Veritas Volume Manager, which provide disk redundancy to
eliminate single points of failure in the disk subsystem;
• Event Monitoring Service (EMS), which lets you monitor and detect failures that
are not directly handled by Serviceguard;
• disk arrays, which use various RAID levels for data protection;
• HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust,
which eliminate failures related to power outages.
HP recommends these products; in conjunction with Serviceguard they provide the
highest degree of availability.
Using Serviceguard Manager
NOTE: For more detailed information, see Appendix H (page 485) and the section on
Serviceguard Manager in the latest version of the Serviceguard Release Notes. Check
the Serviceguard/SGeRAC/SMS/Serviceguard Manager Plug-in Compatibility and Feature
Matrix and the latest Release Notes for up-to-date information about Serviceguard
Manager compatibility. You can find both documents at www.hp.com/go/
hpux-serviceguard-docs —> HP Serviceguard.
Serviceguard Manager is the graphical user interface for Serviceguard. It is available
as a “plug-in” to the System Management Homepage (SMH). SMH is a web-based
graphical user interface (GUI) that replaces SAM as the system administration GUI as
of HP-UX 11i v3 (but you can still run the SAM terminal interface; see “Using SAM”
(page 33)).
You can use Serviceguard Manager to monitor, administer, and configure Serviceguard
clusters.
• You can see properties, status, and alerts of clusters, nodes, and packages.
• You can do administrative tasks such as run or halt clusters, cluster nodes, and
packages.
• You can create or modify a cluster and its packages.
Using SAM
You can use SAM, the System Administration Manager, to do many of the HP-UX
system administration tasks described in this manual (that is, tasks, such as configuring
disks and filesystems, that are not specifically Serviceguard tasks).
To launch SAM, enter
/usr/sbin/sam
on the command line. As of HP-UX 11i v3, SAM offers a Terminal User Interface (TUI)
which also acts as a gateway to the web-based System Management Homepage (SMH).
• To get to the SMH for any task area, highlight the task area in the SAM TUI and
press w.
• To go directly to the SMH from the command line, enter
/usr/sbin/sam -w
For more information, see the HP-UX System Administrator’s Guide, at the address
given in the preface to this manual.
Figure 1-3 Tasks in Configuring a Serviceguard Cluster
The tasks in Figure 1-3 are covered in step-by-step detail in chapters 4 through 7. HP
recommends you gather all the data that is needed for configuration before you start.
See “Planning and Documenting an HA Cluster ” (page 121) for tips on gathering data.
NOTE: If you will be using a cross-subnet configuration, see also the Restrictions
(page 42) that apply specifically to such configurations.
NOTE: You should verify that network traffic is not too heavy on the heartbeat/data
LAN. If traffic is too heavy, this LAN might not perform adequately in transmitting
heartbeats if the dedicated heartbeat LAN fails.
Cross-Subnet Configurations
As of Serviceguard A.11.18 it is possible to configure multiple subnets, joined by a
router, both for the cluster heartbeat and for data, with some nodes using one subnet
and some another.
A cross-subnet configuration allows:
• Automatic package failover from a node on one subnet to a node on another
• A cluster heartbeat that spans subnets.
Configuration Tasks
Cluster and package configuration tasks are affected as follows:
• You must use the -w full option to cmquerycl to discover actual or potential
nodes and subnets across routers.
• You must configure two new parameters in the package configuration file to allow
packages to fail over across subnets (a sketch follows this list):
— ip_subnet_node - to indicate which nodes the subnet is configured on
— monitored_subnet_access - to indicate whether the subnet is configured on all
nodes (FULL) or only some (PARTIAL)
(For legacy packages, see “Configuring Cross-Subnet Failover” (page 383).)
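As an illustration, the fragment below is a sketch only; the node names, subnet,
address, and file path are placeholders, and the parameter descriptions referenced
above are authoritative for your release.
# Discover actual or potential nodes and subnets across routers (full probing):
cmquerycl -v -w full -n nodeA -n nodeB -C /etc/cmcluster/cross_subnet.conf
# In the package configuration file, for a subnet configured on only some nodes:
monitored_subnet         15.244.65.0
monitored_subnet_access  PARTIAL
ip_subnet                15.244.65.0
ip_subnet_node           nodeA
ip_address               15.244.65.82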
Restrictions
The following restrictions apply:
• All nodes in the cluster must belong to the same network domain (that is, the
domain portion of the fully-qualified domain name must be the same).
• The nodes must be fully connected at the IP level.
• A minimum of two heartbeat paths must be configured for each cluster node.
• There must be less than 200 milliseconds of latency in the heartbeat network.
• Each heartbeat subnet on each node must be physically routed separately to the
heartbeat subnet on another node; that is, each heartbeat path must be physically
separate:
— The heartbeats must be statically routed; static route entries must be configured
on each node to route the heartbeats through different paths.
— Failure of a single router must not affect both heartbeats at the same time.
• IPv6 heartbeat subnets are not supported in a cross-subnet configuration.
• IPv6–only and mixed modes are not supported in a cross-subnet configuration.
For more information about these modes, see “About Hostname Address Families:
IPv4-Only, IPv6-Only, and Mixed Mode” (page 139).
• Because Veritas Cluster File System from Symantec (CFS) requires link-level traffic
communication (LLT) among the nodes, Serviceguard cannot be configured in
cross-subnet configurations with CFS alone.
But CFS is supported in specific cross-subnet configurations with Serviceguard
and HP add-on products; see the documentation listed below.
• Each package subnet must be configured with a standby interface on the local
bridged net. The standby interface can be shared between subnets.
• You must not set the HP-UX network parameter ip_strong_es_model in a cross-subnet
configuration. Leave it set to the default (0, meaning disabled); Serviceguard does
not support enabling it for cross-subnet configurations (a way to check the current
setting is sketched below). For more information
about this parameter, see “Tuning Network and Kernel Parameters” (page 223)
and “Using a Relocatable Address as the Source Address for an Application that
is Bound to INADDR_ANY” (page 435).
• Deploying applications in this environment requires careful consideration; see
“Implications for Application Deployment” (page 202).
NOTE: See also the Rules and Restrictions (page 39) that apply to all cluster
networking configurations.
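If you want to confirm that ip_strong_es_model is still at its default on each node,
the following sketch assumes the parameter is exposed through the standard HP-UX
ndd interface:
# Display the current value; 0 (the default) means the parameter is disabled
ndd -get /dev/ip ip_strong_es_model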
NOTE: In a cluster that contains systems with PCI SCSI adapters, you cannot attach
both PCI and NIO SCSI adapters to the same shared SCSI bus.
External shared Fast/Wide SCSI buses must be equipped with in-line terminators for
disks on a shared bus. Refer to the “Troubleshooting” chapter for additional information.
When planning and assigning SCSI bus priority, remember that one node can dominate
a bus shared by multiple nodes, depending on what SCSI addresses are assigned to
the controller for each node on the shared bus. All SCSI addresses, including the
addresses of all interface cards, must be unique for all devices on a shared bus.
Data Protection
It is required that you provide data protection for your highly available system, using
one of two methods:
• Disk Mirroring
• Disk Arrays using RAID Levels and Multiple Data Paths
About Multipathing
Multipathing is automatically configured in HP-UX 11i v3 (this is often called native
multipathing), or in some cases can be configured with third-party software such as
EMC Powerpath.
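For example, on HP-UX 11i v3 you might inspect the paths that native multipathing
has discovered; this is a sketch only, and the device name below is illustrative:
# Show the lunpaths behind a persistent DSF
ioscan -m lun /dev/disk/disk4
# Map persistent DSFs to the corresponding legacy DSFs
ioscan -m dsf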
Figure 2-3 below shows a similar cluster with a disk array connected to each node on
two I/O channels. See “About Multipathing” (page 45).
Details on logical volume configuration for Serviceguard are in the chapter “Building
an HA Cluster Configuration.”
This type of configuration uses native HP-UX or other multipathing software; see
“About Multipathing” (page 45).
Larger Clusters
You can create clusters of up to 16 nodes with Serviceguard. Clusters of up to 16 nodes
may be built by connecting individual SPUs via Ethernet.
The possibility of configuring a cluster consisting of 16 nodes does not mean that all
types of cluster configuration behave in the same way in a 16-node configuration. For
example, in the case of shared SCSI buses, the practical limit on the number of nodes
that can be attached to the same shared bus is four, because of bus loading and limits
on cable length. Even in this case, 16 nodes could be set up as an administrative unit,
and sub-groupings of four could be set up on different SCSI buses which are attached
to different mass storage devices.
In the case of non-shared SCSI connections to an XP series or EMC disk array, the
four-node limit does not apply. Each node can be connected directly to the XP or EMC
by means of two SCSI buses. Packages can be configured to fail over among all sixteen
nodes. For more about this type of configuration, see “Point to Point Connections to
Storage Devices ” (page 51).
NOTE: When configuring larger clusters, be aware that cluster and package
configuration times as well as execution times for commands such as cmviewcl will
be extended. In the man pages for some commands, you can find options to help to
reduce the time. For example, refer to the man page for cmquerycl (1m) for options
that can reduce the amount of time needed for probing disks or networks.
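For instance, the following hedged sketch (node names and the output file are
placeholders) limits cmquerycl to local network probing, which can shorten the query
on a large cluster; check the cmquerycl (1m) manpage for the exact options your
release supports:
# Probe local network connectivity only, rather than full probing across routers
cmquerycl -v -w local -n node1 -n node2 -n node3 -n node4 \
    -C /etc/cmcluster/bigcluster.conf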
Active/Standby Model
You can also create clusters in which there is a standby node. For example, an
eight-node configuration in which one node acts as the standby for the other seven could
easily be set up by equipping the backup node with seven shared buses allowing
separate connections to each of the active nodes. This configuration is shown in
Figure 2-5.
Figure 2-6 Eight-Node Cluster with XP or EMC Disk Array
Fibre Channel switched configurations also are supported using either an arbitrated
loop or fabric login topology. For additional information about supported cluster
configurations, refer to the HP Unix Servers Configuration Guide, available through your
HP representative.
Serviceguard Architecture
The following figure shows the main software components used by Serviceguard. This
chapter discusses these components in some detail.
NOTE: Veritas CFS may not yet be supported on the version of HP-UX you are
running; see “About Veritas CFS and CVM from Symantec” (page 32).
Serviceguard Daemons
Serviceguard uses the following daemons:
• /usr/lbin/cmclconfd—Serviceguard Configuration Daemon
• /usr/lbin/cmcld—Serviceguard Cluster Daemon
• /usr/lbin/cmfileassistd—Serviceguard File Management daemon
• /usr/lbin/cmlogd—Serviceguard Syslog Log Daemon
• /usr/lbin/cmlvmd—Cluster Logical Volume Manager Daemon
• /opt/cmom/lbin/cmomd—Cluster Object Manager Daemon
• /usr/lbin/cmsnmpd—Cluster SNMP subagent (optionally running)
• /usr/lbin/cmserviced—Serviceguard Service Assistant Daemon
• /usr/lbin/qs—Serviceguard Quorum Server Daemon
• /usr/lbin/cmnetd—Serviceguard Network Manager daemon.
• /usr/lbin/cmvxd—Serviceguard-to-Veritas Membership Coordination daemon.
(Only present if Veritas CFS is installed.)
NOTE: Two of the central components of Serviceguard—Package Manager, and
Cluster Manager—run as parts of the cmcld daemon. This daemon runs at priority 20
on all cluster nodes. It is important that user processes should have a priority lower
than 20, otherwise they may prevent Serviceguard from updating the kernel safety
timer, causing a system reset.
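For example, you might confirm that the core daemons listed above are running, and
inspect the scheduling priority of cmcld, with standard HP-UX tools; this is a sketch
and the output format varies by system:
# Check that the core Serviceguard daemons are running
ps -ef | egrep 'cmcld|cmclconfd|cmnetd'
# The long listing includes the scheduling priority (PRI field) of cmcld
ps -efl | grep '[c]mcld'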
Lock LUN Daemon: cmdisklockd
If a lock LUN is being used, cmdisklockd runs on each node in the cluster and is
started by cmcld when the node joins the cluster.
CFS Components
The HP Serviceguard Storage Management Suite offers additional components for
interfacing with the Veritas Cluster File System on some current versions of HP-UX
(see “About Veritas CFS and CVM from Symantec” (page 32)). Documents for the
management suite are posted on www.hp.com/go/hpux-serviceguard-docs.
Veritas CFS components operate directly over Ethernet networks that connect the nodes
within a cluster. Redundant networks are required to avoid single points of failure.
The Veritas CFS components are:
• GAB (Group Membership Services/Atomic Broadcast) — When Veritas Cluster
Volume Manager (CVM) 4.1 or later, or Veritas Cluster File System (CFS), is
NOTE: You can no longer run the heartbeat on a serial (RS232) line or an FDDI or
Token Ring network.
Each node sends its heartbeat message at a rate calculated by Serviceguard on the basis
of the value of the MEMBER_TIMEOUT parameter, set in the cluster configuration
file, which you create as a part of cluster configuration.
Cluster Lock
Although a cluster quorum of more than 50% is generally required, exactly 50% of the
previously running nodes may re-form as a new cluster provided that the other 50% of
the previously running nodes do not also re-form. This is guaranteed by the use of a
tie-breaker to choose between the two equal-sized node groups, allowing one group
to form the cluster and forcing the other group to shut down. This tie-breaker is known
as a cluster lock. The cluster lock is implemented by means of a lock disk, lock LUN,
or a Quorum Server.
The cluster lock is used as a tie-breaker only for situations in which a running cluster
fails and, as Serviceguard attempts to form a new cluster, the cluster is split into two
sub-clusters of equal size. Each sub-cluster will attempt to acquire the cluster lock. The
sub-cluster which gets the cluster lock will form the new cluster, preventing the
possibility of two sub-clusters running at the same time. If the two sub-clusters are of
unequal size, the sub-cluster with greater than 50% of the nodes will form the new
cluster, and the cluster lock is not used.
Lock Requirements
A one-node cluster does not require a cluster lock. A two-node cluster requires a cluster
lock. In clusters larger than three nodes, a cluster lock is strongly recommended. If you
have a cluster with more than four nodes, use a Quorum Server; a cluster lock disk is
not allowed for clusters of that size.
Serviceguard periodically checks the health of the lock disk or LUN and writes messages
to the syslog file if the device fails the health check. This file should be monitored for
early detection of lock disk problems.
If you are using a lock disk, you can choose between two lock disk options—a single
or dual lock disk—based on the kind of high availability configuration you are building.
A single lock disk is recommended where possible. With both single and dual locks, however,
it is important that the cluster lock be available even if the power circuit to one node
fails; thus, the choice of a lock configuration depends partly on the number of power
circuits available. Regardless of your choice, all nodes in the cluster must have access
to the cluster lock to maintain high availability.
IMPORTANT: A dual lock cannot be implemented on LUNs. This means that the lock
LUN mechanism cannot be used in an Extended Distance cluster.
NOTE: You must use Fibre Channel connections for a dual cluster lock; you can no
longer implement it in a parallel SCSI configuration.
For a dual cluster lock, the disks must not share either a power circuit or a node chassis
with one another. In this case, if there is a power failure affecting one node and disk,
the other node and disk remain available, so cluster re-formation can take place on the
remaining node. For a campus cluster, there should be one lock disk in each of the data
centers, and all nodes must have access to both lock disks. In the event of a failure of
one of the data centers, the nodes in the remaining data center will be able to acquire
their local lock disk, allowing them to successfully reform a new cluster.
NOTE: A dual lock disk does not provide a redundant cluster lock. In fact, the dual lock is
a compound lock. This means that two disks must be available at cluster formation time
rather than the one that is needed for a single lock disk. Thus, the only recommended
usage of the dual cluster lock is when the single cluster lock cannot be isolated at the
time of a failure from exactly one half of the cluster nodes.
If one of the dual lock disks fails, Serviceguard will detect this when it carries out
periodic checking, and it will write a message to the syslog file. After the loss of one
of the lock disks, the failure of a cluster node could cause the cluster to go down if the
remaining node(s) cannot access the surviving cluster lock disk.
The Quorum Server runs on a separate system, and can provide quorum services for
multiple clusters.
IMPORTANT: For more information about the Quorum Server, see the latest version
of the HP Serviceguard Quorum Server release notes at www.hp.com/go/
hpux-serviceguard-docs -> HP Serviceguard Quorum Server Software.
No Cluster Lock
Normally, you should not configure a cluster of three or fewer nodes without a cluster
lock. In two-node clusters, a cluster lock is required. You may consider using no cluster
lock with configurations of three or more nodes, although the decision should be
affected by the fact that any cluster may require tie-breaking. For example, if one node
in a three-node cluster is removed for maintenance, the cluster re-forms as a two-node
cluster. If a tie-breaking scenario later occurs due to a node or communication failure,
the entire cluster will become unavailable.
In a cluster with four or more nodes, you may not need a cluster lock since the chance
of the cluster being split into two halves of equal size is very small. However, be sure
to configure your cluster to prevent the failure of exactly half the nodes at one time.
For example, make sure there is no potential single point of failure such as a single
LAN between equal numbers of nodes, and that you don’t have exactly half of the
nodes on a single power circuit.
NOTE: If you are using the Veritas Cluster Volume Manager (CVM) you cannot
change the quorum configuration while SG-CFS-pkg is running. For more information
about CVM, see “CVM and VxVM Planning ” (page 134).
When you make quorum configuration changes, Serviceguard goes through a two-step
process:
1. All nodes switch to a strict majority quorum (turning off any existing quorum
devices).
2. All nodes switch to the newly configured quorum method, device and parameters.
IMPORTANT: During Step 1, while the nodes are using a strict majority quorum, node
failures can cause the cluster to go down unexpectedly if the cluster has been using a
quorum device before the configuration change. For example, suppose you change the
Quorum Server polling interval while a two-node cluster is running. If a node fails
during Step 1, the cluster will lose quorum and go down, because a strict majority of
prior cluster members (two out of two in this case) is required. The duration of Step 1
is typically around a second, so the chance of a node failure occurring during that time
is very small.
In order to keep the time interval as short as possible, make sure you are changing only
the quorum configuration, and nothing else, when you apply the change.
If this slight risk of a node failure leading to cluster failure is unacceptable, halt the
cluster before you make the quorum configuration change.
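As a sketch of that cautious approach (the cluster name, file name, and parameter
are illustrative), you might halt the cluster, apply the edited configuration, and
restart it:
cmhaltcl -f                               # halt the cluster first to avoid the risk above
cmgetconf -c cluster1 /tmp/cluster1.conf  # capture the running cluster configuration
# edit only the quorum-related parameters, for example QS_POLLING_INTERVAL
cmcheckconf -C /tmp/cluster1.conf         # verify the edited configuration
cmapplyconf -C /tmp/cluster1.conf         # apply the new quorum configuration
cmruncl                                   # restart the cluster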
Non-failover Packages
There are two types of special-purpose packages that do not fail over and that can run
on more than one node at the same time: the system multi-node package, which runs
on all nodes in the cluster, and the multi-node package, which can be configured to
run on all or some of the nodes in the cluster. System multi-node packages are reserved
for use by HP-supplied applications, such as Veritas Cluster Volume Manager (CVM)
and Cluster File System (CFS).
The rest of this section describes failover packages.
Failover Packages
A failover package starts up on an appropriate node (see node_name (page 289)) when
the cluster starts. A package failover takes place when the package coordinator initiates
the start of a package on a new node. A package failover involves both halting the
existing package (in the case of a service, network, or resource failure), and starting
the new instance of the package.
Failover is shown in the following figure:
NOTE: It is possible to configure a cluster that spans subnets joined by a router, with
some nodes using one subnet and some another. This is known as a cross-subnet
configuration. In this context, you can configure packages to fail over from a node on
one subnet to a node on another, and you will need to configure a relocatable IP address
for each subnet the package is configured to start on; see “About Cross-Subnet Failover”
(page 201), and in particular the subsection “Implications for Application Deployment”
(page 202).
When a package fails over, TCP connections are lost. TCP applications must reconnect
to regain connectivity; this is not handled automatically. Note that if the package is
dependent on multiple subnets, normally all of them must be available on the target
node before the package will be started. (In a cross-subnet configuration, all the
monitored subnets that are specified for this package, and configured on the target
node, must be up.)
If the package has a dependency on a resource or another package, the dependency
must be met on the target node before the package can start.
The switching of relocatable IP addresses on a single subnet is shown in Figure 3-5 and
Figure 3-6. Figure 3-5 shows a two-node cluster in its original state with Package 1
running on Node 1 and Package 2 running on Node 2. Users connect to the node with
the IP address of the package they wish to use. Each node has a stationary IP address
associated with it, and each package has an IP address associated with it.
Figure 3-6 shows the condition where Node 1 has failed and Package 1 has been
transferred to Node 2 on the same subnet. Package 1’s IP address was transferred to
Node 2 along with the package. Package 1 continues to be available and is now running
on Node 2. Also note that Node 2 can now access both Package 1’s disk and Package 2’s
disk.
Failover Policy
The Package Manager selects a node for a failover package to run on based on the
priority list included in the package configuration file together with the failover_policy
parameter, also in the configuration file. The failover policy governs how the package
manager selects which node to run a package on when a specific node has not been
identified and the package needs to be started. This applies not only to failovers but
also to startup for the package, including the initial startup. The failover policies are
configured_node (the default), min_package_node, site_preferred and
site_preferred_manual. The parameter is set in the package configuration file.
See “failover_policy” (page 292) for more information.
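As an illustration, the failover policy is a single line in the package configuration
file; the node names in this sketch are placeholders:
node_name          node1
node_name          node2
node_name          node3
failover_policy    min_package_node    # run on the node currently running the fewest packages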
Package placement is also affected by package dependencies and weights, if you choose
to use them. See “About Package Dependencies” (page 179) and “About Package
Weights” (page 187).
When the cluster starts, each package starts as shown in Figure 3-7.
If a failure occurs, any package would fail over to the node containing fewest running
packages, as in Figure 3-8, which shows a failure on node 2:
NOTE: Using the min_package_node policy, when node 2 is repaired and brought
back into the cluster, it will then be running the fewest packages, and thus will become
the new standby node.
If these packages had been set up using the configured_node failover policy, they
would start initially as in Figure 3-7, but the failure of node 2 would cause the package
to start on node 3, as in Figure 3-9:
If you use configured_node as the failover policy, the package will start up on the
highest priority node in the node list, assuming that the node is running as a member
of the cluster. When a failover occurs, the package will move to the next highest priority
node in the list that is available.
Failback Policy
The use of the failback_policy parameter allows you to decide whether a package will
return to its primary node if the primary node becomes available and the package is
not currently running on the primary node. The configured primary node is the first
node listed in the package’s node list.
The two possible values for this policy are automatic and manual. The parameter is
set in the package configuration file, as in the sketch below.
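The following fragment is illustrative only (the package and node names are
placeholders); it combines the failover and failback policies discussed in this
section:
package_name       pkgA
node_name          node1           # primary node: the first node listed
node_name          node4
failover_policy    configured_node
failback_policy    automatic       # the alternative is manual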
As an example, consider a four-node configuration in which failover_policy is set to
configured_node and failback_policy is automatic (see Figure 3-10).
node1 panics, and after the cluster re-forms, pkgA starts running on node4 (Figure 3-11).
After rebooting, node 1 rejoins the cluster. At that point, pkgA will be automatically
stopped on node 4 and restarted on node 1.
NOTE: Setting the failback_policy to automatic can result in a package failback and
application outage during a critical production period. If you are using automatic
failback, you may want to wait to add the package’s primary node back into the cluster
until you can allow the package to be taken out of service temporarily while it switches
back to the primary node.
NOTE: If you configure the package while the cluster is running, the package does
not start up immediately after the cmapplyconf command completes. To start the
package without halting and restarting the cluster, issue the cmrunpkg or cmmodpkg
command.
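For instance (the package and node names are placeholders):
cmrunpkg -n node1 pkgA    # start the package on a chosen node
cmmodpkg -e pkgA          # re-enable package switching so it can fail over later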
How does a failover package start up, and what is its behavior while it is running?
Some of the many phases of package life are shown in Figure 3-13.
NOTE: This diagram applies specifically to legacy packages. Differences for modular
scripts are called out below.
At any step along the way, an error will result in the script exiting abnormally (with
an exit code of 1). For example, if a package service cannot be started, the control
script will exit with an error.
NOTE: This diagram is specific to legacy packages. Modular packages also run external
scripts and “pre-scripts” as explained above.
If the run script execution is not complete before the time specified in the
run_script_timeout, the package manager will kill the script. During run script execution,
messages are written to a log file. For legacy packages, this is in the same directory as
the run script, and has the same name as the run script with the extension .log. For
modular packages, the pathname is determined by the script_log_file parameter in the
package configuration file (see page 292). Normal starts are recorded in the log, together
with error messages or warnings related to starting the package.
NOTE: If you set <n> restarts and also set service_fail_fast_enabled to yes, the failfast
will take place after <n> restart attempts have failed. It does not make sense to set
service_restart to “-R” for a service and also set service_fail_fast_enabled to yes.
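A hedged sketch of the related service parameters in a modular package configuration
file follows; the service name and command are illustrative:
service_name                 app_monitor
service_cmd                  "/usr/local/bin/monitor_app.sh"
service_restart              3      # try up to 3 local restarts before failing the service
service_fail_fast_enabled    no     # yes would reset the node when the service fails
service_halt_timeout         300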
NOTE: If you use the cmhaltpkg command with the -n <nodename> option, the package
is halted only if it is running on that node.
The cmmodpkg command cannot be used to halt a package, but it can disable switching
either on particular nodes or on all nodes. A package can continue running when its
switching has been disabled, but it will not be able to start on other nodes if it stops
running on its current node.
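For example (the package and node names are placeholders):
cmhaltpkg -n node2 pkgA     # halt pkgA only if it is running on node2
cmmodpkg -d pkgA            # disable switching for pkgA on all nodes
cmmodpkg -d -n node3 pkgA   # prevent pkgA from switching to node3 only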
At any step along the way, an error will result in the script exiting abnormally (with
an exit code of 1). Also, if the halt script execution is not complete before the time
specified in the HALT_SCRIPT_TIMEOUT, the package manager will kill the script.
During halt script execution, messages are written to a log file. For legacy packages,
this is in the same directory as the run script and has the same name as the run script
with the extension .log. For modular packages, the pathname is determined by the
script_log_file parameter in the package configuration file (see page 292). Normal halts are
recorded in the log, together with error messages or warnings related to halting the
package.
NOTE: This diagram applies specifically to legacy packages. Differences for modular
scripts are called out above.
Table 3-3 Error Conditions and Package Movement for Failover Packages (excerpt)
• Error or Exit Code: Halt Script Timeout
• Node Failfast Enabled: YES
• Service Failfast Enabled: Either Setting
• HP-UX Status on Primary after Error: system reset
• Halt script runs after Error or Exit: N/A
• Package Allowed to Run on Primary Node after Error: N/A (system reset)
• Package Allowed to Run on Alternate Node: Yes, unless the timeout happened after
the cmhaltpkg command was executed.
NOTE: Serviceguard monitors the health of the network interfaces (NICs) and can
monitor the IP level (layer 3) network.
IMPORTANT: Any subnet that is used by a package for relocatable addresses should
be configured into the cluster via NETWORK_INTERFACE and either STATIONARY_IP
or HEARTBEAT_IP in the cluster configuration file. For more information about those
parameters, see “Cluster Configuration Parameters ” (page 143). For more information
about configuring relocatable addresses, see the descriptions of the package ip_
parameters (page 298).
NOTE: It is possible to configure a cluster that spans subnets joined by a router, with
some nodes using one subnet and some another. This is called a cross-subnet
configuration. In this context, you can configure packages to fail over from a node on
one subnet to a node on another, and you will need to configure a relocatable address
for each subnet the package is configured to start on; see “About Cross-Subnet Failover”
(page 201), and in particular the subsection “Implications for Application Deployment”
(page 202).
Types of IP Addresses
Both IPv4 and IPv6 address types are supported in Serviceguard. IPv4 addresses are
the traditional addresses of the form n.n.n.n where n is a decimal digit between 0
and 255. IPv6 addresses have the form x:x:x:x:x:x:x:x where x is the hexadecimal
value of each of eight 16-bit pieces of the 128-bit address. You can define heartbeat IPs,
stationary IPs, and relocatable (package) IPs as IPv4 or IPv6 addresses (or certain
combinations of both).
Load Sharing
Serviceguard allows you to configure several services into a single package, sharing a
single IP address; in that case all those services will fail over when the package does.
If you want to be able to load-balance services (that is, move a specific service to a less
loaded system when necessary) you can do so by putting each service in its own package
and giving it a unique IP address.
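As a sketch of that approach (the package names, service commands, subnet, and
addresses are all illustrative), each service gets its own package and relocatable
address so it can be moved independently:
# Package A
package_name    pkg_web
service_name    web_srv
service_cmd     "/opt/app/bin/run_web.sh"
ip_subnet       192.10.25.0
ip_address      192.10.25.12
# Package B
package_name    pkg_db
service_name    db_srv
service_cmd     "/opt/app/bin/run_db.sh"
ip_subnet       192.10.25.0
ip_address      192.10.25.13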
NOTE: For a full discussion, see the white paper Serviceguard Network Manager: Inbound
Failure Detection Enhancement at www.hp.com/go/hpux-serviceguard-docs.
• INOUT: When both the inbound and outbound counts stop incrementing for a
certain amount of time, Serviceguard will declare the card as bad. (Serviceguard
Local Switching
A local network switch involves the detection of a local network interface failure and
a failover to the local backup LAN card (also known as the standby LAN card). The
backup LAN card must not have any IP addresses configured.
In the case of local network switch, TCP/IP connections are not lost for Ethernet, but
IEEE 802.3 connections will be lost. For IPv4, Ethernet uses the ARP protocol, and
HP-UX sends out an unsolicited ARP to notify remote systems of address mapping
between MAC (link level) addresses and IP level addresses. IEEE 802.3 does not have
the rearp function.
IPv6 uses the Neighbor Discovery Protocol (NDP) instead of ARP. The NDP protocol
is used by hosts and routers to do the following:
• determine the link-layer addresses of neighbors on the same link, and quickly
purge cached values that become invalid.
• find neighboring routers willing to forward packets on their behalf.
• actively keep track of which neighbors are reachable, and which are not, and detect
changed link-layer addresses.
• search for alternate functioning routers when the path to a router fails.
Within the Ethernet family, local switching is supported in the following configurations:
• 1000Base-SX and 1000Base-T
• 1000Base-T or 1000Base-SX and 100Base-T
On HP-UX 11i, however, Jumbo Frames can only be used when the 1000Base-T or
1000Base-SX cards are configured. The 100Base-T and 10Base-T cards do not support
Jumbo Frames.
In Figure 3-16, Node 1 and Node 2 are communicating over LAN segment 2. LAN segment 1 is a
standby.
In Figure 3-17, we see what would happen if the LAN segment 2 network interface
card on Node 1 were to fail.
As the standby interface takes over, IP addresses will be switched to the hardware path
associated with the standby interface. The switch is transparent at the TCP/IP level.
All applications continue to run on their original nodes. During this time, IP traffic on
Node 1 will be delayed as the transfer occurs. However, the TCP/IP connections will
continue to be maintained and applications will continue to run. Control of the packages
on Node 1 is not affected.
Local network switching will work with a cluster containing one or more nodes. You
may wish to design a single-node cluster in order to take advantage of this local network
switching feature in situations where you need only one node and do not wish to set
up a more complex cluster.
Remote Switching
A remote switch (that is, a package switch) involves moving packages to a new system.
In the most common configuration, in which all nodes are on the same subnet(s), the
package IP (relocatable IP; see “Stationary and Relocatable IP Addresses ” (page 90))
moves as well, and the new system must already have the subnet configured and
working properly, otherwise the packages will not be started.
NOTE: It is possible to configure a cluster that spans subnets joined by a router, with
some nodes using one subnet and some another. This is called a cross-subnet
configuration. In this context, you can configure packages to fail over from a node on
one subnet to a node on another, and you will need to configure a relocatable address
for each subnet the package is configured to start on; see “About Cross-Subnet Failover”
(page 201), and in particular the subsection “Implications for Application Deployment”
(page 202).
When a remote switch occurs, TCP connections are lost. TCP applications must reconnect
to regain connectivity; this is not handled automatically. Note that if the package is
dependent on multiple subnets (specified as monitored_subnets in the package
configuration file), all those subnets must normally be available on the target node
before the package will be started. (In a cross-subnet configuration, all subnets
— Errors that prevent packets from being received but do not affect the link-level
health of an interface
HP recommends that you configure target polling if the subnet is not private to the
cluster.
The IP Monitor section of the cmquerycl output looks similar to this:
…
Route Connectivity (no probing was performed):
1 16.89.143.192
16.89.120.0
IPv4:
IPv6:
…
The IP Monitor section of the cluster configuration file will look similar to the following
for a subnet on which IP monitoring is configured with target polling.
NOTE: This is the default if cmquerycl detects a gateway for the subnet in question;
see SUBNET under “Cluster Configuration Parameters ” (page 143) for more information.
IMPORTANT: By default, cmquerycl does not verify that the gateways it detects will
work correctly for monitoring. But if you use the -w full option, cmquerycl will
validate them as polling targets.
SUBNET 192.168.1.0
IP_MONITOR ON
POLLING_TARGET 192.168.1.254
To configure a subnet for IP monitoring with peer polling, edit the IP Monitor section
of the cluster configuration file to look similar to this:
SUBNET 192.168.2.0
IP_MONITOR ON
NOTE: This is not the default. If cmquerycl does not detect a gateway for the subnet
in question, it sets IP_MONITOR to OFF, disabling IP-level polling for this subnet; if it
does detect a gateway, it populates POLLING_TARGET, enabling target polling. See
SUBNET under “Cluster Configuration Parameters ” (page 143) for more information.
The IP Monitor section of the cluster configuration file will look similar to the following
in the case of a subnet on which IP monitoring is disabled:
SUBNET 192.168.3.0
IP_MONITOR OFF
In this case, you would need to re-enable the primary interface on each node after the
link is repaired, using cmmodnet (1m); for example:
cmmodnet -e lan2
Both the Single and Dual ported LANs in the non-aggregated configuration have four
LAN cards, each associated with a separate non-aggregated IP address and MAC
address, and each with its own LAN name (lan0, lan1, lan2, lan3). When these ports
are aggregated all four ports are associated with a single IP address and MAC address.
In this example, the aggregated ports are collectively known as lan900, the name by
which the aggregate is known on HP-UX 11i.
Various combinations of Ethernet card types (single or dual-ported) and aggregation
groups are possible, but it is vitally important to remember that at least two physical
cards must be used in any combination of APAs to avoid a single point of failure for
heartbeat connections. HP-APA currently supports both automatic and manual
configuration of link aggregates.
For information about implementing APA with Serviceguard, see the latest version of
the HP Auto Port Aggregation (APA) Support Guide and other APA documents posted
at www.hp.com/go/hpux-networking-docs.
VLAN Configurations
Virtual LAN configuration using HP-UX VLAN software is supported in Serviceguard
clusters.
Configuration Restrictions
HP-UX allows up to 1024 VLANs to be created from a physical NIC port. A large pool
of system resources is required to accommodate such a configuration; Serviceguard
could suffer performance degradation if many network interfaces are configured in
each cluster node. To prevent this and other problems, Serviceguard imposes the
following restrictions:
• A maximum of 30 network interfaces per node is supported. The interfaces can be
physical NIC ports, VLAN interfaces, APA aggregates, or any combination of
these.
• Local failover of VLANs must be onto the same link types. For example, you must
fail over from VLAN-over-Ethernet to VLAN-over-Ethernet.
• The primary and standby VLANs must have same VLAN ID (or tag ID).
• VLAN configurations are only supported on HP-UX 11i releases.
• Only port-based and IP-subnet-based VLANs are supported. Protocol-based VLAN
is not supported because Serviceguard does not support any transport protocols
other than TCP/IP.
• Each VLAN interface must be assigned an IP address in a unique subnet, unless
it is a standby for a primary VLAN interface.
• Failover from physical LAN interfaces to VLAN interfaces or vice versa is not
supported because of restrictions in VLAN software.
• Using VLAN in a Wide Area Network cluster is not supported.
• If CVM disk groups are used, you must not configure the Serviceguard heartbeat
over VLAN interfaces.
NOTE: It is possible, though not a best practice, to use legacy DSFs (that is, DSFs
using the older naming convention) on some nodes after migrating to agile addressing
on others; this allows you to migrate different nodes at different times, if necessary.
For information on migrating cluster lock volumes to agile addressing, see “Updating
the Cluster Lock Configuration” (page 364).
For more information about agile addressing, see following documents in the 11i v3
collection at www.hp.com/go/hpux-core-docs:
• the Logical Volume Management volume of the HP-UX System Administrator’s Guide
• the HP-UX 11i v3 Installation and Update Guide
• the white papers:
— Overview: The Next Generation Mass Storage Stack
— HP-UX 11i v3 Persistent DSF Migration Guide
— LVM Migration from Legacy to Agile Naming Model
— HP-UX 11i v3 Native Multi-Pathing for Mass Storage
See also the HP-UX 11i v3 intro(7) manpage, and “About Multipathing” (page 45).
Figure 3-21 shows the individual disks combined in a multiple disk mirrored
configuration.
Figure 3-22 shows the mirrors configured into LVM volume groups, shown in the figure
as /dev/vgpkgA and /dev/vgpkgB. The volume groups are activated by Serviceguard
packages for use by highly available applications.
NOTE: LUN definition is normally done using utility programs provided by the disk
array manufacturer. Since arrays vary considerably, you should refer to the
documentation that accompanies your storage unit.
Figure 3-24 shows LUNs configured with multiple paths (links) to provide redundant
pathways to the data.
NOTE: Under agile addressing, the storage units in these examples would have names
such as disk1, disk2, disk3, etc. See “About Device File Names (Device Special
Files)” (page 106).
As of A.11.20, Serviceguard supports cluster-wide DSFs, and HP recommends that you
use them. See “About Cluster-wide Device Special Files (cDSFs)” (page 135).
Finally, the multiple paths are configured into volume groups as shown in Figure 3-25.
NOTE: The HP-UX Logical Volume Manager is described in the HP-UX System
Administrator’s Guide. Release Notes for Veritas Volume Manager contain a description
of Veritas volume management products.
Table 3-4 Pros and Cons of Volume Managers with Serviceguard
Logical Volume Manager (LVM)
Pros:
• Software is provided with all versions of HP-UX.
• Provides up to 3-way mirroring using optional Mirrordisk/UX software.
• Dynamic multipathing (DMP) is active by default as of HP-UX 11i v3.
• Supports exclusive activation as well as read-only activation from multiple nodes.
• Can be used to configure a cluster lock disk.
• Supports multiple heartbeat subnets; the one with the faster failover time is used
to re-form the cluster.
Cons:
• Lacks flexibility and extended features of some other volume managers.
Shared Logical Volume Manager (SLVM)
Pros:
• Provided free with SGeRAC for multi-node access to RAC data.
• Supports up to 16 nodes in shared read/write mode for each cluster.
• Supports exclusive activation.
• Supports multiple heartbeat subnets.
• Online node configuration with activated shared volume groups (using specific
SLVM kernel and Serviceguard revisions).
Cons:
• Lacks the flexibility and extended features of some other volume managers.
• Limited mirroring support.
Base-VxVM
Pros:
• Software is supplied free with HP-UX 11i releases.
• Java-based administration through graphical user interface.
• Striping (RAID 0) support.
• Concatenation.
• Online resizing of volumes.
• Supports multiple heartbeat subnets.
Cons:
• Cannot be used for a cluster lock.
• Supports only exclusive read or write activation.
• Package delays are possible, due to lengthy vxdg import at the time the package
is started or failed over.
Veritas Volume Manager— Full VxVM product
Pros:
• Disk group configuration from any node.
• DMP for active/active storage devices.
• Supports exclusive activation.
• Hot relocation and unrelocation of failed subdisks.
• Supports up to 32 plexes per volume.
• RAID 1+0 mirrored stripes.
• RAID 1 mirroring.
• RAID 5.
• RAID 0+1 striped mirrors.
• Supports multiple heartbeat subnets, which could reduce cluster reformation time.
Cons:
• Requires purchase of additional license.
• Cannot be used for a cluster lock.
• Does not support activation on multiple nodes in either shared mode or read-only
mode.
• May cause delay at package startup time due to lengthy vxdg import.
Responses to Failures
Serviceguard responds to different kinds of failures in specific ways. For most hardware
failures, the response is not user-configurable, but for package and service failures,
you can choose the system’s response, within limits.
Example
Situation. Assume a two-node cluster, with Package1 running on SystemA and
Package2 running on SystemB. Volume group vg01 is exclusively activated on
SystemA; volume group vg02 is exclusively activated on SystemB. Package IP
addresses are assigned to SystemA and SystemB respectively.
Failure. Only one LAN has been configured for both heartbeat and data traffic. During
the course of operations, heavy application traffic monopolizes the bandwidth of the
network, preventing heartbeat packets from getting through.
Since SystemA does not receive heartbeat messages from SystemB, SystemA attempts
to reform as a one-node cluster. Likewise, since SystemB does not receive heartbeat
messages from SystemA, SystemB also attempts to reform as a one-node cluster.
During the election protocol, each node votes for itself, giving both nodes 50 percent
of the vote. Because both nodes have 50 percent of the vote, both nodes now vie for the
cluster lock. Only one node will get the lock.
Outcome. Assume SystemA gets the cluster lock. SystemA reforms as a one-node
cluster. After re-formation, SystemA will make sure all applications configured to run
on an existing cluster node are running. When SystemA discovers Package2 is not
running in the cluster it will try to start Package2 if Package2 is configured to run
on SystemA.
NOTE: In a very few cases, Serviceguard will attempt to reboot the system before a
system reset when this behavior is specified. If there is enough time to flush the buffers
in the buffer cache, the reboot succeeds, and a system reset does not take place. Either
way, the system will be guaranteed to come down within a predetermined number of
seconds.
“Choosing Switching and Failover Behavior” (page 176) provides advice on choosing
appropriate failover behavior.
Service Restarts
You can allow a service to restart locally following a failure. To do this, you indicate a
number of restarts for each service in the package control script. When a service starts,
the variable RESTART_COUNT is set in the service’s environment. The service, as it
executes, can examine this variable to see whether it has been restarted after a failure,
and if so, it can take appropriate action such as cleanup.
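The following minimal sketch shows how a service command might use RESTART_COUNT; the application paths and file names are hypothetical, and the cleanup action will depend on your application:
if [[ "${RESTART_COUNT:-0}" -gt 0 ]]
then
    # This service has been restarted after a failure; clean up stale state first.
    rm -f /var/opt/myapp/myapp.pid    # hypothetical stale PID file
fi
exec /opt/myapp/bin/myapp_server      # hypothetical application daemon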
NOTE: Planning and installation overlap considerably, so you may not be able to
complete the worksheets before you proceed to the actual configuration. In that case,
fill in the missing elements to document the system as you proceed with the
configuration.
Subsequent chapters describe configuration and maintenance tasks in detail.
General Planning
A clear understanding of your high availability objectives will help you to define your
hardware requirements and design your system. Use the following questions as a guide
for general planning:
1. What applications must continue to be available in the event of a failure?
2. What system resources (processing power, networking, SPU, memory, disk space)
are needed to support these applications?
3. How will these resources be distributed among the nodes in the cluster during
normal operation?
4. How will these resources be distributed among the nodes of the cluster in all
possible combinations of failures, especially node failures?
5. How will resources be distributed during routine maintenance of the cluster?
6. What are the networking requirements? Are all networks and subnets available?
7. Have you eliminated all single points of failure? For example:
• network points of failure.
• disk points of failure.
Hardware Planning
Hardware planning requires examining the physical hardware itself. One useful
procedure is to sketch the hardware configuration in a diagram that shows adapter
cards and buses, cabling, disks and peripherals. A sample diagram for a two-node
cluster is shown in Figure 4-1.
Create a similar sketch for your own cluster. You may also find it useful to record the
information as in the sample Hardware Worksheet (page 127), indicating which device
adapters occupy which slots, and the bus address for each adapter; you can update the
details as you do the cluster configuration (described in Chapter 5).
SPU Information
SPU information includes the basic characteristics of the systems you are using in the
cluster. Different models of computers can be mixed in the same cluster. This
configuration model also applies to HP Integrity servers. HP-UX workstations are not
supported for Serviceguard.
Collect information for the following items; see the Hardware Configuration Worksheet
(page 127) for an example:
Network Information
Serviceguard monitors LAN interfaces.
LAN Information
While a minimum of one LAN interface per subnet is required, at least two LAN
interfaces, one primary and one or more standby, are needed to eliminate single points
of network failure.
Primary System A 7
Primary System B 6
Primary System C 5
Primary System D 4
Disk #1 3
Disk #2 2
Disk #3 1
Disk #4 0
Disk #5 15
Disk #6 14
Others 13 - 8
=============================================================================
Disk I/O Information for Shared Disks:
Bus Type _SCSI_ Slot Number _4__ Address _16_ Disk Device File __________
Bus Type _SCSI_ Slot Number _6_ Address _24_ Disk Device File __________
Bus Type ______ Slot Number ___ Address ____ Disk Device File _________
Attach a printout of the output from the ioscan -fnC disk command
after installing disk hardware and rebooting the system. Mark this
printout to indicate which physical volume group each disk belongs to.
==========================================================================
Disk Power:
==========================================================================
Tape Backup Power:
==========================================================================
Other Power:
NOTE: You cannot use more than one type of lock in the same cluster.
A one-node cluster does not require a lock. Two-node clusters require the use of a
cluster lock, and a lock is recommended for larger clusters as well. Clusters larger than
four nodes can use only a Quorum Server as the cluster lock. In selecting a cluster lock
configuration, be careful to anticipate any potential need for additional cluster nodes.
For more information on lock disks, lock LUNs, and the Quorum Server, see “Choosing
Cluster Lock Disks” (page 225), “Setting Up a Lock LUN” (page 226), and “Setting Up
and Running the Quorum Server” (page 230).
/dev/dsk/c0t1d4 30 seconds
==============================================================================
LVM Planning
You can create storage groups using the HP-UX Logical Volume Manager (LVM), or
using Veritas VxVM and CVM software as described in the next section.
When designing your disk layout using LVM, you should consider the following:
• The root disk should belong to its own volume group.
• The volume groups that contain high-availability applications, services, or data
must be on a bus or busses available to the primary node and all adoptive nodes.
• High availability applications, services, and data should be placed in a separate
volume group from non-high availability applications, services, and data.
• You must group high availability applications, services, and data, whose control
needs to be transferred together, onto a single volume group or series of volume
groups.
• You must not group two different high-availability applications, services, or data,
whose control needs to be transferred independently, onto the same volume group.
• Your root disk must not belong to a volume group that can be activated on another
node.
• HP recommends that you use volume group names other than the default volume
group names (vg01, vg02, etc.). Choosing volume group names that represent
the high availability applications that they are associated with (for example,
/dev/vgdatabase) will simplify cluster administration; see the sketch following
this list.
• Logical Volume Manager (LVM) 2.0 volume groups, which remove some of the
limitations imposed by LVM 1.0 volume groups, can be used on systems running
some recent versions of HP-UX 11i v3 and Serviceguard. Check the Release Notes
for your version of Serviceguard for details. For more information, see the white
paper LVM 2.0 Volume Groups in HP-UX 11i v3 at www.hp.com/go/
hpux-core-docs -> HP–UX 11i v3.
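For illustration, the following sketch shows how a descriptively named volume group might be created for an LVM (version 1.0) configuration using agile device file names; the disk and volume group names are hypothetical, and the full procedure is described under “Creating a Storage Infrastructure with LVM” (page 231):
mkdir /dev/vgdatabase
mknod /dev/vgdatabase/group c 64 0x010000   # group file; the minor number must be unique for each volume group
pvcreate -f /dev/rdisk/disk12               # hypothetical shared disk
vgcreate /dev/vgdatabase /dev/disk/disk12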
NOTE: EMS cannot be used to monitor the status of VxVM disk groups. For this you
should use the volume monitor cmvolmond which is supplied with Serviceguard.
cmvolmond can also monitor LVM volumes. See “About the Volume Monitor”
(page 172).
resource_name /vg/vgpkg/pv_summary
resource_polling_interval 60
IMPORTANT: You must set the IO timeout for all logical volumes within the volume
group being monitored to something other than the default of zero (no timeout);
otherwise the EMS resource monitor value will never change upon a failure. Suggested
IO timeout values are 20 to 60 seconds. See “Setting Logical Volume Timeouts” (page 234)
for more information.
For more information, see “Using the EMS HA Monitors” (page 79).
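To round out the entry shown above, an EMS resource definition in the package configuration file also specifies resource_start and resource_up_value; the values below are illustrative only, and the exact up value depends on the monitor (see the EMS HA Monitors documentation):
resource_name /vg/vgpkg/pv_summary
resource_polling_interval 60
resource_start automatic
resource_up_value = up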
LVM Worksheet
You may find a worksheet such as the following useful to help you organize and record
your physical disk configuration. This worksheet is an example; blank worksheets are
in Appendix E (page 463). Make as many copies as you need.
NOTE: Under agile addressing, the physical volumes in the sample worksheet that
follows would have names such as disk1, disk2, etc. See “About Device File Names
(Device Special Files)” (page 106).
As of A.11.20, Serviceguard supports cluster-wide DSFs, and HP recommends that you
use them. See “About Cluster-wide Device Special Files (cDSFs)” (page 135).
=============================================================================
IMPORTANT: Check the latest version of the release notes (at the address given in the
preface to this manual) for information about Serviceguard support for cDSFs.
HP recommends that you use cDSFs for the storage devices in the cluster because this
makes it simpler to deploy and maintain a cluster, and removes a potential source of
configuration errors. See “Creating Cluster-wide Device Special Files (cDSFs)” (page 206)
for instructions.
Points To Note
• cDSFs can be created for any group of nodes that you specify, provided that
Serviceguard A.11.20 is installed on each node.
Normally, the group should comprise the entire cluster.
• cDSFs apply only to shared storage; they will not be generated for local storage,
such as root, boot, and swap devices.
• Once you have created cDSFs for the cluster, HP-UX automatically creates new
cDSFs when you add shared storage.
• HP recommends that you do not mix cDSFs with persistent (or legacy) DSFs in a
volume group, and you cannot use cmpreparestg (1m) on a volume group in
which they are mixed.
See “About Easy Deployment” (page 137) for more information about
cmpreparestg.
Limitations of cDSFs
• cDSFs are supported only within a single cluster; you cannot define a cDSF group
that crosses cluster boundaries.
• A node can belong to only one cDSF group.
• cDSFs are not supported by VxVM, CVM, CFS, or any other application that
assumes DSFs reside only in /dev/disk and /dev/rdisk.
• Oracle ASM cannot detect cDSFs created after ASM is installed.
• cDSFs do not support disk partitions.
Such partitions can be addressed by a device file using the agile addressing scheme,
but not by a cDSF.
NOTE: How the clients of IPv6-only cluster applications handle hostname resolution
is a matter for the discretion of the system or network administrator; there are no HP
requirements or recommendations specific to this case.
In IPv6-only mode, all Serviceguard daemons will normally use IPv6 addresses for
communication among the nodes, although local (intra-node) communication may
occur on the IPv4 loopback address.
For more information about IPv6, see Appendix G (page 475).
IMPORTANT: See the latest version of the Serviceguard release notes for the most
current information on these and other restrictions.
• All addresses used by the cluster must be in each node's /etc/hosts file. In
addition, the file must contain the following entry:
::1 localhost ipv6-localhost ipv6-loopback
For more information and recommendations about hostname resolution, see
“Configuring Name Resolution” (page 218).
• All addresses must be IPv6, apart from the node's IPv4 loopback address, which
cannot be removed from /etc/hosts.
• The node's public LAN address (by which it is known to the outside world) must
be the last address listed in /etc/hosts.
Otherwise there is a possibility of the address being used even when it is not
configured into the cluster.
• You must use $SGCONF/cmclnodelist, not ~/.rhosts or /etc/hosts.equiv,
to provide root access to an unconfigured node.
• If you use a Quorum Server, you must make sure that the Quorum Server hostname
(and the alternate Quorum Server address specified by QS_ADDR, if any) resolve
to IPv6 addresses, and you must use Quorum Server version A.04.00 or later. See
the latest Quorum Server release notes for more information; you can find them
at www.hp.com/go/hpux-serviceguard-docs under HP Serviceguard
Quorum Server Software.
NOTE: The Quorum Server itself can be an IPv6–only system; in that case it can
serve IPv6–only and mixed-mode clusters, but not IPv4–only clusters.
• If you use a Quorum Server, and the Quorum Server is on a different subnet from
the cluster, you must use an IPv6-capable router.
• Hostname aliases are not supported for IPv6 addresses, because of operating
system limitations.
• CFS, CVM, VxVM, and VxFS are not supported in IPv6–only mode.
NOTE: CFS and CVM should not be installed on any system in an IPv6–only
cluster, even if they are not configured.
This also means that the Serviceguard component of bundles that include
Serviceguard cannot be configured in IPv6-only mode.
• HPVM is not supported in IPv6-only mode. You cannot configure a virtual machine
either as a node or a package in an IPv6-only cluster.
IMPORTANT: Check the current Serviceguard release notes for the latest instructions
and recommendations.
• If you decide to migrate the cluster to IPv6-only mode, you should plan to do so
while the cluster is down.
IMPORTANT: See the latest version of the Serviceguard release notes for the most
current information on these and other restrictions.
• The hostname resolution file on each node (for example, /etc/hosts) must
contain entries for all the IPv4 and IPv6 addresses used throughout the cluster,
including all STATIONARY_IP and HEARTBEAT_IP addresses as well any private
addresses. There must be at least one IPv4 address in this file (in the case of
/etc/hosts, the IPv4 loopback address cannot be removed).
In addition, the file must contain the following entry:
::1 localhost ipv6-localhost ipv6-loopback
For more information and recommendations about hostname resolution, see
“Configuring Name Resolution” (page 218).
• You must use $SGCONF/cmclnodelist, not ~/.rhosts or /etc/hosts.equiv,
to provide root access to an unconfigured node.
See “Allowing Root Access to an Unconfigured Node” (page 216) for more
information.
• Hostname aliases are not supported for IPv6 addresses, because of operating
system limitations.
NOTE: See “Reconfiguring a Cluster” (page 359) for a summary of changes you can
make while the cluster is running.
The following parameters must be configured:
CLUSTER_NAME The name of the cluster as it will appear in the
output of cmviewcl and other commands, and
as it appears in the cluster configuration file.
The cluster name must not contain any of the
following characters: space, slash (/), backslash
(\), and asterisk (*).
NOTE: Limitations:
• Because Veritas Cluster File System from
Symantec (CFS) requires link-level traffic
communication (LLT) among the nodes,
Serviceguard cannot be configured in
cross-subnet configurations with CFS alone.
But CFS is supported in specific cross-subnet
configurations with Serviceguard and HP
add-on products; see the documentation
listed under “Cross-Subnet Configurations”
(page 41) for more information.
• IPv6 heartbeat subnets are not supported in
a cross-subnet configuration.
Considerations for CVM:
• For Veritas CVM 4.1 or later, multiple
heartbeats are permitted, and you must
configure either multiple heartbeat subnets
or a single heartbeat subnet with a standby.
HP recommends multiple heartbeats.
• You cannot change the heartbeat
configuration while a cluster that uses CVM
is running.
• You cannot use an IPv6 heartbeat subnet
with CVM or CFS.
NOTE:
CONFIGURED_IO_TIMEOUT_EXTENSION
is supported only with iFCP switches that
allow you to get their R_A_TOV value.
NOTE: As of the date of this manual, the Framework for HP Serviceguard Toolkits deals
specifically with legacy packages.
NOTE: LVM Volume groups that are to be activated by packages must also be defined
as cluster-aware in the cluster configuration file. See “Cluster Configuration Planning
” (page 135). Disk groups (for Veritas volume managers) that are to be activated by
packages must be defined in the package configuration file, described below.
You may need to use logical volumes in volume groups as part of the infrastructure
for package operations on a cluster. When the package moves from one node to another,
it must be able to access data residing on the same disk as on the previous node. This
is accomplished by activating the volume group and mounting the file system that
resides on it.
In Serviceguard, high availability applications, services, and data are located in volume
groups that are on a shared bus. When a node fails, the volume groups containing the
applications, services, and data of the failed node are deactivated on the failed node
and activated on the adoptive node. In order for this to happen, you must configure
the volume groups so that they can be transferred from the failed node to the adoptive
node.
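As a rough sketch (the volume group, logical volume, and mount point names are hypothetical), making a volume group cluster-aware and configuring its file system into a modular package involves entries like the following. In the cluster configuration file:
VOLUME_GROUP /dev/vgdatabase
In the package configuration file:
vg vgdatabase
fs_name /dev/vgdatabase/lvol1
fs_directory /mnt/database
fs_type vxfs
fs_mount_opt "-o rw"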
Planning Veritas Cluster Volume Manager (CVM) and Cluster File System (CFS)
CAUTION: Serviceguard manages Veritas processes, specifically gab and LLT, through
system multi-node packages. As a result, the Veritas administration commands such
as gabconfig, llthosts, and lltconfig should only be used in display mode, for
example gabconfig -a. You could crash nodes or the entire cluster if you use Veritas
commands such as the gab* or llt* commands to configure these components or affect
their runtime behavior.
CAUTION: Once you create the disk group and mount point packages, you must
administer the cluster with CFS commands, including cfsdgadm, cfsmntadm,
cfsmount, and cfsumount. You must not use the HP-UX mount or umount
command to provide or remove access to a shared file system in a CFS environment;
using these HP-UX commands under these circumstances is not supported. Use
cfsmount and cfsumount instead.
If you use the HP-UX mount and umount commands, serious problems could
occur, such as writing to the local file system instead of the cluster file system.
Non-CFS commands could cause conflicts with subsequent CFS command
operations on the file system or the Serviceguard packages, and will not create an
appropriate multi-node package, which means cluster packages will not be aware
of file system changes.
NOTE: The Disk Group (DG) and Mount Point (MP) multi-node packages
(SG-CFS-DG_ID# and SG-CFS-MP_ID#) do not monitor the health of the disk
group and mount point. They check that the application packages that depend on
them have access to the disk groups and mount points. If the dependent application
package loses access and cannot read and write to the disk, it will fail, but that will
not cause the DG or MP multi-node package to fail.
4. You create the CFS package, SG-CFS-pkg, with the cfscluster command. It is
a system multi-node package that regulates the volumes used by CVM 4.1 and
later. System multi-node packages cannot be dependent on any other package.
IMPORTANT: Check the latest version of the release notes (at the address given in the
preface to this manual) for information about Serviceguard support for the volume
monitor.
NOTE: For LVM, using this monitor is an alternative to using Event Monitoring
Service (EMS) resource dependencies; see “Using EMS to Monitor Volume Groups”
(page 132). EMS does not currently provide a monitor for VxVM.
Configure the Volume Monitor as a service in a package that requires access to a VxVM
or LVM storage volume.
The package can be a failover package or multi-node package. For example, you can
configure the monitor as a service in a failover package to monitor a storage volume
(or multiple storage volumes) required by the package application. Alternatively, the
monitor could be used in a multi-node package to monitor identically-named root,
boot, or swap volumes on cluster nodes. Because the root, boot, and swap volumes are
critical to the functioning of the node, the service should be configured with
service_restart (page 301) set to yes.
When a monitored volume fails or becomes inaccessible, the monitor service will exit,
causing the package to fail on the current node. The package’s failure behavior depends
on its configured settings. For prompt recovery, HP recommends setting the value of
service_restart (page 301) for the monitoring service to none.
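For example, a package service entry for the volume monitor might look like the following (the service name, log file, and volume paths are hypothetical):
service_name volmon
service_cmd "/usr/sbin/cmvolmond -O /var/adm/volmon.log -D 1 -t 30 /dev/vg01/lvol1 /dev/vg01/lvol2"
service_restart none
service_fail_fast_enabled no
service_halt_timeout 300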
To ensure that a package requiring a storage volume does not attempt to start on or
fail over to a node where the storage volume is unavailable, the monitor service may
be configured in a separate package, and a package dependency may be used to ensure
that the required package is running, indicating the storage is available. Depending
on the configuration, the monitor package could be a multi-node or failover package,
and would be required to be running by the storage volume-dependent application
package. Alternatively, if you are using EMS, please be aware that EMS resource
NOTE: When using the volume monitor to monitor LVM logical volumes, you need
to make sure that the logical volume timeout value is properly configured. This value
should be configured to be at least one second less than the poll-interval specified in the
monitor service command. I/O requests to logical volumes with no timeout set may
block indefinitely. See “Setting Logical Volume Timeouts” (page 234) for more
information.
Command Syntax
The syntax for the monitoring command, cmvolmond, is as follows:
cmvolmond [-h, --help] [-v, --version]
[-O, --log-file <log_file>]
[-D, --log-level <1-7>]
[-t, --poll-interval <seconds>]
<volume_path> [<volume_path> ...]
A brief description of each parameter follows:
-h or --help
Displays the usage, as listed above, and exits.
NOTE: Do not include the help or version parameters in your service command; this
will result in immediate package failure at runtime.
-v or --version
Displays the monitor version and exits.
-O or --log-file
Specifies a file for logging (log messages are printed to the console by default).
-D or --log-level
Specifies the log level. The level of detail logged is directly proportional to the numerical
value of the log level. That is, a log level of 7 will provide the greatest amount of log
information. The default log level is 0.
-t or --poll-interval
Specifies the interval between volume probes. You can specify a polling interval of as
little as 1 (one second), but bear in mind that a short polling interval (less than 10
seconds) may impair system performance if you are monitoring a large number of
volumes. HP recommends a polling interval of at least 10 seconds if 50 or more volumes
are being monitored by a single service command. The default polling interval is 60
seconds. In the event of a failed read attempt on a storage volume, the monitor service
will terminate after a single poll interval. If a read attempt never completes, the monitor
Examples
/usr/sbin/cmvolmond -O /log/monlog.log -D 3
/dev/vx/dsk/cvm_dg0/lvol2
This command monitors a single VxVM volume, /dev/vx/dsk/cvm_dg0/lvol2,
at log level 3, with a polling interval of 60 seconds, and prints all log messages to
/log/monlog.log.
/usr/sbin/cmvolmond /dev/vg01/lvol1 /dev/vg01/lvol2
This command monitors two LVM logical volumes at the default log level of 0, with a
polling interval of 60 seconds, and prints all log messages to the console.
/usr/sbin/cmvolmond -t 10 /dev/vg00/lvol1
This command monitors the LVM root logical volume at log level 0, with a polling
interval of 10 seconds, and prints all log messages to the console (package log).
Scope of Monitoring
The Volume Monitor detects the following failures:
• Failure of the last link to a storage device or set of devices critical to volume
operation
• Failure of a storage device or set of devices critical to volume operation
• An unexpected detachment, disablement, or deactivation of a volume
The Volume Monitor does not detect the following failures:
• Failure of a redundant link to a storage device or set of devices where a working
link remains
• Failure of a mirror or mirrored plex within a volume (assuming at least one mirror
or plex is functional)
• Corruption of data on a monitored volume.
IMPORTANT: Find out the MBTD value for each affected router and switch from
the vendors' documentation; determine all of the possible paths; find the worst
case sum of the MBTD values on these paths; and use the resulting value to set
the Serviceguard CONFIGURED_IO_TIMEOUT_EXTENSION parameter. For
instructions, see the discussion of this parameter under “Cluster Configuration
Parameters ” (page 143).
Switches and routers that do not support the MBTD value must not be used in a
Serviceguard NFS configuration; using them could lead to delayed packets, which
in turn could lead to data corruption.
• Networking among the Serviceguard nodes must be configured in such a way that
a single failure in the network does not cause a package failure.
• Only NFS client-side locks (local locks) are supported.
Server-side locks are not supported.
• Because exclusive activation is not available for NFS-imported file systems, you
should take the following precautions to ensure that data is not accidentally
overwritten.
— The server should be configured so that only the cluster nodes have access to
the file system.
— The NFS file system used by a package must not be imported by any other
system, including other nodes in the cluster.
— The nodes should not mount the file system on boot; it should be mounted
only as part of the startup for the package that uses it.
— The NFS file system should be used by only one package.
— While the package is running, the file system should be used exclusively by
the package.
— If the package fails, do not attempt to restart it manually until you have verified
that the file system has been unmounted properly.
NOTE: If network connectivity to the NFS Server is lost, the applications using
the imported file system may hang and it may not be possible to kill them. If the
package attempts to halt at this point, it may not halt successfully.
Package fails over to the node with the fewest active packages:
• failover_policy set to min_package_node.
Package fails over to the node that is next on the list of nodes (default):
• failover_policy set to configured_node. (Default)
All packages switch following a system reset (an immediate halt without a graceful
shutdown) on the node when a specific service fails; halt scripts are not run:
• service_fail_fast_enabled set to yes for a specific service.
• auto_run set to yes for all packages.
All packages switch following a system reset on the node when any service fails;
an attempt is first made to reboot the system prior to the system reset:
• service_fail_fast_enabled set to yes for all services.
• auto_run set to yes for all packages.
Failover packages can be also configured so that IP addresses switch from a failed LAN
card to a standby LAN card on the same node and the same physical subnet. To manage
this behavior, use the parameter local_lan_failover_allowed (page 296) in the package
configuration file. (yes, meaning enabled, is the default.)
NOTE: The default form for parameter names and literal values in the modular
package configuration file is lower case; for legacy packages the default is upper case.
There are no compatibility issues; Serviceguard is case-insensitive as far as the
parameters are concerned. This manual uses lower case, unless the parameter in question
is used only in legacy packages, or the context refers exclusively to such a package.
Serviceguard provides a set of parameters for configuring EMS (Event Monitoring
Service) resources. These are resource_name, resource_polling_interval, resource_start, and
resource_up_value. Configure each of these parameters in the package configuration
file for each resource the package will be dependent on.
The resource_start parameter determines when Serviceguard starts up resource
monitoring for EMS resources. resource_start can be set to either automatic or
deferred.
Serviceguard will start up resource monitoring for automatic resources automatically
when the Serviceguard cluster daemon starts up on the node.
Serviceguard will not attempt to start deferred resource monitoring during node
startup, but will start monitoring these resources when the package runs.
The following is an example of how to configure deferred and automatic resources.
resource_name /net/interfaces/lan/status/lan0
resource_polling_interval 60
resource_start deferred
resource_up_value = up
resource_name /net/interfaces/lan/status/lan1
resource_polling_interval 60
resource_start deferred
resource_up_value = up
resource_name /net/interfaces/lan/status/lan0
resource_polling_interval 60
resource_start automatic
resource_up_value = up
Simple Dependencies
A simple dependency occurs when one package requires another to be running on the
same node. You define these conditions by means of the parameters dependency_condition
and dependency_location, using the literal values UP and same_node, respectively. (For
detailed configuration information, see the package parameter definitions starting with
“dependency_name” (page 294). For a discussion of complex dependencies, see “Extended
Dependencies” (page 184).)
Make a package dependent on another package if the first package cannot (or should
not) function without the services provided by the second, on the same node. For
example, pkg1 might run a real-time web interface to a database managed by pkg2
on the same node. In this case it might make sense to make pkg1 dependent on pkg2.
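Continuing that example, a simple dependency would be expressed in pkg1’s package configuration file with entries such as these (the dependency name is arbitrary):
dependency_name pkg2_up_dep
dependency_condition pkg2 = up
dependency_location same_node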
In considering whether or not to create a simple dependency between packages, use
the Rules for Simple Dependencies and Guidelines for Simple Dependencies that follow.
NOTE: pkg1 can depend on more than one other package, and pkg2 can depend on
another package or packages; we are assuming only two packages in order to make
the rules as clear as possible.
• pkg1 will not start on any node unless pkg2 is running on that node.
• pkg1’s package_type (page 289) and failover_policy (page 292) constrain the type and
characteristics of pkg2, as follows:
— If pkg1 is a multi-node package, pkg2 must be a multi-node or system
multi-node package. (Note that system multi-node packages are not supported
for general use.)
— If pkg1 is a failover package and its failover_policy is min_package_node,
pkg2 must be a multi-node or system multi-node package.
— If pkg1 is a failover package and its failover_policy is configured_node, pkg2
must be:
◦ a multi-node or system multi-node package, or
◦ a failover package whose failover_policy is configured_node.
• pkg2 cannot be a failover package whose failover_policy is min_package_node.
• pkg2’s node_name list (page 289) must contain all of the nodes on pkg1’s.
— This means that if pkg1 is configured to run on any node in the cluster (*),
pkg2 must also be configured to run on any node.
NOTE: If pkg1 lists all the nodes, rather than using the asterisk (*), pkg2
must also list them.
— Preferably the nodes should be listed in the same order if the dependency is
between packages whose failover_policy is configured_node; cmcheckconf
and cmapplyconf will warn you if they are not.
• A package cannot depend on itself, directly or indirectly.
That is, not only must pkg1 not specify itself in the dependency_condition (page 294),
but pkg1 must not specify a dependency on pkg2 if pkg2 depends on pkg1, or
if pkg2 depends on pkg3 which depends on pkg1, etc.
• If pkg1 is a failover package and pkg2 is a multi-node or system multi-node
package, and pkg2 fails, pkg1 will halt and fail over to the next node on its
node_name list on which pkg2 is running (and any other dependencies, such as
resource dependencies or a dependency on a third package, are met).
• In the case of failover packages with a configured_node failover_policy, a set of
rules governs under what circumstances pkg1 can force pkg2 to start on a given node.
NOTE: This applies only when the packages are automatically started (package
switching enabled); cmrunpkg will never force a package to halt.
Keep in mind that you do not have to set priority, even when one or more packages
depend on another. The default value, no_priority, may often result in the behavior
you want. For example, if pkg1 depends on pkg2, and priority is set to no_priority
for both packages, and other parameters such as node_name and auto_run are set as
recommended in this section, then pkg1 will normally follow pkg2 to wherever both
can run, and this is the common-sense (and may be the most desirable) outcome.
The following examples express the rules as they apply to two failover packages whose
failover_policy (page 292) is configured_node. Assume pkg1 depends on pkg2, that
node1, node2 and node3 are all specified (not necessarily in that order) under
node_name (page 289) in the configuration file for each package, and that failback_policy
(page 293) is set to automatic for each package.
In a simple dependency, if pkg1 depends on pkg2, and pkg1’s priority is higher than
pkg2’s, pkg1’s node order dominates. Assuming pkg1’s node order is node1, node2,
node3, then:
• On startup:
— pkg1 will select node1 to start on, provided pkg2 can run there.
— pkg2 will start on node1, provided it can run there (no matter where node1
appears on pkg2’s node_name list).
◦ If pkg2 is already running on another node, it will be dragged to node1,
provided it can run there.
— If pkg2 cannot start on node1, then both packages will attempt to start on
node2 (and so on).
Note that the nodes will be tried in the order of pkg1’s node_name list, and pkg2
will be dragged to the first suitable node on that list whether or not it is currently
running on another node.
• On failover:
— If pkg1 fails on node1, pkg1 will select node2 to fail over to (or node3 if it
can run there and node2 is not available or does not meet all of its dependencies;
etc.)
— pkg2 will be dragged to whatever node pkg1 has selected, and restart there;
then pkg1 will restart there.
• On failback:
— If both packages have moved to node2 and node1 becomes available, pkg1
will fail back to node1 if both packages can run there;
◦ otherwise, neither package will fail back.
Extended Dependencies
To the capabilities provided by Simple Dependencies (page 179), extended dependencies
add the following:
• You can specify whether the package depended on must be running or must be
down.
You define this condition by means of the dependency_condition, using one of the
literals UP or DOWN (the literals can be upper or lower case). We'll refer to the
requirement that another package be down as an exclusionary dependency; see
“Rules for Exclusionary Dependencies” (page 185).
• You can specify where the dependency_condition must be satisfied: on the same
node, a different node, all nodes, or any node in the cluster.
You define this by means of the dependency_location parameter (page 295), using
one of the literals same_node, different_node, all_nodes, or any_node.
different_node and any_node are allowed only if dependency_condition is UP.
all_nodes is allowed only if dependency_condition is DOWN.
See “Rules for different_node and any_node Dependencies” (page 186).
For more information about the dependency_ parameters, see the definitions starting
with “dependency_name” (page 294), and the cmmakepkg (1m) manpage.
NOTE: Unexpected behavior may result if you simultaneously halt two packages
that have an exclusionary dependency on each other.
Simple Method
Use this method if you simply want to control the number of packages that can run on
a given node at any given time. This method works best if all the packages consume
about the same amount of computing resources.
If you need to make finer distinctions between packages in terms of their resource
consumption, use the Comprehensive Method (page 190) instead.
To implement the simple method, use the reserved keyword package_limit to define
each node's capacity. In this case, Serviceguard will allow you to define only this single
type of capacity, and corresponding package weight, in this cluster. Defining package
weight is optional; for package_limit it will default to 1 for all packages, unless you
change it in the package configuration file.
Example 1
For example, to configure a node to run a maximum of ten packages at any one time,
make the following entry under the node's NODE_NAME entry in the cluster
configuration file:
NODE_NAME node1
...
CAPACITY_NAME package_limit
CAPACITY_VALUE 10
NOTE: Serviceguard does not require you to define a capacity for each node. If you
define the CAPACITY_NAME and CAPACITY_VALUE parameters for some nodes but
not for others, the nodes for which these parameters are not defined are assumed to
have limitless capacity; in this case, those nodes would be able to run any number of
eligible packages at any given time.
If some packages consume more resources than others, you can use the weight_name
and weight_value parameters to override the default value (1) for some or all packages.
For example, suppose you have three packages, pkg1, pkg2, and pkg3. pkg2 is about
twice as resource-intensive as pkg3 which in turn is about one-and-a-half times as
resource-intensive as pkg1. You could represent this in the package configuration files
as follows:
• For pkg1:
weight_name package_limit
weight_value 2
• For pkg2:
weight_name package_limit
weight_value 6
• For pkg3:
weight_name package_limit
weight_value 3
Now node1, which has a CAPACITY_VALUE of 10 for the reserved CAPACITY_NAME
package_limit, can run any two of the packages at one time, but not all three. If in
addition you wanted to ensure that the larger packages, pkg2 and pkg3, did not run
on node1 at the same time, you could raise the weight_value of one or both so that the
combination exceeded 10 (or reduce node1's capacity to 8).
Comprehensive Method
Use this method if the Simple Method (page 188) does not meet your needs. (Make sure
you have read that section before you proceed.) The comprehensive method works
best if packages consume differing amounts of computing resources, so that simple
one-to-one comparisons between packages are not useful.
IMPORTANT: You cannot combine the two methods. If you use the reserved capacity
package_limit for any node, Serviceguard will not allow you to define any other
type of capacity and weight in this cluster; so you are restricted to the Simple Method
in that case.
Defining Capacities
Begin by deciding what capacities you want to define; you can define up to four different
capacities for the cluster.
You may want to choose names that have common-sense meanings, such as “processor”,
“memory”, or “IO”, to identify the capacities, but you do not have to do so. In fact it
Example 2
To define these capacities, and set limits for individual nodes, make entries such as the
following in the cluster configuration file:
CLUSTER_NAME cluster_23
...
NODE_NAME node1
...
NOTE: You do not have to define capacities for every node in the cluster. If any
capacity is not defined for any node, Serviceguard assumes that node has an infinite
amount of that capacity. In our example, not defining capacity A for a given node would
automatically mean that node could run pkg1 and pkg2 at the same time no matter
what A weights you assign those packages; not defining capacity B would mean the
node could run pkg3 and pkg4 at the same time; and not defining either one would
mean the node could run all four packages simultaneously.
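For reference, capacity definitions under a node’s entry take the following form; the capacity names A and B match the discussion above, and the values shown are illustrative:
NODE_NAME node1
CAPACITY_NAME A
CAPACITY_VALUE 80
CAPACITY_NAME B
CAPACITY_VALUE 50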
When you have defined the nodes' capacities, the next step is to configure the package
weights; see “Defining Weights”.
Defining Weights
Package weights correspond to node capacities, and for any capacity/weight pair,
CAPACITY_NAME and weight_name must be identical.
You define weights for individual packages in the package configuration file, but you
can also define a cluster-wide default value for a given weight, and, if you do, this
default will specify the weight of all packages that do not explicitly override it in their
package configuration file.
NOTE: There is one exception: system multi-node packages cannot have weight, so
a cluster-wide default weight does not apply to them.
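A cluster-wide default weight is defined in the cluster configuration file with entries such as the following (the value is illustrative; the weight name must match a CAPACITY_NAME defined for at least one node):
WEIGHT_NAME A
WEIGHT_DEFAULT 20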
NOTE: Option 4 means that the package is “weightless” as far as this particular
capacity is concerned, and can run even on a node on which this capacity is completely
consumed by other packages.
(You can make a package “weightless” for a given capacity even if you have defined
a cluster-wide default weight; simply set the corresponding weight to zero in the
package configuration file.)
Pursuing the example started under “Defining Capacities” (page 190), we can now use
options 1 and 2 to set weights for pkg1 through pkg4.
Example 4
In pkg1's package configuration file:
weight_name A
weight_value 60
In pkg2's package configuration file:
IMPORTANT: weight_name in the package configuration file must exactly match the
corresponding CAPACITY_NAME in the cluster configuration file. This applies to case
as well as spelling: weight_name a would not match CAPACITY_NAME A.
You cannot define a weight unless the corresponding capacity is defined: cmapplyconf
will fail if you define a weight in the package configuration file and no node in the
package's node_name list (page 289) has specified a corresponding capacity in the cluster
configuration file; or if you define a default weight in the cluster configuration file and
no node in the cluster specifies a capacity of the same name.
• Node capacity is defined in the cluster configuration file, via the CAPACITY_NAME
and CAPACITY_VALUE parameters.
• Capacities can be added, changed, and deleted while the cluster is running. This
can cause some packages to be moved, or even halted and not restarted.
• Package weight can be defined in cluster configuration file, via the WEIGHT_NAME
and WEIGHT_DEFAULT parameters, or in the package configuration file, via the
weight_name and weight_value parameters, or both.
• Weights can be assigned (and WEIGHT_DEFAULTs apply) only to multi-node
packages and to failover packages whose failover_policy (page 292) is
configured_node and whose failback_policy (page 293) is manual.
• If you define weight (weight_name and weight_value) for a package, make sure you
define the corresponding capacity (CAPACITY_NAME and CAPACITY_VALUE)
in the cluster configuration file for at least one node on the package's node_name
list (page 289). Otherwise cmapplyconf will fail when you try to apply the package.
• Weights (both cluster-wide WEIGHT_DEFAULTs, and weights defined in the
package configuration files) can be changed while the cluster is up and the packages
are running. This can cause some packages to be moved, or even halted and not
restarted.
Example 1
• pkg1 is configured to run on nodes turkey and griffon. It has a weight of 1
and a priority of 10. It is down and has switching disabled.
• pkg2 is configured to run on nodes turkey and griffon. It has a weight of 1
and a priority of 20. It is running on node turkey and has switching enabled.
• turkey and griffon can run one package each (package_limit is set to 1).
If you enable switching for pkg1, Serviceguard will halt the lower-priority pkg2 on
turkey. It will then start pkg1 on turkey and restart pkg2 on griffon.
If neither pkg1 nor pkg2 had priority, pkg2 would continue running on turkey and
pkg1 would run on griffon.
Example 2
• pkg1 is configured to run on nodes turkey and griffon. It has a weight of 1
and a priority of 10. It is running on node turkey and has switching enabled.
• pkg2 is configured to run on nodes turkey and griffon. It has a weight of 1
and a priority of 20. It is running on node turkey and has switching enabled.
• pkg3 is configured to run on nodes turkey and griffon. It has a weight of 1
and a priority of 30. It is down and has switching disabled.
• pkg3 has a same_node dependency on pkg2
• turkey and griffon can run two packages each (package_limit is set to 2).
If you enable switching for pkg3, it will stay down because pkg2, the package it depends
on, is running on node turkey, which is already running two packages (its capacity
limit). pkg3 has a lower priority than pkg2, so it cannot drag it to griffon where
they both can run.
NOTE: In the case of the validate entry point, exit values 1 and 2 are treated the
same; you can use either to indicate that validation failed.
The script can make use of a standard set of environment variables (including the
package name, SG_PACKAGE, and the name of the local node, SG_NODE) exported by
the package manager or the master control script that runs the package; and can also
call a function to source in a logging function and other utility functions. One of these
functions, sg_source_pkg_env(), provides access to all the parameters configured
for this package, including package-specific environment variables configured via the
pev_ parameter (page 309).
function validate_command
{
typeset -i ret=0
typeset -i i=0
typeset -i found=0
# check PEV_ attribute is configured and within limits
if [[ -z $PEV_MONITORING_INTERVAL ]]
then
sg_log 0 "ERROR: PEV_MONITORING_INTERVAL attribute not configured!"
ret=1
elif (( PEV_MONITORING_INTERVAL < 1 ))
then
sg_log 0 "ERROR: PEV_MONITORING_INTERVAL value ($PEV_MONITORING_INTERVAL) not within legal
limits!"
ret=1
fi
# check monitoring service we are expecting for this package is configured
while (( i < ${#SG_SERVICE_NAME[*]} ))
last_halt_failed
cmviewcl -v -f line displays a last_halt_failed flag.
NOTE: This section provides an example for a modular package; for legacy packages,
see “Configuring Cross-Subnet Failover” (page 383).
Suppose that you want to configure a package, pkg1, so that it can fail over among all
the nodes in a cluster comprising NodeA, NodeB, NodeC, and NodeD.
NodeA and NodeB use subnet 15.244.65.0, which is not used by NodeC and NodeD;
and NodeC and NodeD use subnet 15.244.56.0, which is not used by NodeA and
NodeB. (See “Obtaining Cross-Subnet Information” (page 248) for sample cmquerycl
output).
Configuring monitored_subnet_access
In order to monitor subnet 15.244.65.0 or 15.244.56.0, depending on where
pkg1 is running, you would configure monitored_subnet and monitored_subnet_access
in pkg1’s package configuration file as follows:
monitored_subnet 15.244.65.0
monitored_subnet_access PARTIAL
monitored_subnet 15.244.56.0
monitored_subnet_access PARTIAL
Configuring ip_subnet_node
Now you need to specify which subnet is configured on which nodes. In our example,
you would do this by means of entries such as the following in the package configuration
file:
ip_subnet 15.244.65.0
ip_subnet_node nodeA
ip_subnet_node nodeB
ip_address 15.244.65.82
ip_address 15.244.65.83
ip_subnet 15.244.56.0
ip_subnet_node nodeC
ip_subnet_node nodeD
NOTE: If these variables are not defined on your system, then source the file /etc/
cmcluster.conf in your login profile for user root. For example, you can add this
line to root’s .profile file:
. /etc/cmcluster.conf
Throughout this book, system filenames are usually given with one of these location
prefixes. Thus, references to $SGCONF/filename can be resolved by supplying the
definition of the prefix that is found in this file. For example, if SGCONF is defined as
/etc/cmcluster/, then the complete pathname for file $SGCONF/cmclconfig is
/etc/cmcluster/cmclconfig.
2. If the cluster does not yet exist, set up root access among the prospective nodes:
a. If you have not already done so, set up ssh public/private key pairs on each
node. This will allow the necessary commands to operate on all the prospective
nodes before a cluster is formed.
The simplest way to do this is via the DSAU csshsetup command; for
example, if you are setting up a two-node cluster with nodes node1 and
node2, and you are logged in on node1:
csshsetup -r node2
For a large number of nodes, you might want to enter the node names into a
file and use the -f option to get csshsetup to read the names from the file;
for example, if you have stored the names in the file /etc/cmcluster/
sshhosts:
csshsetup -r -f /etc/cmcluster/sshhosts
For more information about setting up ssh keys, see the HP-UX Secure Shell
Getting Started Guide at www.hp.com/go/hpux-core-docs.
b. Configure root access to each prospective node, using the hostname portion
(only) of the fully-qualified domain name:
cmpreparecl -n <node_name> -n <node_name> ...
For example, for a cluster that will consist of four nodes, node1, node2, node3,
and node4:
cmpreparecl -n node1 -n node2 -n node3 -n node4
NOTE: Serviceguard must be installed on all of the nodes listed, and you
must be logged in as superuser on one of these nodes to run the command.
NOTE: cDSFs apply only to shared storage; they will not be generated for local
storage, such as root, boot, and swap devices.
• If the cluster does not exist yet, specify the name of each prospective node,
for example:
cmsetdsfgroup -n node1 -n node2 -n node3 -n node4
• If the cluster does exist, you can simply run:
cmsetdsfgroup -c
NOTE: You must be logged in as superuser on one of the cluster nodes. You
do not need to provide the cluster name.
The cDSFs created by cmsetdsfgroup reside in /dev/cdisk for block device files
and /dev/rcdisk for character device files. You should use these new device files
exclusively when you configure the cluster lock (if any) and package storage; see
“Specifying a Lock Disk” (page 246), “Specifying a Lock LUN” (page 247), and “Creating
the Storage Infrastructure and Filesystems with LVM, VxVM and CVM” (page 230).
• To report information (in line output format only) about DSFs, cDSFs, and volume
groups for one or more nodes, use cmquerystg (1m).
• You can also use the HP-UX command io_cdsf_config (1m) to display
information about cDSFs.
See the manpages for more information.
IMPORTANT: If you change the cluster lock volume or volumes to cDSFs, you need
to change the cluster lock information in the cluster configuration file and re-apply the
configuration; follow one of the procedures under “Updating the Cluster Lock
Configuration” (page 364).
IMPORTANT: Before you start, you should have done the planning and preparation
outlined in Chapter 4 (page 121). You must also do the following.
• Install Serviceguard on each node that is to be configured into the cluster; see
“Installing and Updating Serviceguard ” (page 205).
You must have superuser capability on each node.
• Make sure all the nodes have access to at least one fully configured network.
NOTE: You cannot use the Easy Deployment commands to create a cross-subnet
configuration, as described under “Cross-Subnet Configurations” (page 41).
• If you have not already done so, set up ssh public/private key pairs on each node.
This will allow the Easy Deployment commands to operate on all the prospective
nodes before the cluster is formed. If you need to set up the keys, you can use DSAU
to simplify the task; for an example, see “Creating cDSFs for a Group of Nodes”
(page 207).
• If you will be using a lock LUN (as opposed to an LVM lock disk or a quorum
server) set up the LUN partition before running cmdeploycl; see “Setting Up a
Lock LUN” (page 226).
• If you will be using individual disks (as opposed to RAID) for package data, and
you will be mirroring them with MirrorDisk/UX as HP recommends, make sure
the hardware is configured so as to allow PVG-strict mirroring.
For more information, see “Using Mirrored Individual Data Disks” (page 232) —
and it is good idea to read the entire section on “Creating a Storage Infrastructure
with LVM” (page 231) so that you understand what cmpreparestg is doing.
• Make sure you have read and understood the additional limitations spelled out
under “About Easy Deployment” (page 137); and it is a good idea to read the Easy
Deployment manpages (cmpreparecl (1m), cmdeploycl (1m), and
cmpreparestg (1m)) as well.
NOTE: For information about heartbeat and networking requirements, see the
sections listed under “Before You Start” (page 210).
If you omit steps 1 and 2, and all the prospective nodes are connected to at least
one subnet, cmdeploycl behaves as follows (the configuration is actually done
by cmquerycl (1m) which is called by cmdeploycl).
• If multiple subnets are configured among the nodes, cmdeploycl chooses a
subnet with standby interfaces as the heartbeat.
• If multiple subnets are configured, but no subnet has standby interfaces,
cmdeploycl chooses two subnets for the heartbeat.
• If only one subnet is configured, cmdeploycl configures that subnet as the
heartbeat.
CAUTION: If the subnet has no standby interfaces, the cluster will run, but
will not meet high-availability requirements for the heartbeat. You must
reconfigure the heartbeat as soon as possible; see “Changing the Cluster
Networking Configuration while the Cluster Is Running” (page 367).
3. If you have not already done so, create cluster-wide device special files (cDSFs).
This step is optional, but HP strongly recommends it. For instructions, see “Creating
Cluster-wide Device Special Files (cDSFs)” (page 206).
4. Create and start the cluster, configuring security and networking files, creating
and deploying shared storage for an LVM lock disk, and configuring a sample
package:
cmdeploycl -c <clustername> -n <node1> -n <node2> -N
<network_template_file> -b -L <vg>:<pv>
<clustername> must be the unique name for this cluster. <node1> and <node2>
must be the hostname portion, and only the hostname portion, of each node's
NOTE: The cluster does not yet have shared storage for packages.
Other forms of cmdeploycl allow you to create the cluster with a quorum server
or lock LUN instead of a lock disk; see the manpage for more information.
If you use a quorum server, the quorum server software must already be installed
and configured on the quorum server; see the latest version of the HP Serviceguard
Quorum Server Version A.04.00 Release Notes, at www.hp.com/go/
hpux-serviceguard-docs -> HP Serviceguard Quorum Server
Software. cmdeploycl requires an IPv4 address (or addresses) for the quorum
server; to configure IPv6 addresses, see “Specifying a Quorum Server” (page 248).
If shared storage already exists on both nodes, and you use cmdeploycl without
any -L option, cmdeploycl will create an LVM lock disk using the shared storage.
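For example, substituting hypothetical values into the form shown above (the cluster, node, network template file, volume group, and device names are all illustrative):
cmdeploycl -c cluster1 -n node1 -n node2 -N /etc/cmcluster/net_template.cfg -b -L vglock:/dev/cdisk/disk14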
PVG bus1
/dev/cdisk/disk14
/dev/cdisk/disk15
NOTE: If you are not using cDSFs, you may need to change the DSF names
to make sure that the DSF names point to the same physical storage on each
node. If you are using cDSFs, the names are guaranteed to be the same.
NOTE: For more information and advice, see the white paper Securing Serviceguard
at www.hp.com/go/hpux-serviceguard-docs.
NOTE: When you upgrade a cluster from Version A.11.15 or earlier, entries in
$SGCONF/cmclnodelist are automatically updated to Access Control Policies in the
cluster configuration file. All non-root user-hostname pairs are assigned the role of
Monitor.
NOTE: You need to do this even if you plan to use cmpreparecl (1m)
or cmdeploycl (1m), which calls cmpreparecl. For more information about these
commands, see “Using Easy Deployment Commands to Configure the Cluster”
(page 211).
About identd
HP strongly recommends that you use identd for user verification, so you should
make sure that each prospective cluster node is configured to run it. identd is usually
started by inetd from /etc/inetd.conf.
NOTE: If you plan to use cmpreparecl (1m) (or cmdeploycl (1m), which calls
cmpreparecl), you can skip the rest of this subsection.
Make sure that a line such as the following is uncommented in /etc/inetd.conf:
auth stream tcp6 wait bin /usr/lbin/identd identd
NOTE: If the -T option to identd is available on your system, you should set it to
120 (-T120); this ensures that a connection inadvertently left open will be closed after
two minutes. In this case, the identd entry in /etc/inetd.conf should look like
this:
auth stream tcp6 wait bin /usr/lbin/identd identd -T120
Check the man page for identd to determine whether the -T option is supported for
your version of identd.
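After editing /etc/inetd.conf, you can make inetd reread its configuration; on HP-UX this is commonly done with the command below (verify against your inetd (1M) manpage):
/usr/sbin/inetd -c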
(It is possible to disable identd, though HP recommends against doing so. If for some
reason you have to disable identd, see “Disabling identd” (page 276).)
For more information about identd, see the white paper Securing Serviceguard at
www.hp.com/go/hpux-serviceguard-docs, and the identd (1M) manpage.
Serviceguard nodes can communicate over any of the cluster’s shared networks, so the
network resolution service you are using (such as DNS, NIS, or LDAP) must be able
to resolve each of their primary addresses on each of those networks to the primary
hostname of the node in question.
In addition, HP recommends that you define name resolution in each node’s
/etc/hosts file, rather than rely solely on a service such as DNS. Configure the name
service switch to consult the /etc/hosts file before other services. See “Safeguarding
against Loss of Name Resolution Services” (page 220) for instructions.
NOTE: If you are using private IP addresses for communication within the cluster,
and these addresses are not known to DNS (or the name resolution service you use)
these addresses must be listed in /etc/hosts.
For requirements and restrictions that apply to IPv6–only clusters and mixed-mode
clusters, see “Rules and Restrictions for IPv6-Only Mode” (page 140) and “Rules and
Restrictions for Mixed Mode” (page 142), respectively, and the latest version of the
Serviceguard release notes.
Keep the following rules in mind when creating entries in a Serviceguard node's
/etc/hosts:
• NODE_NAME in the cluster configuration file must be identical to the hostname,
which is normally the first element of a fully qualified domain name (a name
with four elements separated by periods). This hostname is what is normally
returned by the HP-UX hostname (1) command.
For example, if the node's fully-qualified domain name is gryf.uksr.hp.com,
the NODE_NAME must be gryf. For more information, see the NODE_NAME
entry under “Cluster Configuration Parameters ” (page 143).
• Each stationary and heartbeat IP address must map to an entry in which the
NODE_NAME is either:
— the official hostname, as defined by hosts (4); or:
— the first (or only) alias. Examples:
15.145.162.131 gryf.uksr.hp.com gryf
10.8.0.131 gryf.uksr.hp.com gryf
10.8.1.131 gryf.uksr.hp.com gryf
Note that the alias is required whenever NODE_NAME is different from the official
hostname — even if the address maps to an entry in which the NODE_NAME is
the first element of the fully-qualified domain name (as in gryf.uksr.hp.com
and sly.uksr.hp.com).
If applications require the use of hostname aliases, the Serviceguard hostname must
be the first alias in all the entries for that host. For example, if the two-node cluster in
the previous example were configured to use the aliases alias-node1 and
alias-node2, then the entries in /etc/hosts should look something like this:
15.145.162.131 gryf.uksr.hp.com gryf alias-node1
10.8.0.131 gryf2.uksr.hp.com gryf alias-node1
10.8.1.131 gryf3.uksr.hp.com gryf alias-node1
NOTE: If such a hang or error occurs, Serviceguard and all protected applications
will continue working even though the command you issued does not. That is, only
the Serviceguard configuration commands (and corresponding Serviceguard Manager
functions) are affected, not the cluster daemon or package services.
The procedure that follows shows how to create a robust name-resolution configuration
that will allow cluster nodes to continue communicating with one another if a name
service fails. If a standby LAN is configured, this approach also allows the cluster to
continue to function fully (including commands such as cmrunnode and cmruncl)
after the primary LAN has failed.
NOTE: If a NIC fails, the affected node will be able to fail over to a standby LAN so
long as the node is running in the cluster. But if a NIC that is used by Serviceguard
fails when the affected node is not running in the cluster, Serviceguard will not be able
to restart the node. (For instructions on replacing a failed NIC, see “Replacing LAN or
Fibre Channel Cards” (page 407).)
NOTE: If you plan to use cmpreparecl (1m) (or cmdeploycl (1m), which calls
cmpreparecl), the /etc/hosts and /etc/nsswitch.conf configuration described
in the procedure that follows will be done automatically, but you should still read the
entire subsection and make sure you understand the issues.
1. Edit the /etc/hosts file on all nodes in the cluster. Add name resolution for all
heartbeat IP addresses, and other IP addresses from all the cluster nodes; see
“Configuring Name Resolution” (page 218) for discussion and examples.
NOTE: For each cluster node, the public-network IP address must be the first
address listed. This enables other applications to talk to other nodes on public
networks.
2. If you are using DNS, make sure your name servers are configured in /etc/
resolv.conf, for example:
domain cup.hp.com
search cup.hp.com hp.com
nameserver 15.243.128.51
nameserver 15.243.160.51
3. Edit or create the /etc/nsswitch.conf file on all nodes so that /etc/hosts is
consulted before other name services. If a line beginning with the string hosts:
or ipnodes: already exists, make sure that the text immediately to the right of
this string is (on one line):
files [NOTFOUND=continue UNAVAIL=continue] dns [NOTFOUND=return UNAVAIL=return]
or
files [NOTFOUND=continue UNAVAIL=continue] nis [NOTFOUND=return UNAVAIL=return]
This step is critical, allowing the cluster nodes to resolve hostnames to IP addresses
while DNS, NIS, or the primary LAN is down.
4. Create a $SGCONF/cmclnodelist file on all nodes that you intend to configure
into the cluster, and allow access by all cluster nodes. See “Allowing Root Access
to an Unconfigured Node” (page 216).
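For illustration, a minimal cmclnodelist simply lists each prospective node's
hostname followed by the root user, one entry per line (hypothetical hostnames shown):
ftsys9 root
ftsys10 root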
NOTE: HP recommends that you also make the name service itself highly available,
either by using multiple name servers or by configuring the name service into a
Serviceguard package.
CAUTION: HP supports enabling this parameter only if you are not using a
cross-subnet configuration (page 41). Otherwise, leave the parameter at its default
setting (zero, meaning disabled).
Enabling this parameter allows you to configure a default gateway for each physical
IPv4 interface. These gateways allow a system to send an unbound packet through
the interface for the address to which the socket (or communication endpoint) is
bound. If the socket (or communication endpoint) is not bound to a specific address,
the system sends the packet through the interface on which the unbound packet
was received.
This means that the packet source addresses (and therefore the interfaces on a
multihomed host) affect the selection of a gateway for outbound packets once
ip_strong_es_model is enabled. For more information see “Using a Relocatable
Address as the Source Address for an Application that is Bound to INADDR_ANY”
(page 435).
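For illustration, a hedged sketch of enabling the parameter with ndd (to make the
setting persist across reboots, you would also add a corresponding entry to
/etc/rc.config.d/nddconf):
ndd -set /dev/ip ip_strong_es_model 1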
IMPORTANT: This must be done, as described below, whether or not you intend to
use cmpreparestg (1m) to configure storage. See “Using Easy Deployment
Commands to Configure the Cluster” (page 211) for more information about
cmpreparestg.
NOTE: Under agile addressing, the physical devices in these examples would have
names such as /dev/[r]disk/disk1, and /dev/[r]disk/disk2. See “About
Device File Names (Device Special Files)” (page 106).
1. Create a bootable LVM disk to be used for the mirror.
pvcreate -B /dev/rdsk/c4t6d0
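The new disk must also be added to the root volume group and set up as a boot disk
before the logical volumes are mirrored; those steps are not reproduced here, but a
minimal sketch, assuming the same device and omitting the architecture-specific
boot-area details, might look like this:
vgextend /dev/vg00 /dev/dsk/c4t6d0
mkboot /dev/rdsk/c4t6d0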
NOTE: The boot, root, and swap logical volumes must be done in exactly the
following order to ensure that the boot volume occupies the first contiguous set
of extents on the new disk, followed by the swap and the root.
The following is an example of mirroring the boot logical volume:
lvextend -m 1 /dev/vg00/lvol1 /dev/dsk/c4t6d0
The following is an example of mirroring the primary swap logical volume:
lvextend -m 1 /dev/vg00/lvol2 /dev/dsk/c4t6d0
The following is an example of mirroring the root logical volume:
lvextend -m 1 /dev/vg00/lvol3 /dev/dsk/c4t6d0
5. Update the boot information contained in the BDRA for the mirror copies of boot,
root and primary swap.
/usr/sbin/lvlnboot -b /dev/vg00/lvol1
/usr/sbin/lvlnboot -s /dev/vg00/lvol2
/usr/sbin/lvlnboot -r /dev/vg00/lvol3
6. Verify that the mirrors were properly created.
lvlnboot -v
The output of this command is shown in a display like the following:
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
/dev/dsk/c4t5d0 (10/0.5.0) -- Boot Disk
/dev/dsk/c4t6d0 (10/0.6.0) -- Boot Disk
Boot: lvol1 on: /dev/dsk/c4t5d0
/dev/dsk/c4t6d0
Root: lvol3 on: /dev/dsk/c4t5d0
/dev/dsk/c4t6d0
Swap: lvol2 on: /dev/dsk/c4t5d0
/dev/dsk/c4t6d0
Dump: lvol2 on: /dev/dsk/c4t6d0, 0
NOTE: You must use the vgcfgbackup and vgcfgrestore commands to back up
and restore the lock volume group configuration data regardless of how you create the
lock volume group.
CAUTION: Before you start, make sure the disk or LUN that is to be partitioned has
no data on it that you need. idisk will destroy any existing data.
1. Use a text editor to create a file that contains the partition information. You need
to create at least three partitions, for example:
3
EFI 100MB
HPUX 1MB
HPUX 100%
This defines:
• A 100 MB EFI (Extensible Firmware Interface) partition (this is required)
• A 1 MB partition that can be used for the lock LUN
• A third partition that consumes the remainder of the disk and can be used
for whatever purpose you like.
2. Save the file; for example you might call it partition.txt.
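The partition table is then written to the disk and the partition device files are created.
A hedged sketch, assuming the file name above and a hypothetical disk (check the
idisk (1m) and insf (1m) manpages for the exact options on your system):
idisk -w -f partition.txt /dev/rdsk/c1t4d0
insf -e -C disk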
NOTE: Device files for partitions cannot be cluster-wide DSFs (cDSFs). For more
information about cDSFs, see “About Cluster-wide Device Special Files (cDSFs)”
(page 135).
This will create three device files, for example
/dev/dsk/c1t4d0s1, /dev/dsk/c1t4d0s2, and /dev/dsk/c1t4d0s3
or:
/dev/disk/disk12_p1, /dev/disk/disk12_p2, and /dev/disk/disk12_p3
CAUTION: Once you have specified the lock LUN in the cluster configuration file,
running cmapplyconf (1m) or cmdeploycl (1m) will destroy any data on the
LUN.
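For illustration, the lock LUN is specified per node in the cluster configuration file; a
hedged example using the second partition created above:
NODE_NAME ftsys9
CLUSTER_LOCK_LUN /dev/dsk/c1t4d0s2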
Creating the Storage Infrastructure and Filesystems with LVM, VxVM and CVM
In addition to configuring the cluster, you create the appropriate logical volume
infrastructure to provide access to data from different nodes. This can be done in several ways:
• for Logical Volume Manager, see “Creating a Storage Infrastructure with LVM”
(page 231).
Do this before you configure the cluster if you use a lock disk; otherwise it can be
done before or after.
• for Veritas Volume Manager, see “Creating a Storage Infrastructure with VxVM”
(page 239)
Do this before you configure the cluster if you use a lock disk; otherwise it can be
done before or after.
• for Veritas Cluster File System with CVM, see “Creating a Storage Infrastructure
with Veritas Cluster File System (CFS)” (page 261)
Do this after you configure the cluster.
• for Veritas Cluster Volume Manager, see “Creating the Storage Infrastructure with
Veritas Cluster Volume Manager (CVM)” (page 268)
Do this after you configure the cluster.
You can also use a mixture of volume types, depending on your needs.
NOTE: The procedures that follow describe the command-line method of configuring
LVM storage. There are two other, more automated methods you can use.
• System Management Homepage
You can use the System Management Homepage to create or extend volume groups
and create logical volumes. From the System Management Homepage, choose
Disks and File Systems. Make sure you create mirrored logical volumes
with PVG-strict allocation; see “Using Mirrored Individual Data Disks” (page 232).
When you have created the logical volumes and created or extended the volume
groups, specify the filesystem that is to be mounted on the volume group, then
proceed with “Distributing Volume Groups to Other Nodes” (page 235).
• cmpreparestg
You can use cmpreparestg (1m) to accomplish the tasks described under
“Creating Volume Groups” (page 232). See “Using Easy Deployment Commands
to Configure the Cluster” (page 211) for more information. If you use
cmpreparestg, you do not need to perform the procedures that follow, but it is
a good idea to read them so that you understand what cmpreparestg does for
you. Then proceed to “Making Physical Volume Group Files Consistent” (page 238).
If you have already done LVM configuration, skip ahead to “Configuring the Cluster
” (page 242).
NOTE: You can create volume groups by means of the cmpreparestg (1m)
command. See “Using Easy Deployment Commands to Configure the Cluster” (page 211)
for more information. If you use cmpreparestg, you can skip this step and proceed
to “Making Physical Volume Group Files Consistent” (page 238).
Obtain a list of the disks on both nodes and identify which device files are used for the
same disk on both. Use the following command on each node to list available disks as
they are known to each system:
lssf /dev/d*/*
In the following examples, we use /dev/rdsk/c1t2d0 and /dev/rdsk/c0t2d0,
which happen to be the device names for the same disks on both ftsys9 and ftsys10.
In the event that the device file names are different on the different nodes, make a
careful note of the correspondences.
On the configuration node (ftsys9), use the pvcreate(1m) command to define disks
as physical volumes. This only needs to be done on the configuration node. Use the
following commands to create two physical volumes for the sample configuration:
pvcreate -f /dev/rdsk/c1t2d0
pvcreate -f /dev/rdsk/c0t2d0
Use the following procedure to build a volume group on the configuration node
(ftsys9). Later, you will create the same volume group on other nodes; see
“Distributing Volume Groups to Other Nodes” (page 235).
NOTE: If you are using the March 2008 version or later of HP-UX 11i v3, you can skip
steps 1 and 2; vgcreate (1m) will create the device file for you.
1. Create the group directory; for example, vgdatabase:
mkdir /dev/vgdatabase
2. Create a control file named group in the directory /dev/vgdatabase, as follows:
mknod /dev/vgdatabase/group c 64 0xhh0000
The major number is always 64, and the hexadecimal minor number has the form
0xhh0000
where hh must be unique to the volume group you are creating. Use a unique
minor number that is available across all the nodes for the mknod command above.
(This will avoid further reconfiguration later, when NFS-mounted logical volumes
are created in the volume group.)
Use the following command to display a list of existing volume groups:
ls -l /dev/*/group
3. Create the volume group and add physical volumes to it with the following
commands:
NOTE: If you are using cDSFs, you should be using them exclusively.
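For example, a sketch using the device files defined with pvcreate above (adjust the
names for your system):
vgcreate -g bus0 /dev/vgdatabase /dev/dsk/c1t2d0
vgextend -g bus1 /dev/vgdatabase /dev/dsk/c0t2d0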
The first command creates the volume group and adds a physical volume to it in
a physical volume group called bus0. The second command adds the second drive
to the volume group, locating it in a different physical volume group named bus1.
The use of physical volume groups allows the use of PVG-strict mirroring of disks.
4. Repeat steps 1–3 for additional volume groups.
NOTE: You can create logical volumes by means of the cmpreparestg (1m)
command. See “Using Easy Deployment Commands to Configure the Cluster” (page 211)
for more information. If you use cmpreparestg, you can skip this step and proceed
to “Making Physical Volume Group Files Consistent” (page 238).
Use a command such as the following to create a logical volume (the example is for
/dev/vgdatabase).
lvcreate -L 120 -m 1 -s g /dev/vgdatabase
This command creates a 120 MB mirrored volume named lvol1. The name is supplied
by default, since no name is specified in the command. The -s g option means that
mirroring is PVG-strict; that is, the mirror copy of any given piece of data will be in a
different physical volume group from the original.
NOTE: If you are using disk arrays in RAID 1 or RAID 5 mode, omit the -m 1 and
-s g options.
NOTE: You can create filesystems by means of the cmpreparestg (1m) command.
See “Using Easy Deployment Commands to Configure the Cluster” (page 211) for more
information. If you use cmpreparestg, you can skip the procedure that follows, and
proceed to “Making Physical Volume Group Files Consistent” (page 238).
Use the following commands to create a filesystem for mounting on the logical volume
just created.
1. Create the filesystem on the newly created logical volume:
newfs -F vxfs /dev/vgdatabase/rlvol1
Note the use of the raw device file for the logical volume.
2. Create a directory to mount the disk:
mkdir /mnt1
3. Mount the disk to verify your work:
mount /dev/vgdatabase/lvol1 /mnt1
Note the mount command uses the block device file for the logical volume.
4. Verify the configuration:
vgdisplay -v /dev/vgdatabase
NOTE: If you plan to use cmpreparestg, you can skip this step and proceed to
“Making Physical Volume Group Files Consistent” (page 238).
At the time you create the volume group, it is active on the configuration node (ftsys9,
for example). The next step is to unmount the file system and deactivate the volume
group; for example, on ftsys9:
umount /mnt1
vgchange -a n /dev/vgdatabase
NOTE: Do this during this setup process only, so that activation and mounting can
be done by the package control script at run time. You do not need to deactivate and
unmount a volume simply in order to create a map file (as in step 1 of the procedure
that follows).
NOTE: If you use cmpreparestg, you can skip the procedure that follows and
proceed to “Making Physical Volume Group Files Consistent” (page 238).
Use the following commands to set up the same volume group on another cluster node.
In this example, the commands set up a new volume group on ftsys10 which will
hold the same physical volume that was available on ftsys9. You must carry out the
same procedure separately for each node on which the volume group's package can
run.
To set up the volume group on ftsys10, use the following steps:
1. On ftsys9, copy the mapping of the volume group to a specified file.
vgexport -p -s -m /tmp/vgdatabase.map /dev/vgdatabase
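The intervening distribution steps are not reproduced here; in outline, you copy the
map file to ftsys10 and import the volume group there. A hedged sketch, assuming
the same names:
rcp /tmp/vgdatabase.map ftsys10:/tmp/vgdatabase.map
Then, on ftsys10:
mkdir /dev/vgdatabase
mknod /dev/vgdatabase/group c 64 0xhh0000
vgimport -s -m /tmp/vgdatabase.map /dev/vgdatabase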
NOTE: When you use PVG-strict mirroring, the physical volume group
configuration is recorded in the /etc/lvmpvg file on the configuration node. This
file defines the physical volume groups which are the basis of mirroring and
indicate which physical volumes belong to each physical volume group. Note that
on each cluster node, the /etc/lvmpvg file must contain the correct physical
volume names for the physical volume groups’s disks as they are known on that
node. Physical volume names for the same disks could be different on different
nodes. After distributing volume groups to other nodes, make sure each node’s
/etc/lvmpvg file correctly reflects the contents of all physical volume groups on
that node. See “Making Physical Volume Group Files Consistent” (page 238).
7. Make sure that you have deactivated the volume group on ftsys9. Then enable
the volume group on ftsys10:
vgchange -a y /dev/vgdatabase
8. Create a directory to mount the disk:
mkdir /mnt1
9. Mount and verify the volume group on ftsys10:
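For example, assuming the same logical volume and mount point used on ftsys9:
mount /dev/vgdatabase/lvol1 /mnt1
vgdisplay -v /dev/vgdatabase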
NOTE: The specific commands for creating mirrored and multi-path storage using
VxVM are described in the Veritas Volume Manager Reference Guide.
NOTE: Unlike LVM volume groups, VxVM disk groups are not entered in the cluster
configuration file, nor in the package configuration file.
Use the cmquerycl (1m) command to specify a set of nodes to be included in the
cluster and to generate a template for the cluster configuration file.
IMPORTANT: See the entry for NODE_NAME under “Cluster Configuration Parameters
” (page 143) for important information about restrictions on the node name.
Here is an example of the command (enter it all on one line):
cmquerycl -v -C $SGCONF/clust1.conf -n ftsys9 -n ftsys10
This creates a template file, by default /etc/cmcluster/clust1.conf. In this output
file, keywords are separated from definitions by white space. Comments are permitted,
and must be preceded by a pound sign (#) in the far left column.
NOTE: HP strongly recommends that you modify the file so as to send the heartbeat
over all possible networks. See also HEARTBEAT_IP under “Cluster Configuration
Parameters ” (page 143), and “Specifying the Address Family for the Heartbeat ”
(page 244).
The cmquerycl(1m) manpage further explains the calling parameters as well as those
that appear in the template file. See also “Cluster Configuration Parameters ” (page 143).
Modify your /etc/cmcluster/clust1.conf file as needed.
cmquerycl Options
Speeding up the Process
In a larger or more complex cluster with many nodes, networks or disks, the cmquerycl
command may take several minutes to complete. To speed up the configuration process,
you can direct the command to return selected information only by using the -k and
-w options:
-k eliminates some disk probing, and does not return information about potential
cluster lock volume groups and lock physical volumes.
-w local lets you specify local network probing, in which LAN connectivity is verified
between interfaces within each node only. This is the default when you use cmquerycl
with the -C option.
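For example, a hedged invocation that combines these options with the template
command shown earlier:
cmquerycl -v -k -w local -C $SGCONF/clust1.conf -n ftsys9 -n ftsys10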
NOTE: You can specify only one lock disk on the command line; if you need to specify
a second cluster lock disk, you must do so in the cluster configuration file.
For more information, see “Specifying a Lock Disk” (page 246), “Specifying a Lock
LUN” (page 247), and “Specifying a Quorum Server” (page 248).
IMPORTANT:
• You cannot use cmapplyconf -N if the cluster already exists; in that case, follow
instructions under “Changing the Cluster Networking Configuration while the
Cluster Is Running” (page 367).
• You can only add information to the output file (mynetwork in this example); do
not change the information already in the file.
For more information, see the cmquerycl (1m) and cmapplyconf (1m) manpages.
NOTE: This option must be used to discover actual or potential nodes and subnets
in a cross-subnet configuration. See “Obtaining Cross-Subnet Information” (page 248).
It will also validate IP Monitor polling targets; see “Monitoring LAN Interfaces and
Detecting Failure: IP Level” (page 98), and POLLING_TARGET under “Cluster
Configuration Parameters ” (page 143).
NOTE: If you are using cDSFs, the device file would be in the /dev/rdisk/ directory;
for example /dev/rdisk/disk100. See “About Cluster-wide Device Special Files
(cDSFs)” (page 135).
See also “Choosing Cluster Lock Disks” (page 225).
IMPORTANT: The following are standard instructions. For special instructions that
may apply to your version of Serviceguard and the Quorum Server see “Configuring
Serviceguard to Use the Quorum Server” in the latest version of the HP Serviceguard
Quorum Server Version A.04.00 Release Notes, at www.hp.com/go/
hpux-serviceguard-docs -> HP Quorum Server Software.
A cluster lock (lock disk, lock LUN, or Quorum Server) is required for two-node
clusters. To obtain a cluster configuration file that includes Quorum Server parameters,
use the -q option of the cmquerycl command, specifying a Quorum Server hostname
or IP address, for example (all on one line):
cmquerycl -q <QS_Host> -n ftsys9 -n ftsys10 -C <ClusterName>.conf
To specify an alternate hostname or IP address by which the Quorum Server can be
reached, use a command such as (all on one line):
cmquerycl -q <QS_Host> <QS_Addr> -n ftsys9 -n ftsys10 -C
<ClusterName>.conf
Enter the QS_HOST (IPv4 or IPv6), optional QS_ADDR (IPv4 or IPv6),
QS_POLLING_INTERVAL, and optionally a QS_TIMEOUT_EXTENSION; and also
check the HOSTNAME_ADDRESS_FAMILY setting, which defaults to IPv4. See the
parameter descriptions under “Cluster Configuration Parameters ” (page 143) for more
information.
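For illustration, the resulting entries in the cluster configuration file might look like
this (hypothetical host and address; the interval values, in microseconds, are examples
only):
QS_HOST qshost.example.com
QS_ADDR 10.12.5.8
QS_POLLING_INTERVAL 300000000
QS_TIMEOUT_EXTENSION 2000000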
IMPORTANT: For important information, see also “About Hostname Address Families:
IPv4-Only, IPv6-Only, and Mixed Mode” (page 139); and “What Happens when You
Change the Quorum Configuration Online” (page 66)
1 lan3 (nodeA)
lan4 (nodeA)
lan3 (nodeB)
lan4 (nodeB)
2 lan1 (nodeA)
lan1 (nodeB)
3 lan2 (nodeA)
lan2 (nodeB)
4 lan3 (nodeC)
lan4 (nodeC)
lan3 (nodeD)
lan4 (nodeD)
5 lan1 (nodeC)
lan1 (nodeD)
6 lan2 (nodeC)
lan2 (nodeD)
Route connectivity (IPv4):
1 15.13.164.0
15.13.172.0
2 15.13.165.0
15.13.182.0
3 15.244.65.0
4 15.244.56.0
In the Route connectivity section, the numbers on the left (1-4) identify which
subnets are routed to each other (for example 15.13.164.0 and 15.13.172.0).
IMPORTANT: Note that in this example subnet 15.244.65.0, used by NodeA and
NodeB, is not routed to 15.244.56.0, used by NodeC and NodeD.
But subnets 15.13.164.0 and 15.13.165.0, used by NodeA and NodeB, are routed
respectively to subnets 15.13.172.0 and 15.13.182.0, used by NodeC and NodeD.
At least one such routing among all the nodes must exist for cmquerycl to succeed.
NOTE: Remember to tune HP-UX kernel parameters on each node to ensure that they
are set high enough for the largest number of packages that will ever run concurrently
on that node.
Levels of Access
Serviceguard recognizes two levels of access, root and non-root:
• Root access: Full capabilities; only role allowed to configure the cluster.
As Figure 5-1 shows, users with root access have complete control over the
configuration of the cluster and its packages. This is the only role allowed to use
the cmcheckconf, cmapplyconf, cmdeleteconf, and cmmodnet -a
commands.
In order to exercise this Serviceguard role, you must log in as the HP-UX root user
(superuser) on a node in the cluster you want to administer. Conversely, the HP-UX
root user on any node in the cluster always has full Serviceguard root access
privileges for that cluster; no additional Serviceguard configuration is needed to
grant these privileges.
IMPORTANT: A remote user (one who is not logged in to a node in the cluster,
and is not connecting via rsh or ssh) can have only Monitor access to the cluster.
(Full Admin and Package Admin can be configured for such a user, but this usage
is deprecated and in a future release may cause cmapplyconf and cmcheckconf
to fail. As of Serviceguard A.11.18 configuring Full Admin or Package Admin for
remote users gives them Monitor capabilities. See “Setting up Access-Control
Policies” (page 254) for more information.)
NOTE: Once nodes are configured into a cluster, the access-control policies you set
in the cluster and package configuration files govern cluster-wide security; changes to
the “bootstrap” cmclnodelist file are ignored (see “Allowing Root Access to an
Unconfigured Node” (page 216)).
Access control policies are defined by three parameters in the configuration file:
• Each USER_NAME can consist either of the literal ANY_USER, or a maximum of
8 login names from the /etc/passwd file on USER_HOST. The names must be
separated by spaces or tabs, for example:
# Policy 1:
USER_NAME john fred patrick
USER_HOST bit
USER_ROLE PACKAGE_ADMIN
• USER_HOST is the node where USER_NAME will issue Serviceguard commands.
NOTE: The commands must be issued on USER_HOST but can take effect on
other nodes; for example patrick can use bit’s command line to start a package
on gryf.
Choose one of these three values for USER_HOST:
— ANY_SERVICEGUARD_NODE - any node on which Serviceguard is configured,
and which is on a subnet with which nodes in this cluster can communicate
(as reported by cmquerycl -w full).
NOTE: You do not have to halt the cluster or package to configure or modify access
control policies.
Here is an example of an access control policy:
USER_NAME john
USER_HOST bit
USER_ROLE PACKAGE_ADMIN
If this policy is defined in the cluster configuration file, it grants user john the
PACKAGE_ADMIN role for any package on node bit. User john also has the MONITOR
role for the entire cluster, because PACKAGE_ADMIN includes MONITOR. If the policy
is defined in the package configuration file for PackageA, then user john on node
bit has the PACKAGE_ADMIN role only for PackageA.
Role Conflicts
Do not configure different roles for the same user and host; Serviceguard treats this as
a conflict and will fail with an error when applying the configuration. “Wildcards”,
such as ANY_USER and ANY_SERVICEGUARD_NODE, are an exception: it is acceptable
for ANY_USER and john to be given different roles.
IMPORTANT: Wildcards do not degrade higher-level roles that have been granted to
individual members of the class specified by the wildcard. For example, you might set
up the following policy to allow root users on remote systems access to the cluster:
USER_NAME root
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE MONITOR
This does not reduce the access level of users who are logged in as root on nodes in this
cluster; they will always have full Serviceguard root-access capabilities.
Consider what would happen if these entries were in the cluster configuration file:
# Policy 1:
USER_NAME john
USER_HOST bit
USER_ROLE PACKAGE_ADMIN
# Policy 2:
USER_NAME john
USER_HOST bit
USER_ROLE MONITOR
# Policy 3:
USER_NAME ANY_USER
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE MONITOR
In the above example, the configuration would fail because user john is assigned two
roles. (In any case, Policy 2 is unnecessary, because PACKAGE_ADMIN includes the role
of MONITOR.)
Policy 3 does not conflict with any other policies, even though the wildcard ANY_USER
includes the individual user john.
NOTE: If you are using CVM disk groups, they should be configured after cluster
configuration is done, using the procedures described in “Creating the Storage
Infrastructure with Veritas Cluster Volume Manager (CVM)” (page 268). Add CVM
disk groups to the package configuration file; see cvm_dg (page 305).
NOTE: Using the -k option means that cmcheckconf only checks disk connectivity
to the LVM disks that are identified in the ASCII file. Omitting the -k option (the default
behavior) means that cmcheckconf tests the connectivity of all LVM disks on all
nodes. Using -k can result in significantly faster operation of the command.
For more information, see the manpage for cmcheckconf (1m) and “Checking Cluster
Components” (page 337).
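For example, a hedged invocation using the configuration file generated earlier:
cmcheckconf -k -v -C $SGCONF/clust1.conf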
NOTE: Using the -k option means that cmapplyconf only checks disk
connectivity to the LVM disks that are identified in the ASCII file. Omitting the
-k option (the default behavior) means that cmapplyconf tests the connectivity
of all LVM disks on all nodes. Using -k can result in significantly faster operation
of the command.
The cmapplyconf command creates a binary version of the cluster configuration file
and distributes it to all nodes in the cluster. This action ensures that the contents of the
file are consistent across all nodes. Note that the cmapplyconf command does not
distribute the ASCII configuration file.
NOTE: The apply will not complete unless the cluster lock volume group is activated
on exactly one node before applying. The one exception to this rule is when a cluster
lock has previously been configured on the same physical volume and volume group.
After the configuration is applied, the cluster lock volume group must be deactivated.
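For illustration, a hedged sequence, assuming a lock volume group named vglock
and the configuration file generated earlier:
vgchange -a y /dev/vglock
cmapplyconf -k -v -C $SGCONF/clust1.conf
vgchange -a n /dev/vglock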
NOTE: The next section, “Creating the Storage Infrastructure with Veritas Cluster
Volume Manager (CVM)” (page 268), explains how to configure Veritas Cluster Volume
Manager (CVM) disk groups without CFS; that is, for raw access only. Both solutions
use many of the same commands, but in a slightly different order.
Refer to the Serviceguard man pages for more information about the commands
cfscluster, cfsdgadm, cfsmntadm, cfsmount, cfsumount, and cmgetpkgenv.
Information is also in the documentation for HP Serviceguard Storage Management
Suite posted at www.hp.com/go/hpux-serviceguard-docs.
IMPORTANT: Before you proceed, make sure you have read “Planning Veritas Cluster
Volume Manager (CVM) and Cluster File System (CFS)” (page 170), which contains
important information and cautions.
NOTE: Do not edit the configuration file SG-CFS-pkg.conf. Create and modify
the configuration using the cfs administration commands.
1. First, make sure the cluster is running:
cmviewcl
2. If it is not, start it:
cmruncl
3. If you have not initialized your disk groups, or if you have an old install that needs
to be re-initialized, use the vxinstall command to initialize VxVM/CVM disk
groups. See “Initializing the Veritas Volume Manager ” (page 269).
4. Activate the SG-CFS-pkg and start up CVM with the cfscluster command;
this creates SG-CFS-pkg, and also starts it.
This example, for the cluster file system, uses a timeout of 900 seconds; if your
CFS cluster has many disk groups and/or disk LUNs visible to the cluster nodes,
you may need to use a longer timeout value. Use the -s option to start the CVM
package in shared mode:
cfscluster config -t 900 -s
5. Verify the system multi-node package is running and CVM is up, using the
cmviewcl or cfscluster command. Following is an example of using the
cfscluster command. In the last line, you can see that CVM is up, and that the
mount point is not yet configured:
cfscluster status
Node : ftsys9
Cluster Manager : up
CVM state : up (MASTER)
MOUNT POINT TYPE SHARED VOLUME DISK GROUP STATUS
Node : ftsys10
Cluster Manager : up
CVM state : up
MOUNT POINT TYPE SHARED VOLUME DISK GROUP STATUS
Creating Volumes
1. Make the log_files volume on the logdata disk group:
vxassist -g logdata make log_files 1024m
2. Use the vxprint command to verify:
vxprint log_files
disk group: logdata
TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0
v log_files fsgen ENABLED 1048576 - ACTIVE - -
pl log_files-01 fsgen ENABLED 1048576 - ACTIVE - -
sd ct4t0d6-01 fsgen ENABLED 1048576 - ACTIVE - -
MULTI_NODE_PACKAGES
PACKAGE STATUS STATE AUTO_RUN SYSTEM
SG-CFS-pkg up running enabled yes
SG-CFS-DG-1 up running enabled no
SG-CFS-MP-1 up running enabled no
ftsys9/etc/cmcluster/cfs> bdf
Filesystem kbytes used avail %used Mounted on
/dev/vx/dsk/logdata/log_files 10485 17338 966793 2% /tmp/logdata/log_files
ftsys10/etc/cmcluster/cfs> bdf
Filesystem kbytes used avail %used Mounted on
/dev/vx/dsk/logdata/log_files 10485 17338 966793 2% /tmp/logdata/log_files
6. To view the package name that is monitoring a mount point, use the cfsmntadm
show_package command:
cfsmntadm show_package /tmp/logdata/log_files
SG-CFS-MP-1
7. After creating the mount point packages for the cluster file system, configure your
application package to depend on the mount points. In the configuration file,
configure a dependency, setting dependency_condition to SG-CFS-MP-pkg-# = UP
and dependency_location to SAME_NODE. For more information about these
parameters, see “Package Parameter Explanations” (page 287).
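For illustration, such a dependency might look like this in the application package's
configuration file (the dependency name is arbitrary; the mount-point package name
must match your configuration):
dependency_name cfs_mount_dep
dependency_condition SG-CFS-MP-1 = UP
dependency_location SAME_NODE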
MULTI_NODE_PACKAGES
bdf
Filesystem kbytes used avail %used Mounted on
/dev/vg00/lvol3 544768 352240 180540 66% /
/dev/vg00/lvol1 307157 80196 196245 29% /stand
/dev/vg00/lvol5 1101824 678124 398216 63% /var
/dev/vg00/lvol7 2621440 1702848 861206 66% /usr
/dev/vg00/lvol4 4096 707 3235 18% /tmp
/dev/vg00/lvol6 2367488 1718101 608857 74% /opt
/dev/vghome/varopt 4194304 258655 3689698 7% /var/opt
/dev/vghome/home 2097152 17167 1949993 1% /home
/tmp/logdata/log_files
102400 1898 94228 2% /tmp/logdata/log_files
/tmp/logdata/log_files:check2
102400 1898 94228 2% /tmp/check_logfiles
Creating the Storage Infrastructure with Veritas Cluster Volume Manager (CVM)
Before you start, make sure the directory in which VxVM commands are stored, /usr/
lib/vxvm/bin, is in your path. Once you have created the root disk group with
vxinstall, you can use VxVM commands or the Veritas Storage Administrator GUI,
VEA, to carry out configuration tasks. Instructions for running vxinstall are in the
Veritas Installation Guide for your version. For more information, refer to the Veritas
Volume Manager Administrator’s Guide for your version.
You need to do the tasks described in the following sections:
• “Initializing the Veritas Volume Manager ” (page 269)
• “Preparing the Cluster for Use with CVM ” (page 269)
• “Identifying the Master Node” (page 270)
• “Initializing Disks for CVM” (page 270)
• “Creating Disk Groups” (page 271)
• “Creating Volumes ” (page 271)
• “Adding Disk Groups to the Package Configuration ” (page 272)
For more information, including details about configuration of plexes (mirrors),
multipathing, and RAID, refer to the HP-UX documentation for the Veritas Volume
Manager. See the documents for HP Serviceguard Storage Management Suite posted
at www.hp.com/go/hpux-serviceguard-docs.
IMPORTANT: Do this only if you are using CVM without CFS. If you are using CFS,
you set up CVM as part of CFS; see “Creating a Storage Infrastructure with Veritas
Cluster File System (CFS)” (page 261).
NOTE: The specific commands for creating mirrored and multipath storage using
CVM are described in the HP-UX documentation for the Veritas Volume Manager,
posted at www.hp.com/go/hpux-core-docs.
Creating Volumes
Use the vxassist command to create volumes, as in the following example:
vxassist -g logdata make log_files 1024m
NOTE: Special considerations apply in the case of the root volume group:
• If the root volume group is mirrored using MirrorDisk/UX, include it in the
custom_vg_activation function so that any stale extents in the mirror will be
re-synchronized.
• Otherwise, the root volume group does not need to be included in the
custom_vg_activation function, because it is automatically activated before
the /etc/lvmrc file is used at boot time.
NOTE: The /sbin/init.d/cmcluster file may call files that Serviceguard stores
in /etc/cmcluster/rc. This directory is for Serviceguard use only! Do not move,
delete, modify, or add files in this directory.
Single-Node Operation
Single-node operation occurs in a single-node cluster or in a multi-node cluster,
following a situation where all but one node has failed, or where you have shut down
all but one node, which will probably have applications running. As long as the
Serviceguard daemon cmcld is active, other nodes can rejoin the cluster at a later time.
If the Serviceguard daemon fails when in single-node operation, it will leave the single
node up and your applications running. This is different from the loss of the
Serviceguard daemon in a multi-node cluster, which halts the node with a system reset,
and causes packages to be switched to adoptive nodes.
It is not necessary to halt the single node in this scenario, since the application is still
running, and no other node is currently available for package switching.
However, you should not try to restart Serviceguard, since data corruption might occur
if the node were to attempt to start up a new instance of the application that is still
running on the node. Instead of restarting the cluster, choose an appropriate time to
shut down and reboot the node, which will allow the applications to shut down and
then permit Serviceguard to restart the cluster after rebooting.
Disabling identd
Ignore this section unless you have a particular need to disable identd.
You can configure Serviceguard not to use identd.
3. Restart inetd:
/etc/init.d/inetd restart
NOTE: The cmdeleteconf command removes only the cluster binary file
/etc/cmcluster/cmclconfig. It does not remove any other files from the
/etc/cmcluster directory.
Although the cluster must be halted, all nodes in the cluster should be powered up
and accessible before you use the cmdeleteconf command. If a node is powered
down, power it up and boot. If a node is inaccessible, you will see a list of inaccessible
nodes together with the following message:
It is recommended that you do not proceed with the
configuration operation unless you are sure these nodes are
permanently unavailable. Do you want to continue?
Reply Yes to remove the configuration. Later, if the inaccessible node becomes available,
you should run the cmdeleteconf command on that node to remove the configuration
file.
NOTE: This is a new process for configuring packages, as of Serviceguard A.11.18.
This manual refers to packages created by this method as modular packages, and
assumes that you will use it to create new packages; it is simpler and more efficient
than the older method, allowing you to build packages from smaller modules, and
eliminating the separate package control script and the need to distribute it manually.
Packages created using Serviceguard A.11.17 or earlier are referred to as legacy
packages. If you need to reconfigure a legacy package (rather than create a new package),
see “Configuring a Legacy Package” (page 375).
It is also still possible to create new legacy packages by the method described in
“Configuring a Legacy Package”. If you are using a Serviceguard Toolkit such as
Serviceguard NFS Toolkit, consult the documentation for that product.
If you decide to convert a legacy package to a modular package, see “Migrating a
Legacy Package to a Modular Package” (page 386). Do not attempt to convert
Serviceguard Toolkit packages.
(Parameters that are in the package control script for legacy packages, but in the package
configuration file instead for modular packages, are indicated by (S) in the tables later
in this section (page 284).)
IMPORTANT: Before you start, you need to do the package-planning tasks described
under “Package Configuration Planning ” (page 168).
To choose the right package modules, you need to decide the following things about
the package you are creating:
• What type of package it is; see “Types of Package: Failover, Multi-Node, System
Multi-Node” (page 280).
• Which parameters need to be specified for the package (beyond those included in
the base type, which is normally failover, multi-node, or system-multi-node).
See “Package Modules and Parameters” (page 283).
When you have made these decisions, you are ready to generate the package
configuration file; see “Generating the Package Configuration File” (page 311).
IMPORTANT: But if the package uses volume groups, they must be activated in
shared mode: vgchange -a s, which is available only if the SGeRAC add-on
product is installed.
NOTE: On systems that support CFS, you configure the CFS multi-node packages
by means of thecfsdgadm and cfsmntadm commands, not by editing a package
configuration file; see “Creating a Storage Infrastructure with Veritas Cluster File
System (CFS)” (page 261).
Note that relocatable IP addresses cannot be assigned to multi-node packages. See
also “Differences between Failover and Multi-Node Packages” (page 282).
To generate a package configuration file that creates a multi-node package,
include -m sg/multi_node on the cmmakepkg command line. See “Generating
the Package Configuration File” (page 311).
• System multi-node packages. These packages run simultaneously on every node
in the cluster. They cannot be started and halted on individual nodes.
Both node_fail_fast_enabled (page 290) and auto_run (page 289) must be set to yes
for this type of package. All services must have service_fail_fast_enabled (page 301)
set to yes.
System multi-node packages are supported only for applications supplied by HP,
for example Veritas Cluster File System (CFS).
For more information about types of packages and how they work, see “How the
Package Manager Works” (page 67). For information on planning a package, see
“Package Configuration Planning ” (page 168).
When you have decided on the type of package you want to create, the next step is to
decide what additional package-configuration modules you need to include; see
“Package Modules and Parameters” (page 283).
NOTE: If you are going to create a complex package that contains many modules,
you may want to skip the process of selecting modules, and simply create a configuration
file that contains all the modules:
cmmakepkg -m sg/all $SGCONF/sg-all
(The output will be written to $SGCONF/sg-all.)
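Alternatively, a hedged example that selects only a few modules for a failover package
(the module names shown are illustrative; see the package configuration file and
cmmakepkg (1m) for the full list):
cmmakepkg -m sg/failover -m sg/service -m sg/filesystem $SGCONF/pkg1/pkg1.conf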
multi_node_all    Contains all parameters that can be used by a multi-node package;
includes the multi_node, dependency, monitor_subnet, service, resource,
volume_group, filesystem, pev, external_pre, external, and acp modules. Use
if you are creating a multi-node package that requires most or all of the optional
parameters that are available for this type of package.
NOTE: The default form for parameter names in the modular package configuration
file is lower case; for legacy packages the default is upper case. There are no
compatibility issues; Serviceguard is case-insensitive as far as the parameter names are
concerned. This manual uses lower case, unless the parameter in question is used only
in legacy packages, or the context refers exclusively to such a package.
package_name
Any name, up to a maximum of 39 characters, that:
• starts and ends with an alphanumeric character
• otherwise contains only alphanumeric characters or dot (.), dash (-), or underscore
(_)
• is unique among package names in this cluster
module_name
The module name (for example, failover, service, etc.). Do not change it. Used in
the form of a relative path (for example sg/failover) as a parameter to cmmakepkg
to specify modules to be used in configuring the package. (The files reside in the
$SGCONF/modules directory; see “Where Serviceguard Files Are Kept” (page 206) for
an explanation of Serviceguard directories.)
New for modular packages.
module_version
The module version. Do not change it.
New for modular packages.
package_description
The application that the package runs. This is a descriptive parameter that can be set
to any value you choose, up to a maximum of 80 characters. Default value is
Serviceguard Package. New for 11.19.
node_name
The node on which this package can run, or a list of nodes in order of priority, or an
asterisk (*) to indicate all nodes. The default is *.
For system multi-node packages, you must specify *.
If you use a list, specify each node on a new line, preceded by the literal node_name,
for example:
node_name <node1>
node_name <node2>
node_name <node3>
The order in which you specify the node names is important. First list the primary
node name (the node where you normally want the package to start), then the first
adoptive node name (the best candidate for failover), then the second adoptive node
name, followed by additional node names in order of preference.
In case of a failover, control of the package will be transferred to the next adoptive
node name listed in the package configuration file, or (if that node is not available or
cannot run the package at that time) to the next node in the list, and so on.
auto_run
Can be set to yes or no. The default is yes.
node_fail_fast_enabled
Can be set to yes or no. The default is no.
yes means the node on which the package is running will be halted (HP-UX system
reset) if the package fails; no means Serviceguard will not halt the system.
If this parameter is set to yes and one of the following events occurs, Serviceguard
will halt the system (HP-UX system reset) on the node where the control script fails:
• A package subnet fails and no backup network is available
• An EMS resource fails
• Serviceguard is unable to execute the halt function
• The start or halt function times out
NOTE: If the package halt function fails with “exit 1”, Serviceguard does not halt
the node, but sets no_restart for the package, which disables package switching
(auto_run), thereby preventing the package from starting on any adoptive node.
Setting node_fail_fast_enabled to yes ensures that the package can fail over to another
node even if the package cannot halt successfully. Be careful when using
node_fail_fast_enabled, as it will cause all packages on the node to halt abruptly. For
more information, see “Responses to Failures ” (page 116) and “Responses to Package
and Service Failures ” (page 119).
For system multi-node packages, node_fail_fast_enabled must be set to yes.
run_script_timeout
The amount of time, in seconds, allowed for the package to start; or no_timeout. The
default is no_timeout. The maximum is 4294.
If the package does not complete its startup in the time specified by run_script_timeout,
Serviceguard will terminate it and prevent it from switching to another node. In this
case, if node_fail_fast_enabled is set to yes, the node will be halted (HP-UX system reset).
NOTE: VxVM disk groups are imported at package run time and exported at
package halt time. If a package uses a large number of VxVM disks, the timeout
value must be large enough to allow all of them to finish the import or export.
NOTE: If no_timeout is specified, and the script hangs or takes a very long time
to complete during the validation step (cmcheckconf (1m)), cmcheckconf will
wait 20 minutes to allow the validation to complete before giving up.
halt_script_timeout
The amount of time, in seconds, allowed for the package to halt; or no_timeout. The
default is no_timeout. The maximum is 4294.
If the package’s halt process does not complete in the time specified by
halt_script_timeout, Serviceguard will terminate the package and prevent it from
switching to another node. In this case, if node_fail_fast_enabled (page 290) is set to yes,
the node will be halted (HP-UX system reset).
If a halt_script_timeout is specified, it should be greater than the sum of all the values set
for service_halt_timeout (page 301) for this package.
If a timeout occurs:
• Switching will be disabled.
• The current node will be disabled from running the package.
If a halt-script timeout occurs, you may need to perform manual cleanup. See “Package
Control Script Hangs or Failures” in Chapter 8. See also the note about VxVM under
run_script_timeout (page 290).
successor_halt_timeout
Specifies how long, in seconds, Serviceguard will wait for packages that depend on
this package to halt, before halting this package. Can be 0 through 4294, or
no_timeout. The default is no_timeout.
• no_timeout means that Serviceguard will wait indefinitely for the dependent
packages to halt.
• 0 means Serviceguard will not wait for the dependent packages to halt before
halting this package.
script_log_file
The full pathname of the package’s log file. The default is
$SGRUN/log/<package_name>.log. (See “Where Serviceguard Files Are Kept”
(page 206) for more information about Serviceguard pathnames.) See also log_level
(page 292).
operation_sequence
Defines the order in which the scripts defined by the package’s component modules
will start up. See the package configuration file for details.
This parameter is not configurable; do not change the entries in the configuration file.
New for modular packages.
log_level
Determines the amount of information printed to stdout when the package is validated,
and to the script_log_file (page 292) when the package is started and halted. Valid values
are 0 through 5, but you should normally use only the first two (0 or 1); the remainder
(2 through 5) are intended for use by HP Support.
• 0 - informative messages
• 1 - informative messages with slightly more detail
• 2 - messages showing logic flow
• 3 - messages showing detailed data structure information
• 4 - detailed debugging information
• 5 - function call flow
New for modular packages.
failover_policy
Specifies how Serviceguard decides where to restart the package if it fails. Can be set
to configured_node, min_package_node, site_preferred, or
site_preferred_manual. The default is configured_node.
• configured_node means Serviceguard will attempt to restart the package on
the next available node in the list you provide under node_name (page 289).
• min_package_node means Serviceguard will restart a failed package on
whichever node in the node_name list has the fewest packages running at the time.
• site_preferred means Serviceguard will try all the eligible nodes on the local
SITE before failing the package over to a node on another SITE. This policy can
be configured only in a site-aware disaster-tolerant cluster, which requires
Metrocluster (additional HP software).
failback_policy
Specifies what action the package manager should take when a failover package is not
running on its primary node (the first node on its node_name list) and the primary node
is once again available. Can be set to automatic or manual. The default is manual.
• manual means the package will continue to run on the current (adoptive) node.
• automatic means Serviceguard will move the package to the primary node as
soon as that node becomes available, unless doing so would also force a package
with a higher priority (page 293) to move.
This parameter can be set for failover packages only. If this package will depend on
another package or vice versa, see also “About Package Dependencies” (page 179). If
the package has a configured weight, see also “About Package Weights” (page 187).
priority
Assigns a priority to a failover package whose failover_policy (page 292) is
configured_node. Valid values are 1 through 3000, or no_priority. The
default is no_priority. See also the dependency_ parameter descriptions (page 294).
priority can be used to satisfy dependencies when a package starts, or needs to fail over
or fail back: a package with a higher priority than the packages it depends on can drag
those packages, forcing them to start or restart on the node it chooses, so that its
dependencies are met.
If you assign a priority, it must be unique in this cluster. HP recommends assigning
values in increments of 20 so as to leave gaps in the sequence; otherwise you may have
to shuffle all the existing priorities when assigning priority to a new package.
For more information about package dependencies, see the parameter descriptions
that follow, the cmmakepkg (1m) manpage, and the discussion in this manual “About
Package Dependencies” (page 179).
dependency_condition
The condition that must be met for this dependency to be satisfied.
The syntax is: <package_name> = <package_status>. <package_name> is the
name of the package depended on. Valid values for <package_status> are UP, or
DOWN.
• UP means that this package requires the package identified by <package_name>
to be up (that is, the status reported by cmviewcl is UP).
If you specify UP, the type and characteristics of the current package (the one you
are configuring) impose the following restrictions on the type of package it can
depend on:
— If the current package is a multi-node package, <package_name> must identify
a multi-node or system multi-node package.
— If the current package is a failover package and its failover_policy (page 292) is
min_package_node, <package_name> must identify a multi-node or system
multi-node package.
— If the current package is a failover package and its failover_policy (page 292) is
configured_node, <package_name> must identify a multi-node or system
multi-node package, or another failover package whose failover_policy is
configured_node.
dependency_location
Specifies where the dependency_condition must be met.
• If dependency_condition is UP, legal values for dependency_location are same_node,
any_node, and different_node.
— same_node means that the package depended on must be running on the same
node.
— different_node means that the package depended on must be running on
a different node in this cluster.
— any_node means that the package depended on must be running on some
node in this cluster.
• If dependency_condition is DOWN, legal values for dependency_location are same_node
and all_nodes.
— same_node means that the package depended on must not be running on the
same node.
— all_nodes means that the package depended on must not be running on any
node in this cluster.
For more information, see “About Package Dependencies” (page 179).
weight_name, weight_value
These parameters specify a weight for a package; this weight is compared to a node's
available capacity (defined by the CAPACITY_NAME and CAPACITY_VALUE
parameters in the cluster configuration file) to determine whether the package can run
there.
NOTE: But if weight_name is package_limit, you can use only that one weight and
capacity throughout the cluster. package_limit is a reserved value, which, if used,
must be entered exactly in that form. It provides the simplest way of managing weights
and capacities; see “Simple Method” (page 188) for more information.
The rules for forming weight_name are the same as those for forming package_name
(page 288). weight_name must exactly match the corresponding CAPACITY_NAME.
weight_value is an unsigned floating-point value between 0 and 1000000 with at most
three digits after the decimal point.
You can use these parameters to override the cluster-wide default package weight that
corresponds to a given node capacity. You can define that cluster-wide default package
weight by means of the WEIGHT_NAME and WEIGHT_DEFAULT parameters in the
cluster configuration file (explicit default). If you do not define an explicit default (that
is, if you define a CAPACITY_NAME in the cluster configuration file with no
corresponding WEIGHT_NAME and WEIGHT_DEFAULT), the default weight is
assumed to be zero (implicit default). Configuring weight_name and weight_value here
in the package configuration file overrides the cluster-wide default (implicit or explicit),
and assigns a particular weight to this package.
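For illustration, a hedged example (the weight name and values are arbitrary): if the
cluster configuration file defines, for a node,
CAPACITY_NAME memory
CAPACITY_VALUE 3072
then the package configuration file might assign this package a weight against that
capacity:
weight_name memory
weight_value 1024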
For more information, see “About Package Weights” (page 187). See also the discussion
of the relevant parameters under “Cluster Configuration Parameters ” (page 143), in
the cmmakepkg (1m) and cmquerycl (1m) manpages, and in the cluster
configuration and package configuration template files.
New for 11.19.
local_lan_failover_allowed
Specifies whether or not Serviceguard can transfer the package IP address to a standby
LAN card in the event of a LAN card failure on a cluster node.
Legal values are yes and no. Default is yes.
monitored_subnet
A LAN subnet that is to be monitored for this package. Replaces legacy SUBNET which
is still supported in the package configuration file for legacy packages; see “Configuring
a Legacy Package” (page 375).
You can specify multiple subnets; use a separate line for each.
If you specify a subnet as a monitored_subnet, the package will not run on any node not
reachable via that subnet. This normally means that if the subnet is not up on a node,
the package will not run on that node.
monitored_subnet_access
In cross-subnet configurations, specifies whether each monitored_subnet is accessible
on all nodes in the package’s node list (see node_name (page 289)), or only some. Valid
values are PARTIAL, meaning that at least one of the nodes has access to the subnet,
but not all; and FULL, meaning that all nodes have access to the subnet. The default is
FULL, and it is in effect if monitored_subnet_access is not specified.
See also ip_subnet_node (page 299) and “About Cross-Subnet Failover” (page 201).
New for modular packages. For legacy packages, see “Configuring Cross-Subnet
Failover” (page 383).
cluster_interconnect_subnet
Can be configured only for a multi-node package in a Serviceguard Extension for Real
Application Cluster (SGeRAC) installation.
ip_subnet
Specifies an IP subnet used by the package. Relocatable IP addresses on this subnet
(see ip_address below) are configured when the package starts and unconfigured when
it halts.
CAUTION: HP recommends that this subnet be configured into the cluster. You do
this in the cluster configuration file by specifying a HEARTBEAT_IP or STATIONARY_IP
under a NETWORK_INTERFACE on the same subnet, for each node in this package's
NODE_NAME list. For example, an entry such as the following in the cluster
configuration file configures subnet 192.10.25.0 (lan1) on node ftsys9:
NODE_NAME ftsys9
NETWORK_INTERFACE lan1
HEARTBEAT_IP 192.10.25.18
See “Cluster Configuration Parameters ” (page 143) for more information.
If the subnet is not configured into the cluster, Serviceguard cannot manage or monitor
it, and in fact cannot guarantee that it is available on all nodes in the package's node_name
list (page 289). Such a subnet is referred to as an external subnet, and relocatable
addresses on that subnet are known as external addresses. If you use an external subnet,
you risk the following consequences:
• A failed interface on this subnet will not fail over to a local standby. See “Local
Switching ” (page 93) for more information about this type of failover.
• If the subnet fails, the package will not fail over to an alternate node.
• Even if the subnet remains intact, if the package needs to fail over because of some
other type of failure, it could fail to start on an adoptive node because the subnet
is not available on that node.
For these reasons, configure all ip_subnets into the cluster, unless you are using a
networking technology, such as HyperFabric, that does not support DLPI. In such
cases, follow instructions in the networking product's documentation to integrate the
product with Serviceguard.
For each subnet used, specify the subnet address on one line and, on the following
lines, the relocatable IP addresses that the package uses on that subnet. These will be
configured when the package starts and unconfigured when it halts.
For example, if this package uses subnet 192.10.25.0 and the relocatable IP addresses
192.10.25.12 and 192.10.25.13, enter:
ip_subnet 192.10.25.0
ip_address 192.10.25.12
ip_address 192.10.25.13
If you want the subnet to be monitored, specify it in the monitored_subnet parameter as
well.
ip_subnet_node
In a cross-subnet configuration, specifies which nodes an ip_subnet is configured on.
If no ip_subnet_nodes are listed under an ip_subnet, it is assumed to be configured on all
nodes in this package’s node_name list (page 289).
Can be added or deleted while the package is running, with these restrictions:
• The package must not be running on the node that is being added or deleted.
• The node must not be the first to be added to, or the last deleted from, the list of
ip_subnet_nodes for this ip_subnet.
See also monitored_subnet_access (page 297) and “About Cross-Subnet Failover” (page 201).
New for modular packages. For legacy packages, see “Configuring Cross-Subnet
Failover” (page 383).
ip_address
A relocatable IP address on a specified ip_subnet (page 298). Replaces IP, which is still
supported in the package control script for legacy packages; see “Configuring a Legacy
Package” (page 375).
For more information about relocatable IP addresses, see “Stationary and Relocatable
IP Addresses ” (page 90).
This parameter can be set for failover packages only.
Can be added or deleted while the package is running.
service_name
A service is a program or function which Serviceguard monitors as long as the package
is up. service_name identifies this function and is used by the cmrunserv and
cmhaltserv commands. You can configure a maximum of 30 services per package
and 900 services per cluster.
The length and formal restrictions for the name are the same as for package_name
(page 288). service_name must be unique among all packages in the cluster.
service_cmd
The command that runs the application or service for this service_name, for example,
/usr/bin/X11/xclock -display 15.244.58.208:0
An absolute pathname is required; neither the PATH variable nor any other environment
variable is passed to the command. The default shell is /usr/bin/sh.
NOTE: Be careful when defining service run commands. Each run command is
executed in the following way:
• The cmrunserv command executes the run command.
• Serviceguard monitors the process ID (PID) of the process the run command
creates.
• When the command exits, Serviceguard determines that a failure has occurred
and takes appropriate action, which may include transferring the package to an
adoptive node.
• If a run command is a shell script that runs some other command and then exits,
Serviceguard will consider this normal exit as a failure.
Make sure that each run command is the name of an actual service and that its process
remains alive until the actual service stops. One way to manage this is to configure a
package such that the service is actually a monitoring program that checks the health
of the application that constitutes the main function of the package, and exits if it finds
the application has failed. The application itself can be started by an external_script
(page 309).
service_fail_fast_enabled
Specifies whether or not Serviceguard will halt the node (system reset) on which the
package is running if the service identified by service_name fails. Valid values are yes
and no. Default is no, meaning that failure of this service will not cause the node to
halt. yes is not meaningful if service_restart is set to unlimited.
service_halt_timeout
The length of time, in seconds, Serviceguard will wait for the service to halt before
forcing termination of the service’s process. The maximum value is 4294.
The value should be large enough to allow any cleanup required by the service to
complete.
If no value is specified, a zero timeout will be assumed, meaning that Serviceguard
will not wait any time before terminating the process.
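For example, a package whose service is a monitoring script might use entries such as the following (the names, path, and timeout value are illustrative only):
service_name pkg1_monitor
service_cmd "/etc/cmcluster/pkg1/monitor.sh"
service_fail_fast_enabled no
service_halt_timeout 300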
resource_name
The name of a resource to be monitored.
resource_name, in conjunction with resource_polling_interval, resource_start and
resource_up_value, defines an Event Monitoring Service (EMS) dependency.
In legacy packages, RESOURCE_NAME in the package configuration file requires a
corresponding DEFERRED_RESOURCE_NAME in the package control script.
You can find a list of resources in Serviceguard Manager (Configuration -> Create
Package -> Monitored Resources -> Available EMS resources), or in
the documentation supplied with the resource monitor.
A maximum of 60 EMS resources can be defined per cluster. Note also the limit on
resource_up_value (see below).
The maximum length of the resource_name string is 1024 characters.
See “Parameters for Configuring EMS Resources” (page 178) for more information and
example.
resource_start
Specifies when Serviceguard will begin monitoring the resource identified by
resource_name. Valid values are automatic and deferred.
automatic means Serviceguard will begin monitoring the resource as soon as the
node joins the cluster. deferred means Serviceguard will begin monitoring the resource
when the package starts.
resource_up_value
Defines a criterion used to determine whether the resource identified by resource_name
is up.
Requires an operator and a value. Values can be string or numeric. The legal operators
are =, !=, >, <, >=, or <=, depending on the type of value. If the type is string, then only
= and != are valid. If the string contains white space, it must be enclosed in quotes.
String values are case-sensitive.
The maximum length of the entire resource_up_value string is 1024 characters.
You can configure a total of 15 resource_up_values per package. For example, if there is
only one resource (resource_name) in the package, then a maximum of 15
resource_up_values can be defined. If two resource_names are defined and one of them
has 10 resource_up_values, then the other resource_name can have only 5
resource_up_values.
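For example, entries such as the following (the resource path and the values shown are illustrative only) define an EMS dependency on a LAN interface status resource:
resource_name /net/interfaces/lan/status/lan0
resource_polling_interval 60
resource_start automatic
resource_up_value = up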
concurrent_vgchange_operations
Specifies the number of concurrent volume group activations or deactivations allowed
during package startup or shutdown.
Legal value is any number greater than zero. The default is 1.
If a package activates a large number of volume groups, you can improve the package’s
start-up and shutdown performance by carefully tuning this parameter. Tune
performance by increasing the number a little at a time and monitoring the effect on
performance at each step; stop increasing it, or reset it to a lower level, as soon as
performance starts to level off or decline. Factors you need to take into account include
the number of CPUs, the amount of available memory, the HP-UX kernel settings for
nfile and nproc, and the number and characteristics of other packages that will be running
on the node.
enable_threaded_vgchange
Indicates whether multi-threaded activation of volume groups (vgchange -T) is
enabled. New for modular packages. Available on HP-UX 11i v3 only.
Legal values are zero (disabled) or 1 (enabled). The default is zero.
Set enable_threaded_vgchange to 1 to enable vgchange -T for all of a package’s volume
groups. This means that when each volume group is activated, physical volumes (disks
or LUNs) are attached to the volume group in parallel, and mirror copies of logical
volumes are synchronized in parallel, rather than serially. That can improve a package’s
startup performance if its volume groups contain a large number of physical volumes.
Note that, in the context of a Serviceguard package, this affects the way physical volumes
are activated within a volume group; concurrent_vgchange_operations (page 302) controls
how many volume groups the package can activate simultaneously.
IMPORTANT: Make sure you read the configuration file comments for both
concurrent_vgchange_operations and enable_threaded_vgchange before configuring these
options, as well as the vgchange (1m) manpage.
vgchange_cmd
Specifies the method of activation for each HP-UX Logical Volume Manager (LVM)
volume group identified by a vg entry (page 304). Replaces VGCHANGE, which
is still supported in the package control script for legacy packages; see “Configuring
a Legacy Package” (page 375).
The default is vgchange -a e.
The configuration file contains several other vgchange command variants; either
uncomment one of these and comment out the default, or use the default. For more
information, see the explanations in the configuration file, “LVM Planning ” (page 132),
and “Creating the Storage Infrastructure and Filesystems with LVM, VxVM and CVM”
(page 230).
NOTE: A given package can use LVM volume groups, VxVM volume groups, CVM
volume groups, or any combination.
cvm_activation_cmd
Specifies the method of activation for Veritas Cluster Volume Manager (CVM) disk
groups. The default is
vxdg -g \${DiskGroup} set activation=readonly
The configuration file contains several other vxdg command variants; either uncomment
one of these and comment out the default, or use the default. For more information,
see the explanations in the configuration file, and the sections “CVM and VxVM
Planning ” (page 134) and “Creating the Storage Infrastructure with Veritas Cluster
Volume Manager (CVM)” (page 268).
vxvol_cmd
Specifies the method of recovery for mirrored VxVM volumes. Replaces VXVOL, which
is still supported in the package control script for legacy packages; see “Configuring
a Legacy Package” (page 375).
If recovery is found to be necessary during package startup, by default the script will
pause until the recovery is complete. To change this behavior, comment out the line
vxvol_cmd "vxvol -g \${DiskGroup} startall"
in the configuration file, and uncomment the line
vxvol_cmd "vxvol -g \${DiskGroup} -o bg startall"
This allows package startup to continue while mirror re-synchronization is in progress.
vg
Specifies an LVM volume group (one per vg, each on a new line) on which a file system
needs to be mounted. A corresponding vgchange_cmd (page 303) specifies how the
volume group is to be activated. The package script generates the necessary filesystem
commands on the basis of the fs_ parameters (page 307).
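For example, to have the package activate two volume groups using the default activation command (the volume group names are illustrative only):
vgchange_cmd "vgchange -a e"
vg vg01
vg vg02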
cvm_dg
Specifies a CVM disk group (one per cvm_dg, each on a new line) used by this package.
CVM disk groups must be specified whether file systems need to be mounted on them
or not. A corresponding cvm_activation_cmd (page 304) specifies how the disk group is
to be activated.
Any package using a CVM disk group must declare a dependency (see the entries for
dependency_name and related parameters starting on (page 294)) on the CVM system
multi-node package. See “Preparing the Cluster for Use with CVM ” (page 269).
vxvm_dg
Specifies a VxVM disk group (one per vxvm_dg, each on a new line) on which a file
system needs to be mounted. See the comments in the package configuration file, and
“Creating the Storage Infrastructure and Filesystems with LVM, VxVM and CVM”
(page 230), for more information.
vxvm_dg_retry
Specifies whether to retry the import of a VxVM disk group, using vxdisk scandisks
to check for any missing disks that might have caused the import to fail. Equivalent to
VXVM_DG_RETRY in the legacy package control script.
Legal values are yes and no. yes means vxdisk scandisks will be run in the event
of an import failure. The default is no.
HP recommends setting this parameter to yes in Metrocluster installations using EMC
SRDF.
IMPORTANT: vxdisk scandisks can take a long time in the case of a large IO
subsystem.
deactivation_retry_count
Specifies how many times the package shutdown script will repeat an attempt to
deactivate a volume group (LVM) or disk group (VxVM, CVM) before failing.
Legal value is zero or any greater number. As of A.11.18, the default is 2.
kill_processes_accessing_raw_devices
Specifies whether or not to kill processes that are using raw devices (for example,
database applications) when the package shuts down. Default is no. See the comments
in the package configuration file for more information.
concurrent_fsck_operations
The number of concurrent fsck operations allowed on file systems being mounted
during package startup.
Legal value is any number greater than zero. The default is 1.
If the package needs to run fsck on a large number of filesystems, you can improve
performance by carefully tuning this parameter during testing (increase it a little at
a time and monitor performance each time).
concurrent_mount_and_umount_operations
The number of concurrent mounts and umounts to allow during package startup or
shutdown.
Legal value is any number greater than zero. The default is 1.
If the package needs to mount and unmount a large number of filesystems, you can
improve performance by carefully tuning this parameter during testing (increase it a
little at a time and monitor performance each time).
fs_umount_retry_count
The number of umount retries for each file system. Replaces FS_UMOUNT_COUNT,
which is still supported in the package control script for legacy packages; see
“Configuring a Legacy Package” (page 375).
Legal value is any number greater than zero. The default is 1. Operates in the
same way as fs_mount_retry_count (page 307).
fs_name
• For a local filesystem, this parameter, in conjunction with fs_directory, fs_type,
fs_mount_opt, fs_umount_opt, and fs_fsck_opt, specifies the filesystem that is to be
mounted by the package. fs_name must specify the block device file for a logical
volume.
• For an NFS-imported filesystem, the additional parameters required are fs_server,
fs_directory, fs_type, and fs_mount_opt; see fs_server (page 308) for an example.
Replaces LV, which is still supported in the package control script for legacy packages.
Filesystems are mounted in the order specified in this file, and unmounted in the reverse
order.
fs_server
The name or IP address (IPv4 or IPv6) of the NFS server for an NFS-imported filesystem.
In this case, you must also set fs_type to nfs, and set fs_mount_opt to "-o llock" (in
addition to any other values you use for fs_mount_opt). fs_name specifies the directory
to be imported from fs_server, and fs_directory specifies the local mount point.
For example:
fs_name /var/opt/nfs/share1
fs_server wagon
fs_directory /nfs/mnt/share1
fs_type nfs
fs_mount_opt "-o llock"
#fs_umount_opt
#fs_fsck_opt
Note that fs_umount_opt is optional and fs_fsck_opt is not used for an NFS-imported file
system. (Both are left commented out in this example.)
fs_directory
The root of the file system specified by fs_name. Replaces FS, which is still supported
in the package control script for legacy packages; see “Configuring a Legacy Package”
(page 375). See the mount (1m) manpage for more information.
fs_type
The type of the file system specified by fs_name. This parameter is in the package control
script for legacy packages.
For an NFS-imported filesystem, this must be set to nfs. See the example under
“fs_server” (page 308).
See the mount (1m) and fstyp (1m) manpages for more information.
fs_mount_opt
The mount options for the file system specified by fs_name. This parameter is in the
package control script for legacy packages.
See the mount (1m) manpage for more information.
fs_fsck_opt
The fsck options for the file system specified by fs_name. Using the -s (safe performance
mode) option of fsck will improve startup performance if the package uses a large
number of file systems. This parameter is in the package control script for legacy
packages.
See the fsck (1m) manpage for more information.
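For example, a local VxFS filesystem on an LVM logical volume might be specified as follows (the device file, mount point, and options are illustrative only):
fs_name /dev/vg01/lvol1
fs_directory /app/data
fs_type vxfs
fs_mount_opt "-o rw"
fs_fsck_opt "-s"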
pev_
Specifies a package environment variable that can be passed to an external_pre_script,
external_script, or both, by means of the cmgetpkgenv (1m) command. New for
modular packages.
The variable name must be in the form pev_<variable_name> and contain only
alphanumeric characters and underscores. The letters pev (upper-case or lower-case)
followed by the underscore (_) are required.
The variable name and value can each consist of a maximum of MAXPATHLEN
characters (1024 on HP-UX systems).
You can define more than one variable. See “About External Scripts” (page 197), as well
as the comments in the configuration file, for more information.
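For example, a package might pass a monitoring interval to its external script by defining an entry such as the following (the variable name and value are illustrative only):
pev_monitoring_interval 30
The external script can then retrieve the value by means of the cmgetpkgenv (1m) command.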
external_pre_script
The full pathname of an external script to be executed before volume groups and disk
groups are activated during package startup, and after they have been deactivated
during package shutdown (that is, effectively the first step in package startup and last
step in package shutdown). New for modular packages.
If more than one external_pre_script is specified, the scripts will be executed on package
startup in the order they are entered into this file, and in the reverse order during
package shutdown.
See “About External Scripts” (page 197), and the comments in the configuration file,
for more information and examples.
external_script
The full pathname of an external script. This script is often the means of launching and
halting the application that constitutes the main function of the package. New for
modular packages.
user_name
Specifies the name of a user who has permission to administer this package. See also
user_host and user_role; these three parameters together define the access-control policy
for this package (see “Controlling Access to the Cluster” (page 251)). These parameters
must be defined in this order: user_name, user_host, user_role.
Legal values for user_name are any_user or a maximum of eight login names from
/etc/passwd on user_host.
user_host
The system from which a user specified by user_name can execute
package-administration commands.
Legal values are any_serviceguard_node, or cluster_member_node, or a specific
cluster node. If you specify a specific node it must be the official hostname (the
hostname portion, and only the hostname portion, of the fully qualified domain name).
As with user_name, be careful to spell the keywords exactly as given.
user_role
Must be package_admin, allowing the user access to the cmrunpkg, cmhaltpkg,
and cmmodpkg commands (and the equivalent functions in Serviceguard Manager)
for this package, and to the Monitor role for the cluster. See “Controlling Access to the
Cluster” (page 251) for more information.
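For example, the following entries (the login name is illustrative only) allow the user admin1, logged in on any node in this cluster, to administer this package:
user_name admin1
user_host cluster_member_node
user_role package_admin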
IMPORTANT: The following parameters are used only by legacy packages. Do not
try to use them in modular packages. See “Configuring a Legacy Package” (page 375)
for more information.
PATH
Specifies the path to be used by the script.
SUBNET
Specifies the IP subnets that are to be monitored for the package.
RUN_SCRIPT and HALT_SCRIPT
Use the full pathname of each script. These two parameters allow you to separate
package run instructions and package halt instructions for legacy packages into separate
scripts if you need to. In this case, make sure you include identical configuration
information (such as node names, IP addresses, etc.) in both scripts. In most cases,
though, HP recommends that you use the same script for both run and halt instructions.
(When the package starts, the script is passed the parameter start; when it halts, it is
passed the parameter stop.)
DEFERRED_RESOURCE_NAME
Add DEFERRED_RESOURCE_NAME to a legacy package control script for any resource
that has a RESOURCE_START setting of DEFERRED.
NOTE: If you do not include a base module (or default or all) on the cmmakepkg
command line, cmmakepkg will ignore the modules you specify and generate a default
configuration file containing all the parameters.
For a complex package, or if you are not yet sure which parameters you will need to
set, the default may be the best choice; see the first example below.
You can use the -v option with cmmakepkg to control how much information is
displayed online or included in the configuration file. Valid values are 0, 1 and 2. -v
0 removes all comments; -v 1 includes a brief heading for each parameter; -v 2 provides
a full description of each parameter. The default is level 2.
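For example, the following commands illustrate the two approaches (the directory path and module names are shown for illustration only; see the cmmakepkg (1m) manpage for the definitive list of modules):
cmmakepkg $SGCONF/pkg1/pkg1.conf
cmmakepkg -v 1 -m sg/failover -m sg/service $SGCONF/pkg1/pkg1.conf
The first command generates a default configuration file containing all the parameters; the second generates a file containing only the parameters belonging to the modules specified.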
Next Step
The next step is to edit the configuration file you have generated; see “Editing the
Configuration File” (page 313).
IMPORTANT: Do not edit the package configuration file of a Veritas Cluster Volume
Manager (CVM) or Cluster File System (CFS) multi-node or system multi-node package.
Create SG-CFS-pkg by means of the cmapplyconf command. Create and modify
SG-CFS-DG-id# and SG-CFS-MP-id# using cfs* commands.
• package_name. Enter a unique name for this package. Note that there are stricter
formal requirements for the name as of A.11.18.
• package_type. Enter failover, multi_node, or system_multi_node.
(system_multi_node is reserved for special-purpose packages supplied by HP.)
Note that there are restrictions if another package depends on this package; see
“About Package Dependencies” (page 179).
See “Types of Package: Failover, Multi-Node, System Multi-Node” (page 280) for
more information.
• node_name. Enter the name of each cluster node on which this package can run,
with a separate entry on a separate line for each node.
• auto_run. For failover packages, enter yes to allow Serviceguard to start the package
on the first available node specified by node_name, and to automatically restart it
later if it fails. Enter no to keep Serviceguard from automatically starting the
package.
• node_fail_fast_enabled. Enter yes to cause the node to be halted (system reset) if the
package fails; otherwise enter no.
For system multi-node packages, you must enter yes.
• run_script_timeout and halt_script_timeout. Enter the number of seconds Serviceguard
should wait for package startup and shutdown, respectively, to complete; or leave
the default, no_timeout; see (page 290).
• successor_halt_timeout. Used if other packages depend on this package; see “About
Package Dependencies” (page 179).
• script_log_file (page 292).
• log_level (page 292).
• failover_policy (page 292). Enter configured_node or min_package_node.
(This parameter can be set for failover packages only.)
• failback_policy (page 293) Enter automatic or manual.
(This parameter can be set for failover packages only.)
NOTE: The package(s) this package depends on must already be part of the
cluster configuration by the time you validate this package (via cmcheckconf(1m);
see “Verifying and Applying the Package Configuration” (page 317)); otherwise
validation will fail.
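As a minimal sketch, the beginning of an edited configuration file for a failover package might look like this (the package name, node names, and values shown are illustrative only):
package_name pkg1
package_type failover
node_name ftsys9
node_name ftsys10
auto_run yes
node_fail_fast_enabled no
run_script_timeout no_timeout
halt_script_timeout no_timeout
failover_policy configured_node
failback_policy manual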
NOTE: For modular packages, you now need to distribute any external scripts
identified by the external_pre_script and external_script parameters.
But if you are accustomed to configuring legacy packages, note that you do not have to
create a separate package control script for a modular package, or distribute it manually.
(You do still have to do this for legacy packages; see “Configuring a Legacy Package”
(page 375).)
CAUTION: Although Serviceguard uses the -C option within the package control
script framework, this option should not normally be used from the command line.
Chapter 8: “Troubleshooting Your Cluster” (page 399), shows some situations where
you might need to use -C from the command line.
The following example shows the command with the same options that are used by
the control script:
# vxdg -tfC import dg_01
This command takes over ownership of all the disks in disk group dg_01, even though
the disk currently has a different host ID written on it. The command writes the current
node’s host ID on all disks in disk group dg_01 and sets the noautoimport flag for the
disks. This flag prevents a disk group from being automatically re-imported by a node
following a reboot. If a node in the cluster fails, the host ID is still written on each disk
in the disk group. However, if the node is part of a Serviceguard cluster then on reboot
the host ID will be cleared by the owning node from all disks which have the
noautoimport flag set, even if the disk group is not under Serviceguard control. This
allows all cluster nodes, which have access to the disk group, to be able to import the
disks as part of cluster operation.
The control script also uses the vxvol startall command to start up the logical
volumes in each disk group that is imported.
NOTE: This section provides examples of the information you can obtain from
cmviewcl; for complete details, see the manpages cmviewcl(1m) and cmviewcl(5).
Note that the -f line option provides information not available with other options.
You can use the cmviewcl command without root access; in clusters running
Serviceguard version A.11.16 or later, grant access by assigning the Monitor role to the
users in question. In earlier versions, allow access by adding <nodename>
<nonrootuser> to the cmclnodelist file.
cmviewcl -v displays information about all the nodes and packages in a running
cluster, together with the settings of parameters that determine failover behavior.
Viewing Dependencies
The cmviewcl -v command output lists dependencies throughout the cluster. For a
specific package’s dependencies, use the -p <pkgname> option.
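For example (the package name is illustrative):
cmviewcl -v -p pkg1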
Cluster Status
The status of a cluster, as shown by cmviewcl, can be one of the following:
• up — At least one node has a running cluster daemon, and reconfiguration is not
taking place.
• down — No cluster daemons are running on any cluster node.
• starting — The cluster is in the process of determining its active membership.
At least one cluster daemon is running.
• unknown — The node on which the cmviewcl command is issued cannot
communicate with other nodes in the cluster.
Service Status
Services have only status, as follows:
• Up. The service is being monitored.
• Down. The service is not running. It may not have started, or have halted or failed.
• Unknown.
Network Status
The network interfaces have only status, as follows:
• Up.
• Down.
• Unknown. Serviceguard cannot determine whether the interface is up or down. A
standby interface has this status.
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 56/36.1 lan0
STANDBY up 60/6 lan1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service1
Subnet up 0 0 15.13.168.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys9 (current)
Alternate up enabled ftsys10
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 28.1 lan0
STANDBY up 32.1 lan1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service2
Subnet up 0 0 15.13.168.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys10 (current)
Alternate up enabled ftsys9
MULTI_NODE_PACKAGES
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 SG-CFS-vxconfigd
Service up 5 0 SG-CFS-sgcvmd
Service up 5 0 SG-CFS-vxfsckd
Service up 0 0 SG-CFS-cmvxd
Service up 0 0 SG-CFS-cmvxpingd
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 SG-CFS-vxconfigd
Service up 5 0 SG-CFS-sgcvmd
Service up 5 0 SG-CFS-vxfsckd
Service up 0 0 SG-CFS-cmvxd
Service up 0 0 SG-CFS-cmvxpingd
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 56/36.1 lan0
STANDBY up 60/6 lan1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service1
Subnet up 15.13.168.0
Resource up /example/float
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys9 (current)
Alternate up enabled ftsys10
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 28.1 lan0
STANDBY up 32.1 lan1
UNOWNED_PACKAGES
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Script_Parameters:
ITEM STATUS NODE_NAME NAME
Resource up ftsys9 /example/float
Subnet up ftsys9 15.13.168.0
Resource down ftsys10 /example/float
Subnet up ftsys10 15.13.168.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys10
Alternate up enabled ftsys9
pkg2 now has the status down, and it is shown as unowned, with package switching
disabled. Resource /example/float, which is configured as a dependency of pkg2,
is down on one node. Note that switching is enabled for both nodes, however. This
means that once global switching is re-enabled for the package, it will attempt to start
up on the primary node.
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 56/36.1 lan0
STANDBY up 60/6 lan1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service1
Subnet up 15.13.168.0
Resource up /example/float
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service2.1
Subnet up 15.13.168.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys10
Alternate up enabled ftsys9 (current)
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 28.1 lan0
STANDBY up 32.1 lan1
Now pkg2 is running on node ftsys9. Note that switching is still disabled.
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover min_package_node
Failback automatic
Script_Parameters:
ITEM STATUS NODE_NAME NAME
Resource up manx /resource/random
Subnet up manx 192.8.15.0
Resource up burmese /resource/random
Subnet up burmese 192.8.15.0
Resource up tabby /resource/random
Subnet up tabby 192.8.15.0
Resource up persian /resource/random
Subnet up persian 192.8.15.0
SYSTEM_MULTI_NODE_PACKAGES:
/var/opt/sgtest/tmp/mnt/dev/vx/dsk/vg_for_cvm1_dd5/lvol4 regular lvol4 vg_for_cvm_dd5 MOUNTED
Node : ftsys8
Cluster Manager : up
CVM state : up
/var/opt/sgtest/tmp/mnt/dev/vx/dsk/vg_for_cvm1_dd5/lvol1 regular lvol1 vg_for_cvm_veggie_dd5 MOUNTED
/var/opt/sgtest/tmp/mnt/dev/vx/dsk/vg_for_cvm1_dd5/lvol4 regular lvol4 vg_for_cvm_dd5 MOUNTED
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 SG-CFS-vxconfigd
Service up 5 0 SG-CFS-sgcvmd
Service up 5 0 SG-CFS-vxfsckd
Service up 0 0 SG-CFS-cmvxd
Service up 0 0 SG-CFS-cmvxpingd
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 SG-CFS-vxconfigd
Service up 5 0 SG-CFS-sgcvmd
Service up 5 0 SG-CFS-vxfsckd
Service up 0 0 SG-CFS-cmvxd
Service up 0 0 SG-CFS-cmvxpingd
NOTE: The table includes all the checks available as of A.11.20, not just the new ones.
Table 7-1 Verifying Cluster Components

Component (Context): LVM logical volumes (package)
Tool or Command: cmcheckconf (1m), cmapplyconf (1m); see also “Verifying and Applying the Package Configuration” (page 317)
Comments: Checked for modular packages only, as part of package validation (cmcheckconf -P).

Component (Context): LVM physical volumes (package)
Tool or Command: cmcheckconf (1m), cmapplyconf (1m)
Comments: Checked for modular packages only, as part of package validation (cmcheckconf -P).

Component (Context): Quorum Server (cluster)
Tool or Command: cmcheckconf (1m), cmapplyconf (1m)
Comments: Commands check that the quorum server, if used, is running and all nodes are authorized to access it; and, if more than one IP address is specified, that the quorum server is reachable from all nodes through both the IP addresses.

Component (Context): Lock disk(s) (cluster)
Tool or Command: cmcheckconf (1m), cmapplyconf (1m)
Comments: Commands check that the disk(s) are accessible from all nodes, and that the lock volume group(s) are activated on at least one node.

Component (Context): Lock LUN (cluster)
Tool or Command: cmcheckconf (1m), cmapplyconf (1m)
Comments: Commands check that all nodes are accessing the same physical device and that the lock LUN device file is a block device file.

Component (Context): VxVM disk groups (package)
Tool or Command: cmcheckconf (1m), cmapplyconf (1m); see also “Verifying and Applying the Package Configuration” (page 317)
Comments: Commands check that each node has a working physical connection to the disks.

Component (Context): Mount points (package)
Tool or Command: cmcheckconf (1m), cmapplyconf (1m); see also “Verifying and Applying the Package Configuration” (page 317)
Comments: Commands check that the mount-point directories specified in the package configuration file exist on all nodes that can run the package.

Component (Context): Service commands (package)
Tool or Command: cmcheckconf (1m), cmapplyconf (1m); see also “Verifying and Applying the Package Configuration” (page 317)
Comments: Commands check that files specified by service commands exist and are executable. Service commands whose paths are nested within an unmounted shared filesystem are not checked.

Component (Context): File systems (package)
Tool or Command: cmcheckconf (1m), cmapplyconf (1m); see also “Verifying and Applying the Package Configuration” (page 317)
Comments: For LVM only, commands check that file systems have been built on the logical volumes identified by the fs_name parameter (page 307).

Component (Context): External scripts and pre-scripts (modular package)
Tool or Command: cmcheckconf (1m), cmapplyconf (1m)
Comments: A non-zero return value from any script will cause the commands to fail.
NOTE: The job must run on one of the nodes in the cluster. Because only the root
user can run cluster verification, and cron (1m) sets the job’s user and group ID’s to
those of the user who submitted the job, you must edit the file /var/spool/cron/
crontabs/root as the root user.
Example
The short script that follows runs cluster verification and sends an email to
admin@hp.com when verification fails.
#!/bin/sh
cmcheckconf -v >/tmp/cmcheckconf.output
if (( $? != 0 ))
then
mailx -s "Cluster verification failed" admin@hp.com 2>&1 </tmp/cmcheckconf.output
fi
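To run such a script periodically, you could add an entry like the following (the script location and schedule are illustrative only) to /var/spool/cron/crontabs/root:
0 8 * * * /etc/cmcluster/scripts/cluster_verify.sh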
Limitations
Serviceguard does not check the following conditions:
• Access Control Policies properly configured (see “Controlling Access to the Cluster”
(page 251) for information about Access Control Policies)
• File systems configured to mount automatically on boot (that is, Serviceguard does
not check /etc/fstab)
• Shared volume groups configured to activate on boot
• Volume group major and minor numbers unique
• Redundant storage paths functioning properly
• Kernel parameters and driver configurations consistent across nodes
• Mount point overlaps (such that one file system is obscured when another is
mounted)
• Unreachable DNS server
• Consistency of settings in .rhosts and /var/admin/inetd.sec
• Consistency of device-file major and minor numbers across the cluster
• Nested mount points
• Staleness of mirror copies
NOTE: Manually starting or halting the cluster or individual nodes does not require
access to the Quorum Server, if one is configured. The Quorum Server is only used
when tie-breaking is needed following a cluster partition.
CAUTION: Serviceguard cannot guarantee data integrity if you try to start a cluster
with the cmruncl -n command while a subset of the cluster's nodes are already
running a cluster. If the network connection is down between nodes, using cmruncl
-n might result in a second cluster forming, and this second cluster might start up the
same applications that are already running on the other cluster. The result could be
two applications overwriting each other's data on the disks.
NOTE: HP recommends that you remove a node from participation in the cluster (by
running cmhaltnode as shown below, or Halt Node in Serviceguard Manager)
before running the HP-UX shutdown command, especially in cases in which a packaged
application might have trouble during shutdown and not halt cleanly.
Use cmhaltnode to halt one or more nodes in a cluster. The cluster daemon on the
specified node stops, and the node is removed from active participation in the cluster.
To halt a node with a running package, use the -f option. If a package was running
that can be switched to an adoptive node, the switch takes place and the package starts
on the adoptive node. For example, the following command causes the Serviceguard
daemon running on node ftsys9 in the sample configuration to halt and the package
running on ftsys9 to move to an adoptive node. The -v (verbose) option prints out
messages:
cmhaltnode -f -v ftsys9
• You cannot detach packages when the HP-UX Clustered I/O Congestion Control
feature is enabled.
• As of the date of this manual, you cannot detach ECMT-based packages.
IMPORTANT: This means that you will need to detect any errors that occur while
the package is detached, and take corrective action by running cmhaltpkg to halt
the detached package and cmrunpkg (1m) to restart the package on another
node.
• cmviewcl (1m) reports the status and state of detached packages as detached.
This is true even if a problem has occurred since the package was detached and
some or all of the package components are not healthy or not running.
• Because Serviceguard assumes that a detached package has remained healthy, the
package is considered to be UP for dependency purposes.
This means, for example, that if you halt node1, detaching pkgA, and pkgB
depends on pkgA to be UP on ANY_NODE, pkgB on node2 will continue to run (or
can start) while pkgA is detached. See “About Package Dependencies” (page 179)
for more information about dependencies.
• As always, packages cannot start on a halted node or in a halted cluster.
• When you restart a node or cluster whose packages have been detached, the
packages are re-attached; that is, Serviceguard begins monitoring them again.
At this point, Serviceguard checks the health of the packages that were detached
and takes any necessary corrective action; for example, if a failover package has failed
while it was detached, Serviceguard will take the appropriate action, such as restarting
it on an eligible node.
CAUTION: Serviceguard does not check LVM volume groups and mount points
when re-attaching packages.
• The detached state and status could appear to persist across a reboot.
If a node reboots while packages are detached (or detaching, or re-attaching), and
package components are left in an inconsistent state, the appropriate package halt
scripts will run to clean things up when the node comes back up. But cmviewcl
will continue to show the packages as detached. Either cmhaltpkg or
cmrunnode (1m) will reset the packages' state and status.
• If you halt a package and disable it before running cmhaltcl -d to detach other
packages running in the cluster, auto_run will be automatically re-enabled for this
package when the cluster is started again, forcing the package to start.
To prevent this behavior and keep the package halted and disabled after the cluster
restarts, change auto_run to no in the package configuration file (page 289), and
re-apply the package, before running cmhaltcl -d.
• If an IP address is switched to the standby LAN because of a failure on the
primary LAN before a node is halted in detached mode, and if the failure is detected
as an IP-only failure (meaning that the primary LAN failed at the IP level
only), then the IP address will remain on the standby LAN even after the node is
restarted via cmrunnode. This will also happen if the IP address is switched to
the standby LAN and the NETWORK_AUTO_FAILBACK cluster parameter is set to
FALSE.
If the primary LAN recovers while the node is halted and you want the IP address
to fail back to the primary LAN, run cmmodnet –e to re-enable the primary LAN
interface and trigger the failback.
NOTE: -d and -f are mutually exclusive. See cmhaltnode (1m) for more
information.
NOTE: If you do not do this, the cmhaltcl in the next step will fail.
NOTE: -d and -f are mutually exclusive. See cmhaltcl (1m) for more
information.
Starting a Package
Ordinarily, when a cluster starts up, the packages configured as part of the cluster will
start up on their configured nodes. You may need to start a package manually after it
has been halted manually. You can do this either in Serviceguard Manager or on the
Serviceguard command line.
If any package has a configured dependency on another package, Serviceguard will
start them in order, ensuring that a package will not start until its dependency is met.
You can use Serviceguard Manager, or Serviceguard commands as shown below, to
start a package.
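For example, to start the failover package pkg1 on node ftsys9 and then re-enable package switching (the package and node names are illustrative):
cmrunpkg -n ftsys9 pkg1
cmmodpkg -e pkg1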
Halting a Package
You halt a Serviceguard package when you want to bring the package out of use but
want the node to continue running in the cluster. You can halt a package using
Serviceguard Manager or on the Serviceguard command line.
Halting a package has a different effect from halting the node. When you halt the node,
its failover packages may switch to adoptive nodes (assuming that switching is enabled
for them); when you halt a failover package, it is disabled from switching to another
node, and must be restarted manually on another node or on the same node.
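For example (the package name is illustrative):
cmhaltpkg pkg1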
NOTE: If you need to do maintenance that requires halting a node, or the entire
cluster, you should consider Live Application Detach; see “Halting a Node or the
Cluster while Keeping Packages Running” (page 344).
• Maintenance mode is chiefly useful for modifying networks and EMS resources
used by a package while the package is running.
See “Performing Maintenance Using Maintenance Mode” (page 356).
• Partial-startup maintenance mode allows you to work on package services, file
systems, and volume groups.
See “Performing Maintenance Using Partial-Startup Maintenance Mode” (page 357).
• Neither maintenance mode nor partial-startup maintenance mode can be used for
legacy packages, multi-node packages, or system multi-node packages.
• Package maintenance does not alter the configuration of the package, as specified
in the package configuration file.
For information about reconfiguring a package, see “Reconfiguring a Package”
(page 385).
NOTE: In order to run a package in partial-startup maintenance mode, you must first
put it in maintenance mode. This means that packages in partial-startup maintenance
mode share the characteristics described below for packages in maintenance mode,
and the same rules, including the dependency rules, apply. Additional rules apply to
partial-startup maintenance mode, and the procedure involves more steps, as explained
under “Performing Maintenance Using Partial-Startup Maintenance Mode” (page 357).
NOTE: But a failure in the package control script will cause the package to fail.
The package will also fail if an external script (or pre-script) cannot be executed
or does not exist.
IMPORTANT: See the latest Serviceguard release notes for important information
about version requirements for package maintenance.
• The package must have package switching disabled before you can put it in
maintenance mode.
• You can put a package in maintenance mode only on one node.
— The node must be active in the cluster and must be eligible to run the package
(on the package's node_name list).
— If the package is not running, you must specify the node name when you run
cmmodpkg (1m) to put the package in maintenance mode.
— If the package is running, you can put it into maintenance only on the node on
which it is running.
— While the package is in maintenance mode on a node, you can run the package
only on that node.
• You cannot put a package in maintenance mode, or take it out of maintenance mode,
if doing so will cause another running package to halt.
• Since package failures are ignored while in maintenance mode, you can take a
running package out of maintenance mode only if the package is healthy.
Serviceguard checks the state of the package’s services, subnets and EMS resources
to determine if the package is healthy. If it is not, you must halt the package before
taking it out of maintenance mode.
Procedure
Follow these steps to perform maintenance on a package's networking or EMS
components.
In this example, we'll call the package pkg1 and assume it is running on node1.
1. Place the package in maintenance mode:
cmmodpkg -m on -n node1 pkg1
2. Perform maintenance on the networks or resources and test manually that they
are working correctly.
NOTE: If you now run cmviewcl, you'll see that the STATUS of pkg1 is up and
its STATE is maintenance.
Procedure
Follow this procedure to perform maintenance on a package. In this example, we'll
assume a package pkg1 is running on node1, and that we want to do maintenance on
the package's services.
1. Halt the package:
cmhaltpkg pkg1
2. Place the package in maintenance mode:
cmmodpkg -m on -n node1 pkg1
NOTE: If you now run cmviewcl, you'll see that the STATUS of pkg1 is up and
its STATE is maintenance.
NOTE: You can also use cmhaltpkg -s, which stops the modules started by
cmrunpkg -m — in this case, all the modules up to and including package_ip.
Reconfiguring a Cluster
You can reconfigure a cluster either when it is halted or while it is still running. Some
operations can only be done when the cluster is halted. Table 7-2 shows the required
cluster state for many kinds of changes.
Table 7-2 Types of Changes to the Cluster Configuration

Change: Add a new node
Required cluster state: All systems configured as members of this cluster must be running.

Change: Delete a volume group
Required cluster state: Cluster can be running. Packages that use the volume group will not be able to start again until their configuration is modified.

Change: Change Quorum Server Configuration
Required cluster state: Cluster can be running; see the NOTE below this table and “What Happens when You Change the Quorum Configuration Online” (page 66).

Change: Change Cluster Lock Configuration (LVM lock disk)
Required cluster state: Cluster can be running; see “Updating the Cluster Lock Configuration” (page 364), “What Happens when You Change the Quorum Configuration Online” (page 66), and the NOTE below this table.

Change: Change Cluster Lock Configuration (lock LUN)
Required cluster state: Cluster can be running. See “Updating the Cluster Lock LUN Configuration Online” (page 364), “What Happens when You Change the Quorum Configuration Online” (page 66), and the NOTE below this table.

Change: Add NICs and their IP addresses, if any, to the cluster configuration
Required cluster state: Cluster can be running. See “Changing the Cluster Networking Configuration while the Cluster Is Running” (page 367).

Change: Delete NICs and their IP addresses, if any, from the cluster configuration
Required cluster state: Cluster can be running. See “Changing the Cluster Networking Configuration while the Cluster Is Running” (page 367). If removing the NIC from the system, see “Removing a LAN or VLAN Interface from a Node” (page 372).

Change: Change the designation of an existing interface from HEARTBEAT_IP to STATIONARY_IP, or vice versa
Required cluster state: Cluster can be running. See “Changing the Cluster Networking Configuration while the Cluster Is Running” (page 367).

Change: Change an interface from IPv4 to IPv6, or vice versa
Required cluster state: Cluster can be running. See “Changing the Cluster Networking Configuration while the Cluster Is Running” (page 367).

Change: Reconfigure IP addresses for a NIC used by the cluster
Required cluster state: Must delete the interface from the cluster configuration, reconfigure it, then add it back into the cluster configuration. See “What You Must Keep in Mind” (page 368). Cluster can be running throughout.

Change: Change IP Monitor parameters (SUBNET, IP_MONITOR, POLLING_TARGET)
Required cluster state: Cluster can be running. See the entries for these parameters under “Cluster Configuration Parameters” (page 143) for more information.
NOTE: You cannot use the -t option with any command operating on a package in
maintenance mode; see “Maintaining a Package: Maintenance Mode” (page 353).
For more information about these commands, see their respective manpages. You can
also perform these preview functions in Serviceguard Manager: check the Preview
[...] box for the action in question.
When you use the -t option, the command, rather than executing as usual, predicts
the results that would occur, sending a summary to $stdout. For example, assume
that pkg1 is a high-priority package whose primary node is node1, and which depends
on pkg2 and pkg3 to run on the same node. These are lower-priority packages which
are currently running on node2. pkg1 is down and disabled, and you want to see the
effect of enabling it:
cmmodpkg -e -t pkg1
You will see output something like this:
package:pkg3|node:node2|action:failing
package:pkg2|node:node2|action:failing
package:pkg2|node:node1|action:starting
package:pkg3|node:node1|action:starting
package:pkg1|node:node1|action:starting
cmmodpkg: Command preview completed successfully
This shows that pkg1, when enabled, will “drag” pkg2 and pkg3 to its primary node,
node1. It can do this because of its higher priority; see “Dragging Rules for Simple
Dependencies” (page 181). Running the preview confirms that all three packages will
successfully start on node1 (assuming conditions do not change between now and
when you actually enable pkg1, and there are no failures in the run scripts).
NOTE: The preview cannot predict run and halt script failures.
For more information about package dependencies and priorities, see “About Package
Dependencies” (page 179).
Using cmeval
You can use cmeval to evaluate the effect of cluster changes on Serviceguard packages.
You can also use it simply to preview changes you are considering making to the cluster
as a whole.
You can use cmeval safely in a production environment; it does not affect the state of
the cluster or packages. Unlike command preview mode (the -t option discussed above),
cmeval does not operate on the running cluster; see the cmeval (1m) manpage for more
information.
IMPORTANT: See “What Happens when You Change the Quorum Configuration
Online” (page 66) for important information.
1. In the cluster configuration file, modify the values of FIRST_CLUSTER_LOCK_PV
and SECOND_CLUSTER_LOCK_PV for each node.
2. Run cmcheckconf to check the configuration.
3. Run cmapplyconf to apply the configuration.
For information about replacing the physical disk, see “Replacing a Lock Disk”
(page 404).
IMPORTANT: See “What Happens when You Change the Quorum Configuration
Online” (page 66) for important information.
1. In the cluster configuration file, modify the value of CLUSTER_LOCK_LUN for
each node.
2. Run cmcheckconf to check the configuration.
3. Run cmapplyconf to apply the configuration.
For information about replacing the physical device, see “Replacing a Lock LUN”
(page 405).
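For example, assuming the modified cluster configuration has been saved in a file named clconfig.ascii:
cmcheckconf -C clconfig.ascii
cmapplyconf -C clconfig.ascii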
NOTE: Before you can add a node to a running cluster that uses Veritas CVM, the
node must already be connected to the disk devices for all CVM disk groups. The disk
groups will be available for import when the node joins the cluster.
NOTE: If you are trying to remove an unreachable node on which many packages
are configured to run (especially if the packages use a large number of EMS resources)
you may see the following message:
The configuration change is too large to process while the cluster is running.
Split the configuration change into multiple requests or halt the cluster.
In this situation, you must halt the cluster to remove the node.
NODE_NAME ftsys9
NETWORK_INTERFACE lan1
HEARTBEAT_IP 192.3.17.18
#NETWORK_INTERFACE lan0
#STATIONARY_IP 15.13.170.18
NETWORK_INTERFACE lan3
# Possible standby Network Interfaces for lan1, lan0: lan2.
NODE_NAME ftsys10
NETWORK_INTERFACE lan1
HEARTBEAT_IP 192.3.17.19
#NETWORK_INTERFACE lan0
# STATIONARY_IP 15.13.170.19
NETWORK_INTERFACE lan3
# Possible standby Network Interfaces for lan0, lan1: lan2
2. Edit the file to uncomment the entries for the subnet that is being added (lan0 in
this example), and change STATIONARY_IP to HEARTBEAT_IP:
NODE_NAME ftsys9
NETWORK_INTERFACE lan1
HEARTBEAT_IP 192.3.17.18
NETWORK_INTERFACE lan0
HEARTBEAT_IP 15.13.170.18
NETWORK_INTERFACE lan3
# Possible standby Network Interfaces for lan1, lan0: lan2.
NODE_NAME ftsys10
NETWORK_INTERFACE lan1
HEARTBEAT_IP 192.3.17.19
NETWORK_INTERFACE lan0
HEARTBEAT_IP 15.13.170.19
NETWORK_INTERFACE lan3
# Possible standby Network Interfaces for lan0, lan1: lan2
3. Verify the new configuration:
cmcheckconf -C clconfig.ascii
NODE_NAME ftsys9
NETWORK_INTERFACE lan1
HEARTBEAT_IP 192.3.17.18
# NETWORK_INTERFACE lan0
# STATIONARY_IP 15.13.170.18
# NETWORK_INTERFACE lan3
# Possible standby Network Interfaces for lan1, lan0: lan2.
NODE_NAME ftsys10
NETWORK_INTERFACE lan1
HEARTBEAT_IP 192.3.17.19
# NETWORK_INTERFACE lan0
# STATIONARY_IP 15.13.170.19
# NETWORK_INTERFACE lan3
# Possible standby Network Interfaces for lan0, lan1: lan2
NOTE: This can be done on a running system only on HP-UX 11i v3. You must shut
down an HP-UX 11i v2 system before removing the interface.
1. If you are not sure whether or not a physical interface (NIC) is part of the cluster
configuration, run olrad -C with the affected I/O slot ID as argument. If the NIC
is part of the cluster configuration, you’ll see a warning message telling you to
remove it from the configuration before you proceed. See the olrad(1M) manpage
for more information about olrad.
2. Use the cmgetconf command to store a copy of the cluster’s existing cluster
configuration in a temporary file. For example:
cmgetconf clconfig.ascii
3. Edit clconfig.ascii and delete the line(s) specifying the NIC name and its IP
address(es) (if any) from the configuration.
4. Run cmcheckconf to verify the new configuration.
5. Run cmapplyconf to apply the changes to the configuration and distribute the
new configuration file to all the cluster nodes.
6. Run olrad -d to remove the NIC.
See also “Replacing LAN or Fibre Channel Cards” (page 407).
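For example, the sequence of commands might look like this (the I/O slot ID 0/2/1/0 is illustrative only):
olrad -C 0/2/1/0
cmgetconf clconfig.ascii
(edit clconfig.ascii to delete the NIC and its IP addresses)
cmcheckconf -C clconfig.ascii
cmapplyconf -C clconfig.ascii
olrad -d 0/2/1/0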
NOTE: If the volume group that you are deleting from the cluster is currently activated
by a package, the configuration will be changed but the deletion will not take effect
until the package is halted; thereafter, the package will no longer be able to run without
further modification, such as removing the volume group from the package
configuration file or control script.
CAUTION: Serviceguard manages the Veritas processes, specifically gab and LLT.
This means that you should never use administration commands such as gabconfig,
llthosts, and lltconfig to administer a cluster. It is safe to use the read-only
variants of these commands, such as gabconfig -a. But a Veritas administrative
command could potentially crash nodes or the entire cluster.
NOTE: If you are removing a disk group from the cluster configuration, make sure
that you also modify or delete any package configuration file (or legacy package control
script) that imports and deports this disk group. Be sure to remove the disk group from
the configuration of any package that used it, as well as the corresponding dependency_
parameters.
Changing MAX_CONFIGURED_PACKAGES
As of Serviceguard A.11.17, you can change MAX_CONFIGURED_PACKAGES while
the cluster is running. The default for MAX_CONFIGURED_PACKAGES is the maximum
number allowed in the cluster. You can use Serviceguard Manager to change
MAX_CONFIGURED_PACKAGES, or Serviceguard commands as shown below.
Use cmgetconf to obtain a current copy of the cluster's existing configuration; for
example:
cmgetconf -c <cluster_name> clconfig.ascii
Edit the clconfig.ascii file to include the new value for
MAX_CONFIGURED_PACKAGES. Then use the cmcheckconf command to verify
the new configuration. Using the -k or -K option can significantly reduce the response
time.
Use cmapplyconf to apply the changes to the configuration and send the new
configuration file to all cluster nodes. Using -k or -K can significantly reduce the
response time.
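For example, the sequence might look like this; the cluster name cluster1 is illustrative, and the -k option is optional:
cmgetconf -c cluster1 clconfig.ascii
(edit clconfig.ascii and set MAX_CONFIGURED_PACKAGES to the new value)
cmcheckconf -k -C clconfig.ascii
cmapplyconf -k -C clconfig.ascii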
IMPORTANT: You can still create a new legacy package. If you are using a Serviceguard
Toolkit such as Serviceguard NFS Toolkit, consult the documentation for that product.
Otherwise, use this section to maintain and rework existing legacy packages rather
than to create new ones. The method described in Chapter 6 “Configuring Packages
and Their Services ”, is simpler and more efficient for creating new packages, allowing
packages to be built from smaller modules, and eliminating the separate package control
script and the need to distribute it manually.
If you decide to convert a legacy package to a modular package, see “Migrating a
Legacy Package to a Modular Package” (page 386). Do not attempt to convert
Serviceguard Toolkit packages.
NOTE: HP strongly recommends that you never edit the package configuration file
of a CVM/CFS multi-node or system multi-node package, although Serviceguard does
not prohibit it. Create SG-CFS-pkg by issuing the cmapplyconf command. Create
and modify SG-CFS-DG-id# and SG-CFS-MP-id# using cfs commands.
• PACKAGE_TYPE. Enter the package type; see “Types of Package: Failover,
Multi-Node, System Multi-Node” (page 280) and package_type (page 289).
NOTE: For modular packages, the default form for parameter names in the
package configuration file is lower case; for legacy packages the default is upper
case. There are no compatibility issues; Serviceguard is case-insensitive as far as
the parameter names are concerned.
Because this section is intended to be used primarily when you are reconfiguring
an existing legacy package, we are using the legacy parameter names (in upper
case) for the sake of continuity. But if you generate the configuration file using
cmmakepkg or cmgetconf, you will see the parameter names as they appear in
modular packages; see the notes below and the “Package Parameter Explanations”
(page 287) for details of the name changes.
IMPORTANT: Each subnet specified here must already be specified in the cluster
configuration file via the NETWORK_INTERFACE parameter and either the
HEARTBEAT_IP or STATIONARY_IP parameter. See “Cluster Configuration
Parameters ” (page 143) for more information.
See also “Stationary and Relocatable IP Addresses ” (page 90) and monitored_subnet
(page 296).
IMPORTANT: For cross-subnet configurations, see “Configuring Cross-Subnet
Failover” (page 383).
• If your package runs services, enter the SERVICE_NAME (page 299) and values
for SERVICE_FAIL_FAST_ENABLED (page 301) and SERVICE_HALT_TIMEOUT (page 301).
Enter a group of these three for each service, as in the example that follows the
IMPORTANT note below.
IMPORTANT: Note that the rules for valid SERVICE_NAMEs are more restrictive
as of A.11.18.
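For example, a legacy package configuration file might contain a group such as the following for one service; the service name and values here are illustrative only:
SERVICE_NAME pkg1_service
SERVICE_FAIL_FAST_ENABLED NO
SERVICE_HALT_TIMEOUT 300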
• To configure monitoring for a registered resource, enter values for the following
parameters.
— RESOURCE_NAME
— RESOURCE_POLLING_INTERVAL
NOTE: For legacy packages, DEFERRED resources must be specified in the package
control script.
function customer_defined_halt_cmds
{
# ADD customer defined halt commands.
: # do nothing instruction, because a function must contain some command.
date >> /tmp/pkg1.datelog
echo 'Halting pkg1' >> /tmp/pkg1.datelog
test_return 52
}
NOTE: You must use cmcheckconf and cmapplyconf again any time you make
changes to the cluster and package configuration files.
Configuring node_name
First you need to make sure that pkg1 will fail over to a node on another subnet only
if it has to. For example, if it is running on NodeA and needs to fail over, you want it
to try NodeB, on the same subnet, before incurring the cross-subnet overhead of failing
over to NodeC or NodeD.
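One way to express this preference, assuming NodeA and NodeB share one subnet and NodeC and NodeD share the other, is simply to list the nodes in that order in pkg1's package configuration file, for example:
node_name NodeA
node_name NodeB
node_name NodeC
node_name NodeD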
Configuring monitored_subnet_access
In order to monitor subnet 15.244.65.0 or 15.244.56.0, you would configure
monitored_subnet and monitored_subnet_access in pkg1’s package configuration file as
follows:
monitored_subnet 15.244.65.0
monitored_subnet_access PARTIAL
monitored_subnet 15.244.56.0
monitored_subnet_access PARTIAL
Reconfiguring a Package
You reconfigure a package in much the same way as you originally configured it; for
modular packages, see Chapter 6: “Configuring Packages and Their Services ” (page 279);
for older packages, see “Configuring a Legacy Package” (page 375).
The cluster, and the package itself, can be either halted or running during package
reconfiguration; see “Reconfiguring a Package on a Running Cluster ” (page 386). The
types of changes that can be made and the times when they take effect depend on
NOTE: The cmmigratepkg command requires Perl version 5.8.3 or higher on the
system on which you run the command. It should already be on the system as part of
the HP-UX base product.
CAUTION: Make sure you read and understand the information and caveats
under “Allowable Package States During Reconfiguration” (page 390) before you
decide to reconfigure a running package.
1. Make a copy of the old script, save it with the new name, and edit the copy as
needed.
2. Edit the package configuration file to use the new name.
CAUTION: If cmcheckconf fails, do not proceed to the next step until you have
corrected all the errors.
CAUTION: You must not use the HP-UX mount and umount commands in a CFS
environment; use cfsmount or cfsumount. Non-CFS commands (for example, mount
-o cluster, dbed_chkptmount, or sfrac_chkptmount) could cause conflicts
with subsequent operations on the file system or Serviceguard packages, and will not
create an appropriate multi-node package, with the result that cluster packages are not
aware of file system changes.
1. Remove any dependencies on the package being deleted. Delete dependency_
parameters from the failover application package configuration file, then apply
the modified configuration file:
cmapplyconf -v -P app1.conf
2. Unmount the shared file system:
cfsumount <mount point>
3. Remove the mount point package from the cluster:
cfsmntadm delete <mount point>
This disassociates the mount point from the cluster. When there is a single volume
group associated with the mount point, the disk group package will also be
removed.
4. Remove the disk group package from the cluster. This disassociates the disk group
from the cluster.
cfsdgadm delete <disk group>
In general, you have greater scope for online changes to a modular than to a legacy
package. In some cases, though, the capability of legacy packages has been upgraded
NOTE: If neither legacy nor modular is called out under “Change to the Package”, the
“Required Package State” applies to both types of package. Changes that are allowed,
but which HP does not recommend, are labeled “should not be running”.
IMPORTANT: Actions not listed in the table can be performed for both types of package
while the package is running.
In all cases the cluster can be running, and packages other than the one being
reconfigured can be running. And remember too that you can make changes to package
configuration files at any time; but do not apply them (using cmapplyconf or
Serviceguard Manager) to a running package in the cases indicated in the table.
NOTE: All the nodes in the cluster must be powered up and accessible when you
make package configuration changes.
Table 7-3 Types of Changes to Packages (each entry shows the change to the package,
followed by the required package state)

Change run script contents (legacy package): Package can be running, but should not
be starting. Timing problems may occur if the script is changed while the package is
starting.

Change halt script contents (legacy package): Package can be running, but should not
be halting. Timing problems may occur if the script is changed while the package is
halting.

Add or remove a SUBNET in the control script (legacy package): Package must not be
running. (Also applies to cross-subnet configurations.)

Add or remove an IP in the control script (legacy package): Package must not be
running. (Also applies to cross-subnet configurations.)

Add or remove a resource (legacy package): For AUTOMATIC resources, package can be
running; see the entry for “Add or remove a resource: modular package” for more
information. For DEFERRED resources, package must not be running.

Change a file system (modular package): Package should not be running (unless you
are only changing fs_umount_opt). Changing file-system options other than
fs_umount_opt may cause problems because the file system must be unmounted (using
the existing fs_umount_opt) and remounted with the new options; the CAUTION under
“Remove a file system: modular package” applies in this case as well. If only
fs_umount_opt is being changed, the file system will not be unmounted; the new option
will take effect when the package is halted or the file system is unmounted for some
other reason.

Add, change, or delete external scripts and pre-scripts (modular package): Package
can be running, but proceed with caution; see “Renaming or Replacing an External
Script Used by a Running Package” (page 387). Changes take effect when applied,
whether or not the package is running. If you add a script, Serviceguard validates it
and then (if there are no errors) runs it when you apply the change. If you delete a
script, Serviceguard stops it when you apply the change.

Change package auto_run: Package can be running. See “Choosing Switching and
Failover Behavior” (page 176).

Add CFS packages: Before you can add an SG-CFS-DG-id# disk group package, the
SG-CFS-pkg Cluster File System package must be up and running. Before you can add
an SG-MP-id# mount point package to a node, the SG-DG-id# disk group package must
be up and running on that node.
NOTE: You will not be able to cancel if you use cmapplyconf -f.
• Package nodes
• Package dependencies
• Package weights (and also node capacity, defined in the cluster configuration file)
• Package priority
• auto_run
• failback_policy
Single-Node Operation
In a multi-node cluster, you could have a situation in which all but one node has failed,
or you have shut down all but one node, leaving your cluster in single-node operation.
This remaining node will probably have applications running on it. As long as the
Serviceguard daemon cmcld is active, other nodes can rejoin the cluster.
If the Serviceguard daemon fails when in single-node operation, it will leave the single
node up and your applications running. (This is different from the loss of the
Serviceguard daemon in a multi-node cluster, which halts the node with a TOC, and
causes packages to be switched to adoptive nodes.) It is not necessary to halt the single
node in this scenario, since the application is still running, and no other node is currently
available for package switching.
You should not try to restart Serviceguard, since data corruption might occur if another
node were to attempt to start up a new instance of the application that is still running
on the single node.
Instead of restarting the cluster, choose an appropriate time to shut down the
applications and reboot the node; this will allow Serviceguard to restart the cluster
after the reboot.
CAUTION: Remove the node from the cluster first. If you run the swremove command
on a server that is still a member of a cluster, it will cause that cluster to halt, and the
cluster configuration to be deleted.
To remove Serviceguard:
1. If the node is an active member of a cluster, halt the node.
2. If the node is included in a cluster configuration, remove the node from the
configuration.
3. If you are removing Serviceguard from more than one node, issue swremove on
one node at a time.
CAUTION: In testing the cluster in the following procedures, be aware that you are
causing various components of the cluster to fail, so that you can determine that the
cluster responds correctly to failure situations. As a result, the availability of nodes and
applications may be disrupted.
Monitoring Hardware
Good standard practice in handling a high availability system includes careful fault
monitoring so as to prevent failures if possible or at least to react to them swiftly when
they occur. The following should be monitored for errors or warnings of all kinds:
• Disks
• CPUs
• Memory
• LAN cards
• Power sources
• All cables
• Disk interface cards
Some monitoring can be done through simple physical inspection, but for the most
comprehensive monitoring, you should examine the system log file (/var/adm/
syslog/syslog.log) periodically for reports on all configured HA devices. The
presence of errors relating to a device will show the need for maintenance.
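As a simple starting point (not a substitute for full monitoring), you could scan the log periodically for recent error and warning reports, for example:
grep -i -e error -e warning /var/adm/syslog/syslog.log | tail -n 50
The patterns worth searching for depend on the devices you have configured.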
When the proper redundancy has been configured, failures can occur with no external
symptoms. Proper monitoring is important. For example, if a Fibre Channel switch in
a redundant mass storage configuration fails, LVM will automatically fail over to the
alternate path through another Fibre Channel switch. Without monitoring, however,
you may not know that the failure has occurred, since the applications are still running
normally. But at this point, there is no redundant path if another failover occurs, so the
mass storage configuration is vulnerable.
Replacing Disks
The procedure for replacing a faulty disk mechanism depends on the type of disk
configuration you are using. Separate descriptions are provided for replacing an array
mechanism and a disk in a high availability enclosure.
For more information, see the section Replacing a Bad Disk in the Logical Volume
Management volume of the HP-UX System Administrator’s Guide, at
www.hp.com/go/hpux-core-docs.
NOTE: This example assumes you are using legacy DSF naming. Under agile
addressing, the physical volume would have a name such as /dev/disk/disk1.
See “About Device File Names (Device Special Files)” (page 106).
If you are using cDSFs, the device file would be in the /dev/rdisk/ directory;
for example /dev/rdisk/disk1. See “About Cluster-wide Device Special Files
(cDSFs)” (page 135).
If you need to replace a disk under the 11i v3 agile addressing scheme (also used
by cDSFs), you may be able to reduce downtime by using the
io_redirect_dsf(1M) command to reassign the existing DSF to the new device.
See the section Replacing a Bad Disk in the Logical Volume Management volume
of the HP-UX System Administrator’s Guide, posted at
www.hp.com/go/hpux-core-docs.
2. Identify the names of any logical volumes that have extents defined on the failed
physical volume; one way to do this is sketched after these steps.
3. On the node on which the volume group is currently activated, use the following
command for each logical volume that has extents on the failed physical volume:
lvreduce -m 0 /dev/vg_sg01/lvolname /dev/dsk/c2t3d0
4. At this point, remove the failed disk and insert a new one. The new disk will have
the same HP-UX device name as the old one.
5. On the node from which you issued the lvreduce command, issue the following
command to restore the volume group configuration data to the newly inserted
disk:
vgcfgrestore -n /dev/vg_sg01 /dev/dsk/c2t3d0
6. Issue the following command to extend the logical volume to the newly inserted
disk:
lvextend -m 1 /dev/vg_sg01 /dev/dsk/c2t3d0
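For step 2, one way to identify the logical volumes with extents on the failed physical volume, assuming the failed disk is /dev/dsk/c2t3d0 as in the steps above, is to examine the distribution section of the pvdisplay output:
pvdisplay -v /dev/dsk/c2t3d0 | more
The “Distribution of physical volume” section lists each logical volume (for example, /dev/vg_sg01/lvolname) that has extents on this disk.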
IMPORTANT: If you need to replace a disk under the HP-UX 11i v3 agile addressing
scheme, also used by cDSFs (see “About Device File Names (Device Special Files)”
(page 106) and “About Cluster-wide Device Special Files (cDSFs)” (page 135)), and you
use the same DSF, you may need to use the io_redirect_dsf(1M) command to
reassign the existing DSF to the new device, depending on whether the operation
changes the WWID of the device. See the section Replacing a Bad Disk in the Logical
Volume Management volume of the HP-UX System Administrator’s Guide, posted at
www.hp.com/go/hpux-core-docs. See also the section on io_redirect_dsf at
the same address.
If you do not use the existing DSF for the new device, you must change the name of
the DSF in the cluster configuration file and re-apply the configuration; see “Updating
the Cluster Lock Disk Configuration Online” (page 364). Do this after running
vgcfgrestore as described below.
CAUTION: Before you start, make sure that all nodes have logged a message in syslog
saying that the lock disk is corrupt or unusable.
Replace a failed LVM lock disk in the same way as you replace a data disk. If you are
using a dedicated lock disk (one with no user data on it), then you need to use only one
LVM command, for example:
vgcfgrestore -n /dev/vg_lock /dev/dsk/c2t3d0
Serviceguard checks the lock disk every 75 seconds. After using the vgcfgrestore
command, review the syslog file of an active cluster node; within 75 seconds you
should see a message showing that the lock disk is healthy again.
IMPORTANT: If you need to replace a LUN under the HP-UX 11i v3 agile addressing
scheme, also used by cDSFs (see “About Device File Names (Device Special Files)”
(page 106) and “About Cluster-wide Device Special Files (cDSFs)” (page 135)), and you
use the same DSF, you may need to use the io_redirect_dsf(1M) command to
reassign the existing DSF to the new device, depending on whether the operation
changes the WWID of the LUN; see the section on io_redirect_dsf in the white
paper The Next Generation Mass Storage Stack at www.hp.com/go/hpux-core-docs.
If you are not able to use the existing DSF for the new device, or you decide not to, you
must change the name of the DSF in the cluster configuration file and re-apply the
configuration; see “Updating the Cluster Lock LUN Configuration Online” (page 364).
Do this after running vgcfgrestore as described below.
CAUTION: Before you start, make sure that all nodes have logged a message such as
the following in syslog:
WARNING: Cluster lock LUN /dev/dsk/c0t1d1 is corrupt: bad label.
Until this situation is corrected, a single failure could cause
all nodes in the cluster to crash.
Once all nodes have logged this message, use a command such as the following to
specify the new cluster lock LUN:
cmdisklock reset /dev/dsk/c0t1d1
CAUTION: You are responsible for determining that the device is not being used by
any subsystem on any node connected to the device before using cmdisklock -f. If
you use cmdisklock -f without taking this precaution, you could lose data.
NOTE: cmdisklock is needed only when you are repairing or replacing a lock LUN
or lock disk; see the cmdisklock (1m) manpage for more information.
Serviceguard checks the lock LUN every 75 seconds. After using the cmdisklock
command, review the syslog file of an active cluster node; within 75 seconds you
should see a message showing that the lock LUN is healthy again.
Offline Replacement
Follow these steps to replace an I/O card off-line.
1. Halt the node by using the cmhaltnode command.
2. Shut down the system using /usr/sbin/shutdown, then power down the system.
3. Remove the defective I/O card.
4. Install the new I/O card. The new card must be exactly the same card type, and it
must be installed in the same slot as the card you removed.
5. Power up the system.
6. If necessary, add the node back into the cluster by using the cmrunnode command.
(You can omit this step if the node is configured to join the cluster automatically.)
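For example, replacing a card on node1 off-line might look like the following sketch; the node name and shutdown options are illustrative (see shutdown(1M)):
cmhaltnode -f node1
/usr/sbin/shutdown -h -y 0
(replace the card in the same slot, then power the system back up)
cmrunnode node1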
Online Replacement
If your system hardware supports hotswap I/O cards, you have the option of replacing
the defective I/O card online, using the HP-UX olrad command. The new card must
be exactly the same card type as the card you removed. Serviceguard will automatically
recover a LAN card once it has been replaced and reconnected to the network.
For more information, see the olrad(1m) manpage and the Interface Card OL* Support
Guide which as of the date of this manual can be found at http://
h20000.www2.hp.com/bc/docs/support/SupportManual/c01916176/
c01916176.pdf. See also “Removing a LAN or VLAN Interface from a Node”
(page 372).
2. Use the cmapplyconf command to apply the configuration and copy the new
binary file to all cluster nodes:
cmapplyconf -C config.ascii
This procedure updates the binary file with the new MAC address and thus avoids
data inconsistency between the outputs of the cmviewconf and lanscan commands.
NOTE: While the old Quorum Server is down and the new one is being set up, these
things can happen:
• These three commands will not work:
cmquerycl -q
cmapplyconf -C
cmcheckconf -C
• If there is a node or network failure that creates a 50-50 membership split, the
quorum server will not be available as a tie-breaker, and the cluster will fail.
NOTE: Make sure that the old Quorum Server system does not rejoin the network
with the old IP address.
Troubleshooting Approaches
The following sections offer a few suggestions for troubleshooting by reviewing the
state of the running system and by examining cluster status data, log files, and
configuration files. Topics include:
• Reviewing Package IP Addresses
• Reviewing the System Log File
• Reviewing Configuration Files
• Reviewing the Package Control Script
• Using cmquerycl and cmcheckconf
• Using cmviewcl
• Reviewing the LAN Configuration
Solving Problems
Problems with Serviceguard may be of several types. The following is a list of common
categories of problem:
• Serviceguard command hangs
• Networking and security configuration errors
Name: ftsys9.cup.hp.com
Address: 15.13.172.229
If the output of this command does not include the correct IP address of the node, then
check your name resolution services further.
In many cases, a symptom such as Permission denied... or Connection
refused... is the result of an error in the networking or security configuration. Most
such problems can be resolved by correcting the entries in /etc/hosts. See
“Configuring Name Resolution” (page 218) for more information.
CAUTION: Do not use the HP-UX mount and umount commands in a CFS cluster;
use cfsmount or cfsumount. Non-cfs commands (such as mount -o cluster,
dbed_chkptmount, or sfrac_chkptmount) could cause conflicts with subsequent
command operations on the file system or Serviceguard packages. These mount
commands will not create an appropriate multi-node package, with the result that the
cluster packages are not aware of the file system changes.
In this kind of situation, Serviceguard will not restart the package without manual
intervention. You must clean up manually before restarting the package. Use the
following steps as guidelines:
1. Perform application-specific cleanup. Any application-specific actions the control
script might have taken should be undone to ensure successfully starting the
package on an alternate node. This might include such things as shutting down
application processes, removing lock files, and removing temporary files.
2. Ensure that package IP addresses are removed from the system; use the
cmmodnet(1m) command.
First determine which package IP addresses are installed by inspecting the output
of netstat -in. If any of the IP addresses specified in the package control script
appear in the Address column of the netstat output (for either IPv4 or IPv6), use
cmmodnet to remove them:
cmmodnet -r -i <ip-address> <subnet>
where <ip-address> is the address shown in the Address column and <subnet> is the
corresponding entry in the Network column for IPv4, or the prefix (which can be
derived from the IPv6 address) for IPv6.
3. Ensure that package volume groups are deactivated. First unmount any package
logical volumes which are being used for filesystems. This is determined by
inspecting the output resulting from running the command bdf -l. If any package
The default Serviceguard control scripts are designed to take the straightforward steps
needed to get an application running or stopped. If the package administrator specifies
a time limit within which these steps need to occur and that limit is subsequently
exceeded for any reason, Serviceguard takes the conservative approach that the control
script logic must either be hung or defective in some way. At that point the control
script cannot be trusted to perform cleanup actions correctly, thus the script is terminated
and the package administrator is given the opportunity to assess what cleanup steps
must be taken.
If you want the package to switch automatically in the event of a control script timeout,
set the node_fail_fast_enabled parameter (page 301) to yes. In this case, Serviceguard will
cause the node where the control script timed out to halt (system reset). This effectively
cleans up any side effects of the package's run or halt attempt (but remember that the
system reset will cause all the packages running on that node to halt abruptly). In this
case the package will be automatically restarted on any available alternate node for
which it is configured. For more information, see “Responses to Package and Service
Failures ” (page 119).
CAUTION: Do not use the HP-UX mount and umount commands in a CFS cluster;
use cfsmount or cfsumount. Commands such as mount -o cluster,
dbed_chkptmount, or sfrac_chkptmount could cause conflicts with subsequent
command operations on the file system or Serviceguard packages. Non-CFS mount
commands will not create an appropriate multi-node package, with the result that the
cluster packages are not aware of the file system changes.
NOTE: See the HP Serviceguard Quorum Server Version A.04.00 Release Notes for
information about configuring the Quorum Server. Do not proceed without reading
the Release Notes for your version.
Timeout Problems
The following kinds of message in a Serviceguard node’s syslog file may indicate
timeout problems:
Unable to set client version at quorum server 192.6.7.2: reply
timed out
Probe of quorum server 192.6.7.2 timed out
These messages could be an indication of an intermittent network problem; or the
default Quorum Server timeout may not be sufficient. You can set the
QS_TIMEOUT_EXTENSION to increase the timeout, or you can increase the
MEMBER_TIMEOUT value. See “Cluster Configuration Parameters ” (page 143) for
more information about these parameters.
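For example, you might add or adjust entries such as the following in the cluster configuration file and re-apply it. The values shown are purely illustrative; check “Cluster Configuration Parameters ” (page 143) for the units and supported ranges before using them:
QS_TIMEOUT_EXTENSION 2000000
MEMBER_TIMEOUT 14000000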
A message such as the following in a Serviceguard node’s syslog file indicates that
the node did not receive a reply to its lock request on time. This could be because of
delay in communication between the node and the Quorum Server or between the
Quorum Server and other nodes in the cluster:
Attempt to get lock /sg/cluser1 unsuccessful. Reason:
request_timedout
Messages
Serviceguard sometimes sends a request to the Quorum Server to set the lock state.
(This is different from a request to obtain the lock in tie-breaking.) If the Quorum
Server’s connection to one of the cluster nodes has not completed, the request to set
may fail with a two-line message like the following in the quorum server’s log file:
Oct 08 16:10:05:0: There is no connection to the applicant
2 for lock /sg/lockTest1
Oct 08 16:10:05:0: Request for lock /sg/lockTest1 from
applicant 1 failed: not connected to all applicants.
B Designing Highly Available Cluster Applications
This appendix describes how to create or port applications for high availability, with
emphasis on the following topics:
• Automating Application Operation
• Controlling the Speed of Application Failover (page 427)
• Designing Applications to Run on Multiple Systems (page 430)
• Using a Relocatable Address as the Source Address for an Application that is
Bound to INADDR_ANY (page 435)
• Restoring Client Connections (page 437)
• Handling Application Failures (page 439)
• Minimizing Planned Downtime (page 440)
Designing for high availability means reducing the amount of unplanned and planned
downtime that users will experience. Unplanned downtime includes unscheduled
events such as power outages, system failures, network failures, disk crashes, or
application failures. Planned downtime includes scheduled events such as scheduled
backups, system upgrades to new OS revisions, or hardware replacements.
Two key strategies should be kept in mind:
1. Design the application to handle a system reboot or panic. If you are modifying
an existing application for a highly available environment, determine what happens
currently with the application after a system panic. In a highly available
environment there should be defined (and scripted) procedures for restarting the
application. Procedures for starting and stopping the application should be
automatic, with no user intervention required.
2. The application should not use any system-specific information such as the
following if such use would prevent it from failing over to another system and
running properly:
• The application should not refer to uname() or gethostname().
• The application should not refer to the SPU ID.
• The application should not refer to the MAC (link-level) address.
Use Checkpoints
Design applications to checkpoint complex transactions. A single transaction from the
user's perspective may result in several actual database transactions. Although this
issue is related to restartable transactions, here it is advisable to record progress locally
on the client so that a transaction that was interrupted by a system failure can be
completed after the failover occurs.
For example, suppose the application being used is calculating PI. On the original
system, the application has gotten to the 1,000th decimal point, but the application has
not yet written anything to disk. At that moment in time, the node crashes. The
application is restarted on the second node, but the application is started up from
scratch. The application must recalculate those 1,000 decimal points. However, if the
application had written to disk the decimal points on a regular basis, the application
could have restarted from where it left off.
Use DNS
DNS provides an API which can be used to map hostnames to IP addresses and vice
versa. This is useful for BSD socket applications such as telnet which are first told the
target system name. The application must then map the name to an IP address in order
to establish a connection. However, some calls should be used with caution.
Applications should not reference official hostnames or IP addresses. The official
hostname and corresponding IP address for the hostname refer to the primary LAN
card and the stationary IP address for that card. Therefore, any application that refers
to, or requires the hostname or primary IP address may not work in an HA environment
where the network identity of the system that supports a given application moves from
one system to another, but the hostname does not move.
One way to look for problems in this area is to look for calls to gethostname(2) in
the application. HA services should use gethostname() with caution, since the
response may change over time if the application migrates. Applications that use
gethostname() to determine the name for a call to gethostbyname(2) should also
ip_strong_es_model is set to 1 and the sending socket (or communication endpoint) is
bound to INADDR_ANY, IP will send the packet using the interface on which the inbound
packet was received.
For more information about this parameter, see:
• The help menu for ndd –h ip_strong_es_model
• The HP-UX IPSec Version A.03.00 Administrator's Guide which you can find at
www.docs.hp.com/en/internet.html
• The section “Default Gateway for each Physical IPv4 Interface” in the latest edition
of the TOUR Transition Release Notes. As of the date of this manual, those release
notes could be found at http://docs.hp.com/en/5992-1909/
5992-1909.pdf.
Perform the following steps on each node before configuring the cluster:
1. Enable strong end-system model permanently by editing /etc/rc.config.d/
nddconf as follows:
TRANSPORT_NAME[1]=ip
NDD_NAME[1]=ip_strong_es_model
NDD_VALUE[1]=1
2. If you have not already done so, disable dead gateway probing permanently by
editing /etc/rc.config.d/nddconf as follows:
TRANSPORT_NAME[2]=ip
NDD_NAME[2]=ip_ire_gw_probe
NDD_VALUE[2]=0
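If you also want the settings to take effect before the next reboot, you can apply them to the running system with ndd; this is a sketch only, and assumes no other customization of these tunables:
ndd -set /dev/ip ip_strong_es_model 1
ndd -set /dev/ip ip_ire_gw_probe 0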
Once this has been done, use the HP-UX command route (1m) from within the
package to add or delete a default route for each relocatable IP address, to allow it to
communicate with all remote subnets. See the examples that follow.
IMPORTANT: You need to add and delete default routes only in a configuration in
which the clients reside on a subnet different from that of the server's relocatable
address. If all the client applications are on the same subnet as the relocatable IP address,
you do not need to add or delete any routes for the relocatable addresses; they are
added automatically when you add the relocatable addresses to the server.
For example, put a command such as the following in the
customer_defined_run_commands function of a legacy package, or the
start_command function in the external_script (page 309) for a modular package:
/usr/sbin/route add net default 128.17.17.1 1 source 128.17.17.17
In this example, 128.17.17.17 is the relocatable IP address of the package, and
128.17.17.1 is the gateway address of this network. So clients on any remote subnets
coming into the 128.17.17.17 address will get 128.17.17.17 returned as the
source IP address if the application in the package is bound to INADDR_ANY. This
allows the IP packets to go through the firewall to reach other organizations on the
network.
When the package halts, the route must be removed.
NOTE: If your package has more than one relocatable address on a physical interface,
you must add a route statement for each relocatable address during package start up,
and delete each of these routes during package halt.
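For example, if the package had a second relocatable address 128.17.17.18 on the same interface (an address used here purely for illustration), the start logic would add one route per address, and the halt logic would remove the same routes:
/usr/sbin/route add net default 128.17.17.1 1 source 128.17.17.17
/usr/sbin/route add net default 128.17.17.1 1 source 128.17.17.18
At halt time, delete the corresponding routes; see the route(1M) manpage for the exact delete syntax.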
For more information about configuring modular packages, see Chapter 6 (page 279);
for legacy packages, see “Configuring a Legacy Package” (page 375).
IMPORTANT: If you use a Quorum Server, make sure that you list all relocatable IP
addresses that are associated with the per-interface default gateways in the authorization
file /etc/cmcluster/qs_authfile.
For more information about the Quorum Server, see the latest version of the HP
Serviceguard Quorum Server Release Notes at http://www.hp.com/go/
hpux-serviceguard-docs —> HP Serviceguard Quorum Server Software
NOTE: Check the Serviceguard/SGeRAC/SMS/Serviceguard Manager Plug-in
Compatibility and Feature Matrix and the latest Release Notes for your version of
Serviceguard for up-to-date information about support for CFS (http://
www.hp.com/go/hpux-serviceguard-docs).
When you see the Membership: message, and specifically the capitalized word FORMED
in this message, the transition to the new cluster manager is complete and you can use
all the Serviceguard commands.
Types of Upgrade
Rolling Upgrade
In a rolling upgrade, you upgrade the HP-UX operating system (if necessary) and the
Serviceguard software one node at a time without bringing down your cluster. A rolling
Non-Rolling Upgrade
In a non-rolling upgrade, you halt the cluster before upgrading HP-UX (if necessary)
and the Serviceguard software.
This method involves cluster down time, but allows for a broader range of upgrades
than rolling upgrade. For example, you will need to do a non-rolling upgrade, or a
migration with cold install, if you are upgrading from an older release of Serviceguard
than rolling upgrade supports (see the Release Notes for the target version of
Serviceguard for the specific rolling upgrade requirements).
You can perform a non-rolling upgrade (that is, an upgrade performed while the cluster
is down) from any Serviceguard release to the latest Serviceguard release.
Performing a Rolling Upgrade
Limitations of Rolling Upgrades
The following limitations apply to rolling upgrades:
NOTE: This means that you cannot migrate to the HP-UX 11i v3 agile addressing
scheme for device files during a rolling upgrade if cluster lock disks are used as
a tie-breaker, because that involves changing the cluster configuration. See
“Updating the Cluster Lock Configuration” (page 364) for instructions in this case.
See “About Device File Names (Device Special Files)” (page 106) for more
information about agile addressing.
• None of the features of the newer release of Serviceguard are allowed until all
nodes have been upgraded.
• Binary configuration files may be incompatible between releases of Serviceguard.
Do not manually copy configuration files between nodes.
• No more than two versions of Serviceguard can be running in the cluster while
the rolling upgrade is in progress.
• Rolling upgrades are not intended as a means of using mixed releases of
Serviceguard or HP-UX within the cluster. HP strongly recommends that you
upgrade all cluster nodes as quickly as possible to the new release level.
• You cannot delete Serviceguard software (via swremove) from a node while a
rolling upgrade is in progress.
CAUTION: Do not proceed with an upgrade to A.11.19 until you have read and
understood the Special Considerations for Upgrade to Serviceguard A.11.19 (page 447).
IMPORTANT: All the limitations listed under “Guidelines for Rolling Upgrade”
(page 450) and “Limitations of Rolling Upgrades ” (page 451) also apply to a rolling
upgrade with DRD. You should read the entire section on “Performing a Rolling
Upgrade” (page 451) before you proceed.
CAUTION: Do not proceed with an upgrade to A.11.19 until you have read and
understood the Special Considerations for Upgrade to Serviceguard A.11.19 (page 447).
IMPORTANT: Not all paths that are supported for rolling upgrade are supported
for an upgrade using DRD, and there are additional requirements and restrictions
for paths that are supported.
Do not proceed until you have read the “Announcements”, “Compatibility”, and
“Installing Serviceguard” sections of the latest version of the Serviceguard Release
Notes, made sure your cluster meets the current requirements, and taken any necessary
preparation steps as instructed in the release notes.
• Make sure you plan sufficient system capacity to allow moving the packages from
node to node during the process without an unacceptable loss of performance.
• Make sure you have read and understood the “Restrictions for DRD Upgrades”
(page 449).
• Make sure that you have downloaded the latest DRD software and are thoroughly
familiar with the DRD documentation. See “Rolling Upgrade Using DRD” (page 449)
for more information.
NOTE: Warning messages may appear during a rolling upgrade while the node is
determining what version of software is running. This is a normal occurrence and not
a cause for concern.
The following example shows a simple rolling upgrade on two nodes running one
package each, as shown in Figure D-1. (This and the following figures show the starting
point of the upgrade as “SG (old)” and “HP-UX (old)”, with a roll to “SG (new)” and
“HP-UX (new)”. Substitute the actual release numbers of your rolling upgrade path.)
Step 1.
Halt the first node, as follows:
# cmhaltnode -f node1
This will cause pkg1 to be halted cleanly and moved to node 2. The Serviceguard
daemon on node 1 is halted, and the result is shown in Figure D-2.
Step 2.
Upgrade node 1 to the next operating system release (“HP-UX (new)”), and install
the next version of Serviceguard (“SG (new)”).
Step 3.
When upgrading is finished, enter the following command on node 1 to restart the
cluster on node 1.
# cmrunnode -n node1
At this point, different versions of the Serviceguard daemon (cmcld) are running on
the two nodes, as shown in Figure D-4.
Step 4.
Repeat the process on node 2. Halt the node, as follows:
# cmhaltnode -f node2
This causes both packages to move to node 1. Then upgrade node 2 to the new
versions of HP-UX and Serviceguard.
Step 5.
Move pkg2 back to its original node. Use the following commands:
cmhaltpkg pkg2
cmrunpkg -n node2 pkg2
cmmodpkg -e pkg2
The cmmodpkg command re-enables switching of the package, which was disabled by
the cmhaltpkg command. The final running cluster is shown in Figure D-6.
Other Considerations
See also “Keeping Kernels Consistent” and “Migrating cmclnodelist entries from
A.11.15 or earlier” (page 453).
IMPORTANT: Not all paths that are supported for upgrade are supported for an
upgrade using DRD, and there are additional requirements and restrictions for paths
that are supported.
Do not proceed until you have read the “Announcements”, “Compatibility”, and
“Installing Serviceguard” sections of the latest version of the Serviceguard release notes,
made sure your cluster meets the current requirements, and taken any necessary
preparation steps as instructed in the release notes.
You must also make sure you have read and understood the “Restrictions for DRD
Upgrades” (page 449).
CAUTION: You must reboot all the nodes from their original disks before restarting
the cluster; do not try to restart the cluster with some nodes booted from the upgraded
disks and some booted from the pre-upgrade disks.
CAUTION: The cold install process erases the existing software, operating system,
and data. If you want to retain any existing software, make sure you back up that
software before migrating.
NOTE: Data on shared disks, or on local disks in volumes that are not touched
by the HP-UX installation process, will not normally be erased by the cold install;
you can re-import this data after the cold install. If you intend to do this, you must
do the following before you do the cold install:
• For LVM: create a map file for each LVM volume group and save it as part of
your backup.
• For VxVM: deport disk groups (halting the package should do this).
See “Creating the Storage Infrastructure and Filesystems with LVM, VxVM and
CVM” (page 230) for more information.
Worksheet for Hardware Planning
HARDWARE WORKSHEET Page ___ of ____
===============================================================================
Node Information:
===============================================================================
Disk I/O Information:
Hardware Device
Bus Type ______ Path ______________ File Name ______________
Hardware Device
Bus Type ______ Path ______________ File Name ______________
Hardware Device
Bus Type ______ Path ______________ File Name ______________
===============================================================================
Disk Power:
===============================================================================
Tape Backup Power:
===============================================================================
Other Power:
==============================================================================
Access Policies:
User:_________________ From node:_______ Role:_____________
User:_________________ From node:_______ Role:_____________
Priority_____________ Successor_halt_timeout____________
dependency_name _____ dependency_condition ___
==========================================================================
LVM Volume Groups: vg_______________ vg________________ vg________________ vg________________
vgchange_cmd: _____________________________________________
CVM Disk Groups [ignore CVM items if CVM is not being used]:
cvm_vg___________cvm_dg_____________cvm_vg_______________
cvm_activation_cmd: ______________________________________________
VxVM Disk Groups:
_________vxvm_dg____________vxvm_dg_____________
vxvol_cmd _________________________________________________________
IP_subnet_node______________IP_subnet_node______________IP_subnet_node__________
Monitored subnet:___________________________monitored_subnet_access_____________
Monitored subnet:___________________________monitored_subnet_access_____________
Cluster interconnect subnet [SGeRAC only]:____________________________________
===============================================================================
Service Name: _______ Command: _________ Restart:___ Fail Fast enabled:____
Service Name: _______ Command: _________ Restart: __ Fail Fast enabled:_____
Service Name: _______ Command: _________ Restart: __ Fail Fast enabled:_____
================================================================================
Package environment variable:________________________________________________
Package environment variable:________________________________________________
External pre-script:_________________________________________________________
External script:_____________________________________________________________
================================================================================
Loading VxVM
Before you can begin migrating data, you must install the Veritas Volume Manager
software and all required VxVM licenses on all cluster nodes. This step requires each
system to be rebooted, so it requires you to remove the node from the cluster before
the installation, and restart the node after installation. This can be done as a part of a
rolling upgrade procedure, described in Appendix E.
Information about VxVM installation is in the Veritas Installation Guide for your
version of VxVM, available from www.hp.com/go/hpux-core-docs.
NOTE: Remember that the cluster lock disk, if used, must be configured on an
LVM volume group and physical volume. If you have a lock volume group
containing data that you wish to move to VxVM, you can do so, but do not use
vxvmconvert, because the LVM header is still required for the lock disk.
6. Restore the data to the new VxVM disk groups. Use whatever means are most
appropriate for the way in which the data was backed up in step 3 above.
FS[0]="/mnt_dg0101"
FS[1]="/mnt_dg0102"
FS[2]="/mnt_dg0201"
FS[3]="/mnt_dg0202"
FS_MOUNT_OPT[0]="-o ro"
FS_MOUNT_OPT[1]="-o rw"
FS_MOUNT_OPT[2]="-o ro"
FS_MOUNT_OPT[3]="-o rw"
4. Be sure to copy from the old script any user-specific code that may have been
added, including environment variables and customer defined functions.
5. Distribute the new package control scripts to all nodes in the cluster.
6. Test to make sure the disk group and data are intact.
7. Deport the disk group:
vxdg deport DiskGroupName
8. Make the disk group visible to the other nodes in the cluster by issuing the
following command on all other nodes:
vxdctl enable
9. Restart the package.
Anycast An address for a set of interfaces. In most cases these interfaces belong to different
nodes. A packet sent to an anycast address is delivered to one of these interfaces
identified by the address. Since the standards for using anycast addresses is still
evolving, they are not supported in HP-UX as of now.
Multicast An address for a set of interfaces (typically belonging to different nodes). A packet
sent to a multicast address will be delivered to all interfaces identified by that address.
Unlike IPv4, there are no broadcast addresses in IPv6 because their functions are
superseded by multicast.
Unicast Addresses
IPv6 unicast addresses are classified into different types. They are global aggregatable
unicast address, site-local address and link-local address. Typically a unicast address
is logically divided as follows:
Table G-2 IPv6 unicast address format: n bits (subnet prefix) | 128-n bits (interface ID)
Interface identifiers in an IPv6 unicast address are used to identify the interfaces on a
link. Interface identifiers are required to be unique on that link. The link is generally
identified by the subnet prefix.
A unicast address is called an unspecified address if all the bits in the address are zero.
Textually it is represented as “::”.
The unicast address ::1 or 0:0:0:0:0:0:0:1 is called the loopback address. It is
used by a node to send packets to itself.
Example (IPv4-compatible IPv6 address):
::192.168.0.1
Example (IPv4-mapped IPv6 address):
::ffff:192.168.0.1
For aggregatable global unicast addresses, the format prefix (FP) is “001”.
Link-Local Addresses
Link-local addresses have the following format:
Table G-6 Link-local address format: 10 bits (1111111010) | 54 bits (0) | 64 bits (interface ID)
Link-local addresses are intended for addressing nodes on a single link.
Packets originating from or destined to a link-local address will not be forwarded by
a router.
Site-Local Addresses
Site-local addresses have the following format:
Table G-7 Site-local address format: 10 bits | 38 bits | 16 bits | 64 bits
Site-local addresses are intended for use within a site. Routers will not forward any
packet with a site-local source or destination address outside the site.
Multicast Addresses
A multicast address is an identifier for a group of nodes. Multicast addresses have the
following format:
Table G-8 Multicast address format: 8 bits (“FF”) | 4 bits (flgs) | 4 bits (scop) | 112 bits (group ID)
“FF” at the beginning of the address identifies the address as a multicast address.
The “flgs” field is a set of 4 flags “000T”. The higher order 3 bits are reserved and must
be zero. The last bit ‘T’ indicates whether the address is permanently assigned: a value
of zero indicates a permanent assignment; otherwise it is a temporary assignment.
The “scop” field is a 4-bit field which is used to limit the scope of the multicast group.
For example, a value of ‘1’ indicates that it is a node-local multicast group. A value of
• Serviceguard supports IPv6 only on the Ethernet networks, including 10BT, 100BT,
and Gigabit Ethernet
Example Configurations
An example of a LAN configuration on a cluster node using both IPv4 and IPv6
addresses is shown below.
The same LAN card can be configured with both IPv4 and IPv6 addresses, as shown
below.
NOTE: See the latest version of the Serviceguard Release Notes at www.hp.com/go/
hpux-serviceguard-docs for the most up-to-date setup requirements.
You must have, or have done, the following before you can start using HP Serviceguard
Manager:
• The hpuxswTOMCAT product.
hpuxswTOMCAT is installed by default with HP-UX. Use the following command
to check if it is on your system:
swlist -l fileset | grep TOMCAT
• SMH (System Management Homepage; see the Release Notes for the required
version)
• A web browser with access to SMH. See the Release Notes for supported browsers.
• Have launched SMH (settings -> Security -> User Groups) to configure
user roles for SMH.
NOTE: If a cluster is not yet configured, then you will not see the Serviceguard Cluster
section on this screen. To create a cluster, from the SMH Tools menu, you must click
Serviceguard Manager link in the Serviceguard box first, then click Create Cluster.
The figure below shows a browser session at the HP Serviceguard Manager Main Page.
1. Cluster and overall status and alerts: Displays information about the cluster status,
alerts, and general information.
2. Menu tool bar: The menu tool bar is available from the HP Serviceguard Manager
Homepage, and from any cluster, node, or package view-only property page. Menu
option availability depends on which type of property page (cluster, node, or package)
you are currently viewing.
3. Tab bar: The default Tab bar allows you to view additional cluster-related
information. The Tab bar displays different content when you click on a specific node
or package.
4. Node information: Displays information about the node status, alerts, and general
information.
5. Package information: Displays information about the package status, alerts, and
general information.
NOTE: If you click on a cluster running an earlier Serviceguard release, the page will
display a link that will launch Serviceguard Manager A.05.01 (if installed) via Java
Webstart.
Index
effect of default value, 118
A AUTO_START_TIMEOUT
parameter in cluster manager configuration, 161
Access Control Policies, 251
automatic failback
Access Control Policy, 167
configuring with failover policies, 75
Access roles, 167
automatic restart of cluster, 61
active node, 31
automatically restarting the cluster, 344
adding a package to a running cluster, 388
automating application operation, 425
adding cluster nodes
autostart delay
advance planning, 204
parameter in the cluster configuration file, 161
adding nodes to a running cluster, 342
autostart for clusters
adding packages on a running cluster , 318
setting up, 274
additional package resources
monitoring, 79
addressing, SCSI, 125 B
administration backing up cluster lock information, 226
adding nodes to a running cluster, 342 binding
cluster and package states, 322 in network applications, 433
halting a package, 351 bridged net
halting the entire cluster, 344 defined, 38
moving a package, 352 for redundancy in network interfaces, 38
of packages and services, 350 broadcast storm
of the cluster, 341 and possible TOC, 161
reconfiguring a package while the cluster is running, 386 building a cluster
reconfiguring a package with the cluster offline, 388 CFS infrastructure, 261
reconfiguring the cluster, 365 cluster configuration steps, 242
removing nodes from operation in a running cluster, 343 building a cluster
responding to cluster events, 397 identifying cluster lock volume group, 246
reviewing configuration files, 411, 412 identifying heartbeat subnets, 251
starting a cluster when all nodes are down, 342 identifying quorum server, 248
starting a package, 350 logical volume infrastructure, 231
troubleshooting, 409 verifying the cluster configuration, 258
adoptive node, 31 VxVM infrastructure, 239
agile addressing bus type
defined, 106 hardware planning, 126
migrating cluster lock disks to, 364
migrating to, 107, 451 C
sources of information, 107 CAPACITY_NAME
APA defined, 157
auto port aggregation, 103 CAPACITY_VALUE
applications defined, 157
automating, 425 CFS
checklist of steps for integrating with Serviceguard, 443 creating a storage infrastructure, 261
handling failures, 439 not supported on all HP-UX versions, 32
writing HA services for networks, 427 changes in cluster membership, 62
ARP messages changes to cluster allowed while the cluster is running, 359
after switching, 98 changes to packages while the cluster is running, 391
array changing the volume group configuration while the cluster is
replacing a faulty mechanism, 402 running, 372
arrays checkpoints, 429
disk arrays for data protection, 44 client connections
auto port aggregation restoring in applications, 437
define, 103 cluster
AUTO_START
configuring with commands, 243 parameter in cluster manager configuration, 143
redundancy of components, 37 cluster parameters
Serviceguard, 30 initial configuration, 59
typical configuration, 29 cluster re-formation
understanding components, 37 scenario, 117
cluster administration, 341 cluster re-formation time, 130
solving problems, 413 cluster startup
cluster and package maintenance , 321 manual, 61
cluster configuration cluster volume group
creating with SAM or Commands, 242 creating physical volumes, 233
file on all nodes, 59 parameter in cluster manager configuration, 167
identifying cluster lock volume group, 246 cluster with high availability disk array
identifying cluster-aware volume groups, 251 figure, 48, 49
planning, 135 clusters
planning worksheet, 167 active/standby type, 50
sample diagram, 123 larger size, 50
verifying the cluster configuration, 258 cmapplyconf, 259, 383
cluster configuration file cmassistd daemon, 54
Autostart Delay parameter (AUTO_START_TIMEOUT), 161 cmcheckconf, 258, 318, 382
cluster coordinator troubleshooting, 412
defined, 59 cmclconfd daemon, 54, 55
cluster lock, 62 cmcld daemon, 54
4 or more nodes, 66 and node TOC, 55
and cluster re-formation time, 130 and safety timer, 55
and cluster reformation, example, 117 functions, 55
and power supplies, 49 cmclnodelist bootstrap file, 216
backup up lock data, 226 cmdeleteconf
dual lock disk, 64 deleting a package configuration, 388
identifying in configuration file, 246, 248 deleting the cluster configuration, 277
migrating device file names, 364 cmfileassistd daemon, 54, 56
migrating disks to agile addressing, 451 cmlogd daemon, 54, 56
no locks, 66 cmlvmd daemon, 54, 56
single lock disk, 64 cmmodnet
storing configuration data, 260 assigning IP addresses in control scripts, 90
two nodes, 63 cmnetassist daemon, 57
updating configuration, 364 cmnetassistd daemon, 54
use in re-forming a cluster, 63 cmomd daemon, 54, 56
cluster manager cmquerycl
automatic restart of cluster, 61 troubleshooting, 412
blank planning worksheet, 469 cmserviced daemon, 57
cluster node parameter, 143 cmsnmpd daemon, 54
cluster volume group parameter, 167 cmvxd daemon, 54
defined, 59 cmvxd for CVM and CFS, 59
dynamic re-formation, 61 cmvxping for CVM and CFS, 59
heartbeat subnet parameter, 151 cmvxpingd daemon, 55
initial configuration of the cluster, 59 configuration
main functions, 59 basic tasks and steps, 35
member timeout parameter, 160 cluster planning, 135
monitored non-heartbeat subnet, 155 of the cluster, 59
network polling interval parameter, 161 package, 279
physical lock volume parameter, 145 package planning, 168
planning the configuration, 143 service, 279
quorum server parameter, 146, 147 configuration file
quorum server polling interval parameter, 147 for cluster manager, 59
quorum server timeout extension parameter, 147 troubleshooting, 411, 412
testing, 400 Configuration synchronization with DSAU, 34
cluster node CONFIGURED_IO_TIMEOUT_EXTENSION
    defined, 162
Configuring clusters with Serviceguard command line, 243
configuring packages and their services, 279
control script
    adding customer defined functions, 380
    in package configuration, 378
    pathname parameter in package configuration, 311
    support for additional products, 381
    troubleshooting, 412
controlling the speed of application failover, 427
creating the package configuration, 375
Critical Resource Analysis (CRA)
    LAN or VLAN, 372
customer defined functions
    adding to the control script, 380
CVM, 113
    creating a storage infrastructure, 268
    not supported on all HP-UX versions, 32
    planning, 134
CVM planning
    Version 4.1 with CFS, 170
    Version 4.1 without CFS, 170
CVM_ACTIVATION_CMD
    in package control script, 379
CVM_DG
    in package control script, 379

D
data
    disks, 44
data congestion, 60
databases
    toolkits, 423
deactivating volume groups, 236
deciding when and where to run packages, 69
deferred resource name, 311
deleting a package configuration
    using cmdeleteconf, 388
deleting a package from a running cluster, 388
deleting nodes while the cluster is running, 366, 373
deleting the cluster configuration
    using cmdeleteconf, 277
dependencies
    configuring, 179
designing applications to run on multiple systems, 430
detecting failures
    in network manager, 92
device special files (DSFs)
    agile addressing, 106, 451
    legacy, 107
    migrating cluster lock disks to, 364
disk
    choosing for volume groups, 232
    data, 44
    interfaces, 44
    mirroring, 44
    root, 44
    sample configurations, 47, 48
disk enclosures
    high availability, 45
disk failure
    protection through mirroring, 30
disk group
    planning, 134
disk group and disk planning, 134
disk I/O
    hardware planning, 126
disk layout
    planning, 132
disk logical units
    hardware planning, 127
disk management, 106
disk monitor, 46
disk monitor (EMS), 79
disk storage
    creating the infrastructure with CFS, 261
    creating the infrastructure with CVM, 268
disk types supported by Serviceguard, 44
disks
    in Serviceguard, 44
    replacing, 402
disks, mirroring, 45
Distributed Systems Administration Utilities, 34
distributing the cluster and package configuration, 317, 382
DNS services, 220
down time
    minimizing planned, 440
DSAU, 34, 382
dual cluster locks
    choosing, 64
dynamic cluster re-formation, 61
Dynamic Multipathing (DMP)
    and HP-UX, 46

E
eight-node active/standby cluster
    figure, 51
eight-node cluster with disk array
    figure, 52
EMS
    for disk monitoring, 46
    for preventive monitoring, 401, 402
    monitoring package resources with, 79
    using the EMS HA monitors, 79
enclosure for disks
    replacing a faulty mechanism, 403
enclosures
    high availability, 45
Ethernet
    redundant configuration, 40
Event Monitoring Service
    for disk monitoring, 46
    in troubleshooting, 401, 402
event monitoring service
    using, 79
exclusive access
    relinquishing via TOC, 118
expanding the cluster
    planning ahead, 122
expansion
    planning for, 176

F
failback policy
    used by package manager, 75
FAILBACK_POLICY parameter
    used by package manager, 75
failover
    controlling the speed in applications, 427
    defined, 31
failover behavior
    in packages, 177
failover package, 68
failover policy
    used by package manager, 72
FAILOVER_POLICY parameter
    used by package manager, 72
failure
    kinds of responses, 116
    network communication, 120
    package, service, node, 116
    response to hardware failures, 118
    responses to package and service failures, 119
    restarting a service after failure, 119
failures
    of applications, 439
figures
    cluster with high availability disk array, 48, 49
    eight-node active/standby cluster, 51
    eight-node cluster with EMC disk array, 52
    mirrored disks connected for high availability, 47
    node 1 rejoining the cluster, 457
    node 1 upgraded to new HP-UX version, 457
    redundant LANs, 40
    running cluster after upgrades, 458
    running cluster before rolling upgrade, 456
    running cluster with packages moved to node 1, 458
    running cluster with packages moved to node 2, 456
    sample cluster configuration, 123
    typical cluster after failover, 31
    typical cluster configuration, 29
file locking, 435
file systems
    creating for a cluster, 235, 241
    planning, 132
FIRST_CLUSTER_LOCK_PV
    parameter in cluster manager configuration, 145, 157
floating IP address
    defined, 90
floating IP addresses, 90
    in Serviceguard, 90
FS
    in sample package control script, 379
FS_MOUNT_OPT
    in sample package control script, 379

G
GAB for CVM and CFS, 58
general planning, 121
gethostbyname
    and package IP addresses, 90
gethostbyname(), 432

H
HA
    disk enclosures, 45
HA monitors (EMS), 79
HALT_SCRIPT
    parameter in package configuration, 311
HALT_SCRIPT_TIMEOUT (halt script timeout)
    parameter in package configuration, 311
halting a cluster, 344
halting a package, 351
halting the entire cluster, 344
handling application failures, 439
hardware
    blank planning worksheet, 464
    monitoring, 401
hardware failures
    response to, 118
hardware for OPS on HP-UX
    power supplies, 49
hardware planning
    Disk I/O Bus Type, 126
    disk I/O information for shared disks, 126
    host IP address, 124, 131
    host name, 124
    I/O bus addresses, 127
    I/O slot numbers, 126
    LAN information, 124
    LAN interface name, 124, 131
    LAN traffic type, 125
    memory capacity, 124
    number of I/O slots, 124
    planning the configuration, 122
    S800 series number, 124
    SPU information, 123
    subnet, 124, 131
    worksheet, 127
heartbeat messages, 30
    defined, 60
heartbeat subnet address
    parameter in cluster configuration, 151
HEARTBEAT_IP
    configuration requirements, 152
    parameter in cluster configuration, 151
high availability, 29
    HA cluster defined, 37
    objectives in planning, 121
host IP address
    hardware planning, 124, 131
host name
    hardware planning, 124
HOSTNAME_ADDRESS_FAMILY
    defined, 144
    discussion and restrictions, 139
how the cluster manager works, 59
how the network manager works, 89
HP Predictive monitoring
    in troubleshooting, 402

I
I/O bus addresses
    hardware planning, 127
I/O slots
    hardware planning, 124, 126
I/O subsystem
    changes as of HP-UX 11i v3, 46, 106
identifying cluster-aware volume groups, 251
in-line terminator
    permitting online hardware maintenance, 406
Installing Serviceguard, 205
installing software
    quorum server, 230
integrating HA applications with Serviceguard, 443
internet
    toolkits, 423
introduction
    Serviceguard at a glance, 29
IP
    in sample package control script, 379
IP address
    adding and deleting in packages, 91
    for nodes and packages, 90
    hardware planning, 124, 131
    portable, 90
    reviewing for packages, 410
    switching, 70, 72, 98
IP_MONITOR
    defined, 165

J
JFS, 427

K
kernel
    hang, and TOC, 116
    safety timer, 55
kernel consistency
    in cluster configuration, 222
kernel interrupts
    and possible TOC, 161

L
LAN
    Critical Resource Analysis (CRA), 372
    heartbeat, 60
    interface name, 124, 131
    planning information, 124
LAN CRA (Critical Resource Analysis), 372
LAN failure
    Serviceguard behavior, 37
LAN interfaces
    monitoring with network manager, 92
    primary and secondary, 38
LAN planning
    host IP address, 124, 131
    traffic type, 125
larger clusters, 50
legacy DSFs
    defined, 107
legacy package, 375
link-level addresses, 432
LLT for CVM and CFS, 59
load balancing
    HP-UX and Veritas DMP, 46
load sharing with IP addresses, 92
local switching, 93
lock
    cluster locks and power supplies, 49
    use of the cluster lock disk, 63
    use of the quorum server, 65
lock disk
    4 or more nodes, 63
    specifying, 246
lock volume group
    identifying in configuration file, 246
    planning, 130
lock volume group, reconfiguring, 365
logical volumes
    blank planning worksheet, 468
    creating for a cluster, 234, 240, 241, 271
    creating the infrastructure, 231, 239
    planning, 132
    worksheet, 133
lssf
    using to obtain a list of disks, 232
LV
    in sample package control script, 379
lvextend
    creating a root mirror with, 225
LVM, 112
    commands for cluster use, 231
    creating a root mirror, 224
    disks, 44
    migrating to VxVM, 471
    planning, 132
    setting up volume groups on another node, 236
LVM configuration
    worksheet, 133

M
MAC addresses, 432
managing the cluster and nodes, 341
manual cluster startup, 61
MAX_CONFIGURED_PACKAGES
    defined, 166
maximum number of nodes, 37
MEMBER_TIMEOUT
    and cluster re-formation, 117
    and safety timer, 55
    configuring, 160
    defined, 159
    maximum and minimum values, 159
    modifying, 251
membership change
    reasons for, 62
memory capacity
    hardware planning, 124
memory requirements
    lockable memory for Serviceguard, 122
minimizing planned down time, 440
mirror copies of data
    protection against disk failure, 30
MirrorDisk/UX, 44
mirrored disks connected for high availability
    figure, 47
mirroring
    disks, 44
mirroring disks, 45
mkboot
    creating a root mirror with, 225
modular package, 279
monitor cluster with Serviceguard commands, 273
monitored non-heartbeat subnet
    parameter in cluster manager configuration, 155
monitored resource failure
    Serviceguard behavior, 37
monitoring clusters with Serviceguard Manager, 272
monitoring hardware, 401
monitoring LAN interfaces
    in network manager, 92
moving a package, 352
multi-node package, 68
multipathing
    and Veritas DMP, 46
    automatically configured, 45
    native, 45
    sources of information, 46
multiple systems
    designing applications for, 430

N
name resolution services, 220
native multipathing
    defined, 45
network
    adding and deleting package IP addresses, 91
    failure, 94
    load sharing with IP addresses, 92
    local interface switching, 93
    local switching, 94
    OTS/9000 support, 491
    redundancy, 40
    remote system switching, 97
network communication failure, 120
network components
    in Serviceguard, 38
network failure detection
    INONLY_OR_INOUT, 93
    INOUT, 92
network manager
    adding and deleting package IP addresses, 91
    main functions, 89
    monitoring LAN interfaces, 92
    testing, 400
network planning
    subnet, 124, 131
network polling interval (NETWORK_POLLING_INTERVAL)
    parameter in cluster manager configuration, 161
network time protocol (NTP)
    for clusters, 222
NETWORK_AUTO_FAILBACK
    defined, 164
NETWORK_FAILURE_DETECTION
    defined, 164
networking
    redundant subnets, 124
networks
    binding to IP addresses, 433
    binding to port addresses, 433
    IP addresses and naming, 431
    node and package IP addresses, 90
    packages using IP addresses, 432
    supported types, 38
    writing network applications as HA services, 427
no cluster locks
    choosing, 66
node
    basic concepts, 37
    failure, TOC, 116
    halt (TOC), 116, 117
    in Serviceguard cluster, 30
    IP addresses, 90
    timeout and TOC example, 117
node types
    active, 31
    primary, 31
NODE_FAIL_FAST_ENABLED
    effect of setting, 119
NODE_NAME
    cluster configuration parameter, 149
    parameter in cluster manager configuration, 143
nodetypes
    primary, 31
NTP
    time protocol for clusters, 222

O
olrad command
    removing a LAN or VLAN interface, 372
online hardware maintenance
    by means of in-line SCSI terminators, 406
OTS/9000 support, 491
outages
    insulating users from, 426

P
package
    adding and deleting package IP addresses, 91
    base modules, 283
    basic concepts, 37
    changes while the cluster is running, 391
    configuring legacy, 375
    failure, 116
    halting, 351
    legacy, 375
    local interface switching, 93
    modular, 283
    modular and legacy, 279
    modules, 283
    moving, 352
    optional modules, 284
    parameters, 287
    reconfiguring while the cluster is running, 386
    reconfiguring with the cluster offline, 388
    remote switching, 97
    starting, 350
    toolkits for databases, 423
    types, 280
package administration, 350
    solving problems, 413
package and cluster maintenance, 321
package configuration
    distributing the configuration file, 317, 382
    planning, 168
    run and halt script timeout parameters, 311
    step by step, 279
    subnet parameter, 311
    using Serviceguard commands, 375
    verifying the configuration, 317, 382
    writing the package control script, 378
package configuration file
    package dependency parameters, 294
    successor_halt_timeout, 291
package coordinator
    defined, 60
package dependency
    parameters, 294
    successor_halt_timeout, 291
package failover behavior, 177
package failures
    responses, 119
package IP address
    defined, 90
package IP addresses, 90
    defined, 90
    reviewing, 410
package manager
    blank planning worksheet, 470
    testing, 399
package switching behavior
    changing, 353
Package types, 30
    failover, 30
    multi-node, 30
    system multi-node, 30
package types, 30
packages
    deciding where and when to run, 69
    managed by cmcld, 55
parameters
    for failover, 177
parameters for cluster manager
    initial configuration, 59
PATH, 311
persistent LUN binding
    defined, 106
physical volume
    for cluster lock, 63
    parameter in cluster lock configuration, 145
physical volumes
    creating for clusters, 233
    filled in planning worksheet, 467
    planning, 132
    worksheet, 133
planning
    cluster configuration, 135
    cluster lock and cluster expansion, 130
    cluster manager configuration, 143
    disk groups and disks, 134
    disk I/O information, 126
    for expansion, 176
    hardware configuration, 122
    high availability objectives, 121
    LAN information, 124
    overview, 121
    package configuration, 168
    power, 128
    quorum server, 130
    SCSI addresses, 125
    SPU information, 123
    volume groups and physical volumes, 132
    worksheets, 127
    worksheets for physical volume planning, 467
planning for cluster expansion, 122
planning worksheets
    blanks, 463
point of failure
    in networking, 40
point to point connections to storage devices, 51
POLLING_TARGET
    defined, 165
ports
    dual and single aggregated, 104
power planning
    power sources, 128
    worksheet, 129
power supplies
    blank planning worksheet, 464
power supply
    and cluster lock, 49
    blank planning worksheet, 465
    UPS for OPS on HP-UX, 49
Predictive monitoring, 402
primary LAN interfaces
    defined, 38
primary network interface, 38
primary node, 31
pvcreate
    creating a root mirror with, 224
PVG-strict mirroring
    creating volume groups with, 233

Q
qs daemon, 54
QS_ADDR
    parameter in cluster manager configuration, 147
QS_HOST
    parameter in cluster manager configuration, 146
QS_POLLING_INTERVAL
    parameter in cluster manager configuration, 147
QS_TIMEOUT_EXTENSION
    parameter in cluster manager configuration, 147
quorum
    and cluster reformation, 117
quorum server
    and safety timer, 55
    blank planning worksheet, 466
    installing, 230
    parameters in cluster manager configuration, 146, 147
    planning, 130
    status and state, 327
    use in re-forming a cluster, 65
    worksheet, 131

R
RAID
    for data protection, 44
raw volumes, 427
re-formation
    of cluster, 61
re-formation time, 130
README
    for database toolkits, 423
reconfiguring a package
    while the cluster is running, 386
reconfiguring a package with the cluster offline, 388
reconfiguring a running cluster, 365
reconfiguring the entire cluster, 365
reconfiguring the lock volume group, 365
recovery time, 135
redundancy
    in networking, 40
    of cluster components, 37
redundancy in network interfaces, 38
redundant Ethernet configuration, 40
redundant LANs
    figure, 40
redundant networks
    for heartbeat, 30
relocatable IP address
    defined, 90
relocatable IP addresses, 90
    in Serviceguard, 90
remote switching, 97
removing nodes from operation in a running cluster, 343
removing packages on a running cluster, 318
Removing Serviceguard from a system, 398
replacing disks, 402
resources
    disks, 44
responses
    to cluster events, 397
    to package and service failures, 119
responses to failures, 116
responses to hardware failures, 118
restart
    automatic restart of cluster, 61
    following failure, 119
restartable transactions, 428
restarting the cluster automatically, 344
restoring client connections in applications, 437
rolling software upgrades, 447
    example, 455
    steps, 451
rolling software upgrades (DRD)
    steps, 454
rolling upgrade
    limitations, 451
root mirror
    creating with LVM, 224
rotating standby
    configuring with failover policies, 73
    setting package policies, 73
RUN_SCRIPT
    parameter in package configuration, 311
RUN_SCRIPT_TIMEOUT (run script timeout)
    parameter in package configuration, 311
running cluster
    adding or removing packages, 318

S
safety timer
    and node TOC, 55
    and syslog.log, 55
    duration, 55
sample cluster configuration
    figure, 123
sample disk configurations, 47, 48
SCSI addressing, 125, 129
SECOND_CLUSTER_LOCK_PV
    parameter in cluster manager configuration, 145, 157
service administration, 350
service command
    variable in package control script, 311
service configuration
    step by step, 279
service failures
    responses, 119
service restarts, 119
SERVICE_CMD
    array variable in package control script, 311
    in sample package control script, 379
SERVICE_FAIL_FAST_ENABLED
    and node TOC, 119
SERVICE_NAME
    in sample package control script, 379
SERVICE_RESTART
    in sample package control script, 379
Serviceguard
    install, 205
    introduction, 29
Serviceguard at a glance, 29
Serviceguard behavior after monitored resource failure, 37
Serviceguard behavior in LAN failure, 37
Serviceguard behavior in software failure, 37
Serviceguard commands
    to configure a package, 375
Serviceguard Manager, 33
    overview, 32
SG-CFS-DG-id# multi-node package, 170
SG-CFS-MP-id# multi-node package, 170
SG-CFS-pkg system multi-node package, 170
SGCONF, 206
shared disks
    planning, 126
single cluster lock
    choosing, 64
single point of failure
    avoiding, 30
single-node operation, 275, 397
size of cluster
    preparing for changes, 204
SMN package, 68
SNA applications, 435
software failure
    Serviceguard behavior, 37
software planning
    CVM and VxVM, 134
    LVM, 132
solving problems, 413
SPU information
    planning, 123
standby LAN interfaces
    defined, 38
standby network interface, 38
starting a package, 350
startup of cluster
    manual, 61
    when all nodes are down, 342
state
    of cluster and package, 322
stationary IP addresses, 90
STATIONARY_IP
    parameter in cluster manager configuration, 155
status
    cmviewcl, 321
    multi-node packages, 322
    of cluster and package, 322
    package IP address, 410
    system log file, 410
stopping a cluster, 344
storage management, 106
SUBNET
    in sample package control script, 379
    parameter in package configuration, 311
subnet
    hardware planning, 124, 131
    parameter in package configuration, 311
SUBNET (for IP Monitor)
    defined, 164
successor_halt_timeout parameter, 291
supported disks in Serviceguard, 44
switching
    ARP messages after switching, 98
    local interface switching, 93
    remote system switching, 97
switching IP addresses, 70, 72, 98
system log file
    troubleshooting, 410
system message
    changing for clusters, 275
system multi-node package, 68
    used with CVM, 270

T
tasks in Serviceguard configuration, 35
testing
    cluster manager, 400
    network manager, 400
    package manager, 399
testing cluster operation, 399
time protocol (NTP)
    for clusters, 222
timeout
    node, 117
TOC
    and MEMBER_TIMEOUT, 117
    and package availability, 118
    and safety timer, 161
    and the safety timer, 55
    defined, 55
    when a node fails, 116
toolkits
    for databases, 423
traffic type
    LAN hardware planning, 125
troubleshooting
    approaches, 409
    monitoring hardware, 401
    replacing disks, 402
    reviewing control scripts, 412
    reviewing package IP addresses, 410
    reviewing system log file, 410
    using cmquerycl and cmcheckconf, 412
troubleshooting your cluster, 399
typical cluster after failover
    figure, 31
typical cluster configuration
    figure, 29

U
uname(2), 433
UPS
    in power planning, 128
    power supply for OPS on HP-UX, 49
use of the cluster lock, 63, 65
USER_HOST, 167
USER_NAME, 167
USER_ROLE, 167

V
verifying cluster configuration, 258
verifying the cluster and package configuration, 317, 382
VERITAS
    CFS and CVM not supported on all HP-UX versions, 32
    Dynamic Multipathing (DMP), 46
VERITAS CFS components, 58
VERITAS disk group packages
    creating, 263
VERITAS mount point packages
    creating, 264
VERITAS system multi-node packages, 261
VG
    in sample package control script, 379
vgcfgbackup
    and cluster lock data, 260
VGCHANGE
    in package control script, 379
vgextend
    creating a root mirror with, 225
vgimport
    using to set up volume groups on another node, 237
VLAN
    Critical Resource Analysis (CRA), 372
volume group
    creating for a cluster, 233
    creating physical volumes for clusters, 233
    deactivating before export to another node, 236
    for cluster lock, 63
    planning, 132
    setting up on another node with LVM Commands, 236
    worksheet, 133
volume group and physical volume planning, 132
volume managers, 106
    comparison, 114
    CVM, 113
    LVM, 112
    migrating from LVM to VxVM, 471
    VxVM, 112
VOLUME_GROUP
    parameter in cluster manager configuration, 167
vxfend for CVM and CFS, 59
VxVM, 112
    creating a storage infrastructure, 239
    migrating from LVM to VxVM, 471
    planning, 134
VXVM_DG
    in package control script, 379

W
WEIGHT_DEFAULT
    defined, 166
WEIGHT_NAME
    defined, 166
What is Serviceguard?, 29
worksheet
    cluster configuration, 167
    hardware configuration, 127
    power supply configuration, 129
    quorum server configuration, 131
    volume group and physical volumes, 133
worksheets
    physical volume planning, 467
worksheets for planning
    blanks, 463