PowerHA - 3 Basic Configuration
Basic Configuration
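HACMP Configuration Process
► Preparation before configuring HACMP
  • Configure the IP addresses
  • Edit the /etc/hosts file
  • Write the application start/stop scripts
  • Create the volume groups (VGs) and file systems
  • Prepare the serial-port and disk heartbeat devices
► HACMP Standard configuration path
  • Add the cluster and nodes
  • Configure the cluster resources
  • Create the cluster resource groups
  • Synchronize the HACMP configuration
► HACMP Extended configuration path
  • Add heartbeat networks
  • Customize the cluster resources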
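To illustrate the "application start/stop scripts" preparation step, here is a minimal sketch of what such a script pair might contain. The variable names (APP_START_CMD, APP_STOP_CMD, APP_LOG) and the function names are placeholders assumed for this example; they are not defined by HACMP, and the real commands depend on the application.

```shell
#!/bin/sh
# Hypothetical skeleton; APP_START_CMD, APP_STOP_CMD and APP_LOG are
# placeholders for this example, not names defined by HACMP.
APP_START_CMD=${APP_START_CMD:-true}   # replace with the real start command
APP_STOP_CMD=${APP_STOP_CMD:-true}     # replace with the real stop command
APP_LOG=${APP_LOG:-/tmp/app_ctl.log}

# Append a time-stamped message to the log file.
log_msg() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$APP_LOG"
}

# Run one step, log the outcome, and pass success/failure to the caller.
run_step() {
    step=$1; shift
    log_msg "$step: running '$*'"
    if "$@" >> "$APP_LOG" 2>&1; then
        log_msg "$step: succeeded"
        return 0
    fi
    log_msg "$step: FAILED"
    return 1
}

# The command variables are left unquoted on purpose so that a command
# with arguments is split into words.
app_start() { run_step start $APP_START_CMD; }
app_stop()  { run_step stop  $APP_STOP_CMD; }
```

A start script would end by calling app_start and a stop script by calling app_stop, so that the exit status and the log both show whether the step worked.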
Implementation Expert Course: PowerHA
Extended Configuration
Graceful down
Take over
Force down
Hands-on Description
► 1. Active-Standby (service IP, 2 nodes)
  • Simulate Application Takeover
  • Parent/Child Dependency
Active-Standby
Nodes: hb_vlpar1, hb_vlpar2
Volume group: test1vg
Resource group rg1: vlpar1_svc, test1vg (newly added part)
Shared disk: hdisk1
Active-Active
Nodes: hb_vlpar1, hb_vlpar2
Volume groups: test1vg, test2vg (newly added part)
Resource group rg1: vlpar1_svc, test1vg
Resource group rg2: vlpar2_svc, test2vg
Shared disks: hdisk1, hdisk2
Case 7. Dependencies between resource groups
Thank You!
© Copyright IBM Corporation 2010
Backups
DARE (Dynamic Reconfiguration)
► HACMP allows you to do many tasks without stopping the cluster; you can do many tasks dynamically using the DARE and C-SPOC utilities. However, in order to do the following tasks, you must stop the cluster:
  • Change the name of a cluster component: network module, cluster node, or network interface. Once you configure the cluster, you should not need to change these names.
  • Maintain RSCT.
  • Change automatic error notification.
  • Convert a service IP label from IPAT via IP Replacement to IPAT via IP Aliases.
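Daily Administration
• clshowsrv -v
  Shows the status of the HACMP subsystems.
• clRGinfo
  Shows the current state of the resource groups.
• cllscf / cltopinfo
  Shows the cluster topology.
• clshowres
  Shows the resource group configuration.
• cllsnw, cllsif
  Shows cluster network information.
• clstat (requires the clinfoES service to be running)
  Shows the status of all nodes in the cluster.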
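The status commands above can be strung together into a quick health check. The sketch below assumes the HACMP utilities are on PATH (on a real node they normally live under the HACMP installation directories, which may need to be added to PATH first); the function name cluster_health is made up for this example. Commands that are not found are reported and skipped rather than treated as errors.

```shell
# Run each status command in turn, skipping any that is not installed.
cluster_health() {
    for cmd in "clshowsrv -v" "clRGinfo" "cltopinfo" "clshowres" "cllsif"; do
        name=${cmd%% *}                      # command name without arguments
        if command -v "$name" >/dev/null 2>&1; then
            echo "=== $cmd ==="
            $cmd                             # unquoted so arguments are split
        else
            echo "=== $cmd: not found on PATH, skipped ==="
        fi
    done
}
```

Calling cluster_health on a cluster node prints one section per command; on a machine without HACMP it only lists the skipped commands.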
Daily Administration
• /usr/sbin/snap -e
  Collects HACMP data.
• /usr/sbin/rsct/bin/dhb_read -p devicename -r | -t
  Tests the link status of the disk heartbeat path (-r receives on one node, -t transmits from the other).
• clpasswd
  Changes a user's password on each node in the cluster.
• cllsdisk
  Lists PVIDs of accessible disks in a specified resource chain.
• cllsvg
  Lists volume groups accessible in a specified resource chain.
• cllsparam
  Lists runtime parameters.
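The dhb_read test needs one node in receive mode and the other in transmit mode on the same shared disk. As a sketch, the helper below only prints the command line for each side so it can be reviewed before being run by hand; the function name dhb_test_cmd and the role names are invented for this example, and the device name passed in (hdisk1 here) is a placeholder for the actual heartbeat disk.

```shell
# Dry-run helper: prints the dhb_read command for one side of the test.
# role is "receive" (start this side first) or "transmit" (other node).
DHB_READ=/usr/sbin/rsct/bin/dhb_read

dhb_test_cmd() {
    role=$1
    device=$2
    case $role in
        receive)  flag=-r ;;
        transmit) flag=-t ;;
        *) echo "usage: dhb_test_cmd receive|transmit <device>" >&2
           return 2 ;;
    esac
    echo "$DHB_READ -p $device $flag"
}

dhb_test_cmd receive hdisk1    # prints /usr/sbin/rsct/bin/dhb_read -p hdisk1 -r
dhb_test_cmd transmit hdisk1   # prints /usr/sbin/rsct/bin/dhb_read -p hdisk1 -t
```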
Daily Administration
• cl_clstop
  Stops cluster services on nodes running C-SPOC.
• cl_lsfs
  Displays shared file system attributes for all cluster nodes.
• cl_lsgroup
  Displays group attributes for all cluster nodes.
• cl_lslv
  Displays shared logical volume attributes for cluster nodes.
• cl_lsuser
  Displays user account attributes for all nodes.
• cl_lsvg
  Displays shared volume group attributes for cluster nodes.
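Daily Administration: Parameter Tuning
• I/O pacing
  When other applications on the system perform heavy I/O, users may see problems such as badly degraded interactive response. Tuning I/O pacing makes resource allocation more balanced during periods of heavy disk reads and writes. Use smitty chgsys to set the I/O pacing high-water and low-water marks; the default is 0 (I/O pacing disabled).
• Changing the failure detection rate
  If enabling I/O pacing or increasing the syncd frequency does not resolve deadman switch problems, change the failure detection rate to "Slow". This lengthens the time before the deadman switch is triggered on the hung node, and before the takeover node detects the node failure and acquires the hung node's resources.
• syncd frequency
  Edit /sbin/rc.boot to increase the syncd frequency, from the default interval of 60 seconds to 30, 20, or 10 seconds. A higher frequency forces more frequent I/O flushes during heavy I/O and reduces the chance of triggering the deadman switch.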
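The syncd change amounts to editing the interval on the syncd line of /sbin/rc.boot. The sketch below prints a modified copy to standard output instead of editing in place, so the result can be compared with the original first. It assumes the file starts syncd with a line of the form "nohup /usr/sbin/syncd 60 ..."; the function name set_syncd_interval is invented for this example.

```shell
# Print the given rc.boot-style file with the syncd interval replaced.
# The input file itself is not modified.
set_syncd_interval() {
    file=$1
    secs=$2
    sed "s|\(/usr/sbin/syncd\) [0-9][0-9]*|\1 $secs|" "$file"
}

# Example (review the diff before copying the result into place):
#   set_syncd_interval /sbin/rc.boot 10 > /tmp/rc.boot.new
#   diff /sbin/rc.boot /tmp/rc.boot.new
#
# The current I/O pacing marks can be inspected on AIX with:
#   lsattr -El sys0 -a maxpout -a minpout
```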
HACMP-Related Log Files 1/7
• /tmp/clstrmgr.debug
  Contains time-stamped, formatted messages generated by the clstrmgrES daemon. The default messages are verbose and are typically adequate for troubleshooting most problems; however, IBM Support may direct you to enable additional debugging.
  Recommended Use: Information in this file is for IBM Support personnel.
• /tmp/cspoc.log ☺
  Contains time-stamped, formatted messages generated by HACMP C-SPOC commands. The /tmp/cspoc.log file resides on the node that invokes the C-SPOC command.
  Recommended Use: Use the C-SPOC log file when tracing a C-SPOC command's execution on cluster nodes.
• /tmp/emuhacmp.out
  Contains time-stamped, formatted messages generated by the HACMP Event Emulator. The messages are collected from output files on each node of the cluster and cataloged together into the /tmp/emuhacmp.out log file. In verbose mode (recommended), this log file contains a line-by-line record of every event emulated. Customized scripts within the event are displayed, but commands within those scripts are not executed.
HACMP-Related Log Files 2/7
• /var/hacmp/log/hacmp.out (/tmp/hacmp.out before V5.4)
  Contains time-stamped, formatted messages generated by HACMP scripts on the current day. In verbose mode (recommended), this log file contains a line-by-line record of every command executed by scripts, including the values of all arguments to each command. An event summary of each high-level event is included at the end of each event's details.
  Recommended Use: Because the information in this log file supplements and expands upon the information in the /usr/es/adm/cluster.log file, it is the primary source of information when investigating a problem.
  Note: With recent changes in the way resource groups are handled and prioritized in fallover circumstances, the hacmp.out file and its event summaries have become even more important in tracking the activity and resulting location of your resource groups. In HACMP releases prior to 5.2, non-recoverable event script failures result in the event_error event being run on the cluster node where the failure occurred. The remaining cluster nodes do not indicate the failure. With HACMP 5.2 and up, all cluster nodes run the event_error event if any node has a fatal error. All nodes log the error and call out the failing node name in the hacmp.out log file.
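Because hacmp.out records every command in verbose mode, it helps to pull out only the high-level event markers first and then read the detail around the one that failed. The sketch below assumes the log marks events with "EVENT START", "EVENT COMPLETED", and "EVENT FAILED" text, and that the log is at /var/hacmp/log/hacmp.out unless another path is given; the function name hacmp_events is invented for this example.

```shell
# Print only the high-level event marker lines from an hacmp.out-style log.
# Assumes events are marked with "EVENT START", "EVENT COMPLETED" or
# "EVENT FAILED"; pass another path as $1 if the log lives elsewhere.
hacmp_events() {
    grep -E 'EVENT (START|COMPLETED|FAILED)' "${1:-/var/hacmp/log/hacmp.out}"
}
```

For example, `hacmp_events | grep FAILED` narrows the output to failed events only.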
HACMP-Related Log Files 3/7
• /usr/es/adm/cluster.log
  Contains time-stamped, formatted messages generated by HACMP scripts and daemons.
  Recommended Use: Because this log file provides a high-level view of current cluster status, check this file first when diagnosing a cluster problem.
• /usr/es/sbin/cluster/history/cluster.mmddyyyy
  Contains time-stamped, formatted messages generated by HACMP scripts. The system creates a cluster history file every day, identifying each file by its file name extension, where mm indicates the month, dd indicates the day, and yyyy the year. For information about viewing this log file and interpreting its messages, see the section
  Recommended Use: Use the cluster history log files to get an extended view of cluster behavior over time. Note that this log is not a good tool for tracking resource groups processed in parallel. In parallel processing, certain steps formerly run as separate events are now processed differently, and these steps will not be evident in the cluster history log. Use the hacmp.out file to track parallel processing activity.
• /usr/es/sbin/cluster/snapshots/clsnapshot.log
  Contains logging information from the snapshot utility of HACMP, and information about errors found and/or actions taken by HACMP for resetting cluster tunable values.
HACMP-Related Log Files 4/7
• /var/adm/clavan.log
  Contains the state transitions of applications managed by HACMP, for example, when each application managed by HACMP is started or stopped, and when the node on which an application is running stops. Each node has its own instance of the file. Each record in the clavan.log file consists of a single line. Each line contains a fixed portion and a variable portion.
  Recommended Use: By collecting the records in the clavan.log file from every node in the cluster, a utility program can determine how long each application has been up, as well as compute other statistics describing application availability time.
• /var/ha/log/grpglsm
  Contains time-stamped messages in ASCII format. These track the execution of internal activities of the RSCT Group Services Globalized Switch Membership daemon. IBM Support personnel use this information for troubleshooting. The file is trimmed regularly, so save it promptly if there is a chance you may need it.
• /var/ha/log/grpsvcs
  Contains time-stamped messages in ASCII format. These track the execution of internal activities of the RSCT Group Services daemon. IBM Support personnel use this information for troubleshooting. The file is trimmed regularly, so save it promptly if there is a chance you may need it.
HACMP-Related Log Files 5/7
• /var/ha/log/topsvcs
  Contains time-stamped messages in ASCII format. These track the execution of internal activities of the RSCT Topology Services daemon. IBM Support personnel use this information for troubleshooting. The file is trimmed regularly, so save it promptly if there is a chance you may need it.
• /var/hacmp/clcomd/clcomddiag.log
  Contains time-stamped, formatted, diagnostic messages generated by clcomd.
  Recommended Use: Information in this file is for IBM Support personnel.
• /var/hacmp/clcomd/clcomd.log
  Contains time-stamped, formatted messages generated by Cluster Communications daemon (clcomd) activity. The log shows information about incoming and outgoing connections, both successful and unsuccessful. It also displays a warning if the file permissions for /usr/es/sbin/cluster/etc/rhosts are not set correctly; users on the system should not be able to write to the file.
  Recommended Use: Use information in this file to troubleshoot inter-node communications, and to obtain information about attempted connections to the daemon (and therefore to HACMP).
• /var/hacmp/clverify/clverify.log ☺
  Contains the verbose messages output by the cluster verification utility. The messages indicate the node(s), devices, command, etc. in which any verification error occurred.
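The clcomd.log entry above mentions a warning about the permissions of /usr/es/sbin/cluster/etc/rhosts. The same condition can be checked directly. This is a sketch only: it treats "users should not be able to write to the file" as "no write bit for group or others", which is an assumption about what clcomd actually checks, and the function name check_rhosts_perms is invented for this example.

```shell
RHOSTS=${RHOSTS:-/usr/es/sbin/cluster/etc/rhosts}

# Returns 0 if the file exists and is not writable by group or others,
# 1 if it is writable by group or others, 2 if it does not exist.
check_rhosts_perms() {
    file=${1:-$RHOSTS}
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 2
    fi
    # find prints the file name only if one of the write bits is set.
    if [ -n "$(find "$file" \( -perm -020 -o -perm -002 \) -print)" ]; then
        echo "WARNING: $file is writable by group or others"
        return 1
    fi
    echo "OK: $file"
}
```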
HACMP-Related Log Files 6/7
• /var/hacmp/log/clutils.log
  Contains information about the date, time, results, and which node performed an automatic cluster configuration verification. It also contains information for the file collection utility, the two-node cluster configuration assistant, the cluster test tool, and the OLPW conversion tool.
• /var/hacmp/log/cl_configassist.log
  Contains debugging information for the Two-Node Cluster Configuration Assistant. The Assistant stores up to ten copies of the numbered log files to assist with troubleshooting activities.
• /var/hacmp/log/cl_testtool.log ☺
  Includes excerpts from the hacmp.out file. The Cluster Test Tool saves up to three log files and numbers them so that you can compare the results of different cluster tests. The tool also rotates the files, with the oldest file being overwritten.
HACMP-Related Log Files 7/7
• Changing the default log directory
  1. Enter smit hacmp.
  2. In SMIT, select System Management (C-SPOC) > HACMP Log Viewing and Management > Change/Show a Cluster Log Directory.
  3. Select the log that you want to redirect.
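Q & A: Planning - Network Communication
• Persistent IP
  After HACMP starts successfully, on node A the persistent IP is bound to ent0 and the service IP to ent1 (node B behaves normally and is not discussed here).
  - Case 1: Unplug the cable from ent0. The persistent IP would normally be expected to move to ent1, but it does not move, and pings to the service IP show heavy packet loss.
  - Case 2: Unplug the cable from ent1. The service IP moves to ent0 as expected, so ent0 now holds three IPs (boot1, persistent IP, service IP). Reconnecting the ent1 cable triggers no action (presumably this is the correct behavior?). Unplugging the ent0 cable afterwards moves both the persistent IP and the service IP to ent1 successfully.
  Question: Why does this happen? Under normal conditions the persistent IP should be able to move between ent0 and ent1 on the same node. (Test environment: AIX 5.3 TL06 SP01, HACMP 5.4.0.1)
  Answer: This is normal behavior.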
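Q & A: Planning - Network Communication
• Persistent IP
  Question: Can HACMP be told to place the persistent IP on a specific network adapter?
  Answer: This cannot be specified in the configuration; it can be changed with the ifconfig command.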
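Q & A: Planning - Network Communication
• Disk Heartbeat
  Question: I recently read a document in which the disk used for disk heartbeat was placed in a resource group. When is this necessary?
  Answer: The VG used for disk heartbeat can currently be used for other purposes as well, such as creating file systems or being configured in a concurrent resource group. However, make sure the disk is not too busy; an overly busy disk can trigger the deadman switch.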
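Q & A: Planning - Network Communication
• Serial network
  Question: Regarding the common ways of implementing HACMP heartbeat networks, could you give an example of configuring HACMP on several (three or more) servers using 8-port asynchronous adapters?
  Answer: Define the serial networks according to the resource group types; the main topologies are mesh, star, and ring. Configure a non-IP network between every pair of nodes that can take over from each other, so that the cluster does not become isolated (partitioned).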
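Q & A: Planning - Network Communication
• Gateway
  Question: How should the gateway be added? In a script, in /etc/rc.net, or is there another recommendation?
  Answer: For the gateway of the service network, the route can be added in the resource group's start script; alternatively, adding a persistent IP solves the problem.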
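As a sketch of the "add the route in the start script" option, the fragment below adds a default route only when none is present. It assumes AIX-style `netstat -rn` output, in which the default route's destination column reads "default"; the function names and the gateway address passed in are placeholders for this example.

```shell
# Succeeds if the routing table text on stdin (netstat -rn style)
# contains a line whose first column is "default".
has_default_route() {
    awk '$1 == "default" { found = 1 } END { exit !found }'
}

# Hypothetical fragment for a resource group start script: add the
# default route through the given gateway unless one already exists.
ensure_default_route() {
    gw=$1
    if netstat -rn | has_default_route; then
        echo "default route already present"
    else
        route add default "$gw"
    fi
}

# Example call from the start script (the address is a placeholder):
#   ensure_default_route 192.168.1.254
```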
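Q & A: Planning - Network Communication
• EtherChannel
  Question: If two network adapters are bonded together, does HACMP need any special settings?
  Answer: No special configuration is needed.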
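Q & A: Planning - Network Communication
• Site
  Question: What HACMP preparation does Oracle RAC require, and must an HACMP site be configured?
  Answer: Sites are used in remote disaster recovery environments, so an Oracle RAC environment generally does not need a site.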
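Q & A: Planning - Network Communication
• rlogin
  Question: Can HACMP 5.3 and 5.4 run without an rlogin environment configured?
  Answer: Yes. rlogin has not been required since HACMP 5.1; HACMP uses the clcomdES daemon for access between nodes.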
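Q & A: Daily Administration
• Takeover
  Question: In an environment with EMC CX series storage, PowerPath, and IBM HACMP 5.2, each LUN goes through a SCSI reservation reset during an HACMP takeover, and each LUN takes a long time to process. In addition, when a shared volume group contains a large number of LVs (several hundred, for example), takeover is very slow. Is this normal, or is it a configuration problem?
  Answer: Use fast disk takeover.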
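Q & A: Daily Administration
• IP change
  Question: During HACMP testing, after network cables are unplugged and replugged repeatedly, the adapter addresses often differ from the configured ones. For example, the primary adapter is configured with 172.168.1.1 and the standby adapter with 192.168.1.1. After repeated cable pulls, netstat -in shows 192.168.1.1 on the primary adapter and 172.168.1.1 on the standby adapter, the opposite of the configuration shown in smitty tcpip.
  Answer: With a network topology based on IPAT via IP Replacement, this is normal.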
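Q & A: Daily Administration
• Takeover
  Question: The shared storage is from Fujitsu. All takeover tests passed, but one day the primary host went down unexpectedly. When HACMP on the standby host tried to acquire the shared storage, it could not clear the reservation that the primary host had set on the disks. The shared disks were therefore inaccessible, the shared volume group failed to vary on, and the standby host could not take over the application. HACMP in the other LPARs on the same host uses IBM ESS storage and has no such problem; takeover, or manually moving the RG to the backup node, works easily. Is there a compatibility problem between Fujitsu storage and IBM HACMP?
  Answer: Some third-party storage uses a reservation-release mechanism that differs from IBM storage. Ask the storage vendor for the specific release method and add it to the cl_disk_available script.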