This page documents how to set up a two-node cluster test with Cluster Suite/High Availability, comparing fencing methods between KVM virtual machines and bare-metal servers. You will need to do some manual editing of XML files, but cluster.conf is a lot easier to read than some other clustering file formats. These examples were done with the luci/conga GUI along with editing the XML in cluster.conf directly.
Hardware requirements:
One bare-metal server for the cluster master, and two servers (virtual or physical) for the cluster nodes.
Software requirements:
RHEL6, and a RHEL subscription to the High Availability package/add-on channel. We will be working with the packages luci, ricci, rgmanager, and cman.
At first, I thought this must be a tribute to a famous 50's TV show couple, but what's with ricci instead of ricky? Maybe the developers were fans of a certain actress with that last name.
Package installation
Install these packages on the cluster master:
yum groupinstall "High Availability"    (or: yum install luci)
passwd luci
service luci start
yum install ricci
passwd ricci
Install on cluster nodes:
yum install ricci ; passwd ricci ; service ricci start ; chkconfig ricci on
Add the cluster host names to /etc/hosts on each machine (or make sure DNS resolves them consistently):

192.168.1.2   test1.domain.org
192.168.1.3   test2.domain.org
192.168.1.4   test1-vm.domain.org
192.168.1.5   test2-vm.domain.org
Set each node's hostname to its fully qualified name, e.g. in /etc/sysconfig/network on test1:

HOSTNAME=test1.domain.org
Add nodes
<clusternodes>
  <clusternode name="test1.domain.org" nodeid="1"/>
  <clusternode name="test2.domain.org" nodeid="2"/>
</clusternodes>
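For reference, this <clusternodes> block sits inside a cluster.conf skeleton roughly like the sketch below. The cluster name "testcluster" is just a placeholder, and the two_node/expected_votes settings on <cman> are what a two-node cluster typically needs so it can keep quorum with only one node surviving; adjust for your environment.

<?xml version="1.0"?>
<cluster config_version="1" name="testcluster">
  <!-- allow the cluster to keep running with only one of the two nodes up -->
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <!-- clusternode entries from above go here -->
  </clusternodes>
  <fencedevices>
    <!-- fence device definitions, added in the next sections -->
  </fencedevices>
  <rm>
    <!-- failover domains, resources and services, added later -->
  </rm>
</cluster>

Bumping config_version on every manual edit and running ccs_config_validate before pushing the file around saves a lot of head-scratching.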
Add fence devices
Create fence devices for Dell DRAC5
<clusternodes>
  <clusternode name="test1.domain.org" nodeid="1">
    <fence>
      <method name="fence_drac5">
        <device name="test1-drac"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="test2.domain.org" nodeid="2">
    <fence>
      <method name="fence_drac5">
        <device name="test2-drac"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice agent="fence_drac5" ipaddr="192.168.0.1" login="root"
    module_name="server-1" name="test1-drac" passwd="addpasshere" action="reboot"
    secure="on"/>
  <fencedevice agent="fence_drac5" ipaddr="192.168.0.2" login="root"
    module_name="server-2" name="test2-drac" passwd="addpasshere" action="reboot"
    secure="on"/>
</fencedevices>
Note: module_name is a required option; it refers to the DRAC's definition of which server is to be accessed. The cluster.conf validation would not pass without it.
Other sources mentioned adding command_prompt="admin->", but this didn't work for me.
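Before wiring the agent into the cluster, it can save time to run fence_drac5 by hand. Fence agents read the same key=value options on stdin that fenced passes from cluster.conf, so something like the sketch below (using the example address, module name and credentials from above, with a non-destructive status action) should confirm the DRAC login works:

# query power status via the DRAC5 agent; keys match the cluster.conf attributes
fence_drac5 <<EOF
ipaddr=192.168.0.1
login=root
passwd=addpasshere
module_name=server-1
secure=on
action=status
EOF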
Add fencing for KVM virtual machines
This configuration applies to two VMs running on the same physical host, using fence_xvm.
<clusternodes>
  <clusternode name="test1-vm.domain.org" nodeid="1">
    <fence>
      <method name="1">
        <device domain="test1-vm" name="fence_xvm"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="test2-vm.domain.org" nodeid="2">
    <fence>
      <method name="1">
        <device domain="test2-vm" name="fence_xvm"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice agent="fence_xvm" name="fence_xvm"/>
</fencedevices>
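Note that fence_xvm only works if the physical host is running fence_virtd with a shared key that is also present on each guest. A rough outline of the host-side setup is sketched below; the key path is the stock default, but the multicast/libvirt details in fence_virt.conf depend on your environment, so treat this as a starting point.

# on the physical host: create the shared key, configure and start the daemon
mkdir -p /etc/cluster
dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4096 count=1
fence_virtd -c                 # interactive setup: multicast listener, libvirt backend
service fence_virtd start ; chkconfig fence_virtd on

# copy the key to the same path on every guest node, then test from a guest:
fence_xvm -o list              # should list the domains the host is willing to fence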
You could, of course, configure fence_xvm as the first fencing method and fence_drac5 as the backup method. This would work, but the backup method reboots the entire physical host, so if you have other virtual machines running on the same host, they would all be restarted along with the node being fenced.
At the time of this writing, fence_virt/fence_virsh across multiple physical hosts is still not very well documented or supported. This is a bit disappointing since it’s a feature which many virtualization users could use.
Test Your Fencing Methods
On one node, try: cman_tool kill -n nodename.fqdn.org
and watch the logs on the other node for successful fencing and takeover.
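On the surviving node, something like the following (log path per a stock RHEL6 syslog setup) makes it easy to see the fence action and the service takeover:

tail -f /var/log/messages | grep -iE 'fence|rgmanager'   # watch fencing and service recovery
clustat                                                   # cluster and service status afterwards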
Other methods of triggering fence:
1. service network stop
2. echo c > /proc/sysrq-trigger # crashes the kernel (simulates a dead node)
3. pull the network cable on one node
Add failover domain
Create domain "failover1" and add the two nodes to it. This section defines which nodes are members of the failover group that the service can be active on; a sketch of the resulting XML is below.
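I created the domain in luci, but the XML it writes ends up in the <rm> (resource manager) section of cluster.conf, alongside the resources and service shown in the next section. A rough sketch, using the bare-metal node names from the earlier example (swap in the -vm names for the virtual cluster) and flag values you may want to adjust:

<failoverdomains>
  <failoverdomain name="failover1" nofailback="0" ordered="0" restricted="1">
    <failoverdomainnode name="test1.domain.org"/>
    <failoverdomainnode name="test2.domain.org"/>
  </failoverdomain>
</failoverdomains>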
Add service group
<resources>
  <ip address="192.168.1.10" monitor_link="on" sleeptime="10"/>
</resources>
<service autostart="0" domain="failover1" name="servicegroup1" recovery="restart">
  <ip ref="192.168.1.10"/>
</service>
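With autostart="0" the service will not come up on its own, so it has to be enabled by hand once the configuration is in place. The commands below are the standard rgmanager tools, using the service and node names from the example above:

clusvcadm -e servicegroup1 -m test1.domain.org   # enable the service on a specific member
clustat                                          # confirm the service and its owning node
clusvcadm -r servicegroup1 -m test2.domain.org   # relocate it to the other node as a test
clusvcadm -d servicegroup1                       # disable it again when finished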