OpenNode HA Cluster setup (2-node active/passive)

OpenNode High Availability Cluster solutions are based on Red Hat Cluster Suite (RHCS) software. In this howto we describe one of the simplest ways to set up a basic HA cluster with a shared block device (SAN or iSCSI), resulting in a simple 2-node active/passive failover setup that is meant to handle physical server failure.

For more advanced cluster configurations - like VM-level HA (for both OpenVZ and KVM), DRBD-replicated storage, GFS2 shared storage and load-balanced clusters - please contact us for commercial support: info@opennodecloud.com.

A more detailed overview of RHCS can be found here: https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/High_Availability_Add-On_Overview/index.html

Complete RHCS management documentation can be found here: https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/index.html

Requirements

  • 2 or more physical cluster nodes are required
  • network equipment must allow UDP multicast in the local LAN for the cluster heartbeat
  • a shared block device between the cluster nodes (SAN or iSCSI) is required for simple cluster storage
  • a separate, dedicated network link between the nodes for the cluster heartbeat is recommended
  • in case of a 2-node cluster, using a quorum disk is also highly recommended

Simple 2-node failover cluster design for Highly Available OpenVZ service

This basic failover cluster is meant to handle server hardware failures - e.g. if one of the cluster nodes fails, all VMs are automatically restarted on the other cluster node. For simplicity we use a shared block device between the cluster nodes - but not a shared filesystem. The filesystem residing on the shared block device is active on only a single cluster node at a time - i.e. it gets mounted only on the node where all the VMs are currently running. The HA service created and monitored by the Cluster Manager is the OpenVZ (vz) service.
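
For orientation, the service part of /etc/cluster/cluster.conf that the ccs commands in the following sections build up will look roughly like this (names match the commands used below; exact attributes and defaults may differ slightly):

<rm>
  <failoverdomains>
    <failoverdomain name="VZHA" ordered="1" nofailback="1">
      <failoverdomainnode name="node1-p.example.com" priority="1"/>
      <failoverdomainnode name="node2-p.example.com" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <service name="vz" domain="VZHA" recovery="relocate" autostart="0">
    <fs name="storage" device="/dev/sanvg1/storage" mountpoint="/storage" fstype="ext4" options="noatime,nodiratime">
      <script name="vzctl" file="/etc/init.d/vz"/>
    </fs>
  </service>
</rm>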

Installing RHCS software packages

# Execute these tasks on ALL cluster nodes
yum groupinstall "High Availability" "Resilient Storage" -y

# Enable and start ricci service
chkconfig ricci on && service ricci start

# Set the ricci password - needed later for cluster node addition
passwd ricci

# Enable cluster services
chkconfig cman on && chkconfig rgmanager on && chkconfig modclusterd on
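
As a quick optional sanity check (assuming the default package names pulled in by the yum groups above), verify that the key packages are installed and that ricci is listening on its default TCP port 11111:

rpm -q cman rgmanager ricci ccs
netstat -tlnp | grep 11111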

Dedicated network connection for cluster heartbeat (multicast)

# If your server has more than one network interface it is
# strongly suggested that you set up a separate, directly cabled
# link between the server nodes for the cluster heartbeat

# Configure dedicated network interfaces on both nodes
# with some unused private LAN IP addresses
nano -w /etc/sysconfig/network-scripts/ifcfg-ethX
--- MODIFY ---
NM_CONTROLLED="no"
ONBOOT="yes"
IPADDR=192.168.50.x
NETMASK=255.255.255.0
--- MODIFY ---

# Add additional hostnames (with the -p suffix) to /etc/hosts on both nodes
# in order to route the cluster heartbeat over the dedicated link
nano -w /etc/hosts
--- ADD ---
192.168.50.1 node1-p.example.com node1-p
192.168.50.2 node2-p.example.com node2-p
--- ADD ---
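
After editing the interface and hosts files, it is worth confirming that the dedicated link actually works. A minimal check (replace ethX and the peer hostname with yours; omping is an optional multicast test tool that may need to be installed separately):

ifup ethX
ping -c 3 node2-p
# Optional multicast check - run simultaneously on both nodes
omping node1-p node2-p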

Creating 2-node HA Cluster

# Execute the following commands on one of the cluster nodes
ccs -h localhost --createcluster VZHA
ccs -h localhost --addnode node1-p.example.com
ccs -h localhost --addnode node2-p.example.com
ccs -h localhost --setfencedaemon post_join_delay=30

# For a 2-node cluster ONLY
ccs -h localhost --setcman expected_votes="1" two_node="1"

#Validate cluster conf
ccs_config_validate
ccs -h localhost --getconf

#NB! Don't forget to sync and activate cluster.conf!!!
ccs -h localhost --sync --activate

#Start cman services on ALL cluster nodes
service cman start
service rgmanager start
service modclusterd start

#Verify cluster status
cman_tool status
clustat

Attaching shared iSCSI block device to ALL cluster nodes (skip if using SAN block device)

# You need to perform these commands on ALL cluster nodes
# Install iSCSI utils
yum install iscsi-initiator-utils -y

# Enable and start iscsid service
service iscsid start && chkconfig iscsid on

#Test discovery
iscsiadm --mode discovery --type sendtargets --portal portal_ip_hostname

# Log into the iSCSI target - to make it persistent across reboots
iscsiadm -m node -T iqn.xxx.com.example.iscsi:diskname -p portal_ip_hostname -l

# Verify that disk is attached
cat /proc/partitions
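
Optionally, you can also list the active iSCSI sessions to confirm the login succeeded and to see which SCSI devices the target provides:

iscsiadm -m session
iscsiadm -m session -P 3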

Enabling Clustered LVM (CLVM)

# You need to execute these commands on ALL cluster nodes!
# Changing LVM locking on all nodes to clustered type
nano -w /etc/lvm/lvm.conf
--- MODIFY ---
locking_type = 3
--- MODIFY ---

# Enabling and starting clvmd service
service clvmd start && chkconfig clvmd on

Creating clustered LVM data volume for VM storage

# Applies to both a shared iSCSI and a SAN block device
# Creating LVM Physical Volume (PV) and clustered Volume Group (VG)
# on top of the shared block device
# Execute on one of the nodes
pvcreate /dev/sdb
vgcreate -c y sanvg1 /dev/sdb

# Creating clustered LVM data volume for /vz partition
lvcreate -L 50G -n storage sanvg1

# Creating filesystem
mkfs.ext4 /dev/sanvg1/storage

# Verify that sanvg1 is clustered
service clvmd status
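
Another way to confirm this is to look at the VG attributes - the attribute string printed by vgs should contain the 'c' (clustered) flag:

vgs -o vg_name,vg_attr sanvg1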

Creating clustered directory structure and migrating OpenVZ config files

### ABOUT ARCHITECTURE ###
# /etc/init.d/vz will be the cluster service, which depends on the clustered filesystem subresource mounted at /storage
# VE config files will be put under /storage/local/vz on that filesystem - which gets mounted before the vz service starts
# VEs are started/stopped by the vz service

# Execute on ALL cluster nodes
chkconfig vz off && service vz stop

# Execute only on master node
mount /dev/sanvg1/storage /mnt
rsync -av /storage/* /mnt/
mkdir -p /mnt/local/vz
rsync -av /vz/* /mnt/local/vz/
rmdir /mnt/local/vz/lost+found
umount /mnt

# Execute on ALL cluster nodes
# Move the vz folders aside, as the original locations will be symlinked
mv /etc/vz /etc/vz.orig
mv /etc/sysconfig/vz-scripts /etc/sysconfig/vz-scripts.orig
mv /var/vzquota /var/vzquota.orig

# Unmount old /vz and /storage partitions
umount /vz
umount /storage

# Remove old /storage /vz mounts from /etc/fstab
sed -i '/\/storage/d' /etc/fstab
sed -i '/\/vz/d' /etc/fstab

# Create fake symlink targets on all nodes for vz-lib package updates - otherwise updates will destroy the symlinks
mkdir -p /storage/local/vz/etc/sysconfig
mkdir -p /storage/local/vz/etc/vz
mkdir -p /storage/local/vz/var/vzquota

# Copy original vz folders into "fake" locations
rsync -av /etc/vz.orig/ /storage/local/vz/etc/vz/
rsync -av /etc/sysconfig/vz-scripts.orig/ /storage/local/vz/etc/sysconfig/vz-scripts/
rsync -av /var/vzquota.orig/ /storage/local/vz/var/vzquota/

# Relocate /vz to /storage/local/vz
rmdir /vz
cd / && ln -s /storage/local/vz vz

# Execute only on master node
# Mount clustered /storage volume
mount /dev/sanvg1/storage /storage

# Create OpenVZ dirs
mkdir -p /storage/local/vz/etc/sysconfig
mkdir -p /storage/local/vz/etc/vz
mkdir -p /storage/local/vz/var/vzquota

# Copy original vz dirs contents
rsync -av /etc/vz.orig/ /storage/local/vz/etc/vz/
rsync -av /etc/sysconfig/vz-scripts.orig/ /storage/local/vz/etc/sysconfig/vz-scripts/
rsync -av /var/vzquota.orig/ /storage/local/vz/var/vzquota/

# Unmount clustered /storage volume
umount /storage

# Execute on ALL cluster nodes
# Re-link vz dirs
ln -s /storage/local/vz/etc/vz /etc/vz
ln -s /storage/local/vz/etc/sysconfig/vz-scripts /etc/sysconfig/vz-scripts
ln -s /storage/local/vz/var/vzquota /var/vzquota
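
A quick check that the relinked paths resolve where they should (all four should now be symlinks pointing under /storage/local/vz):

ls -ld /vz /etc/vz /etc/sysconfig/vz-scripts /var/vzquota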

Setting up failover domain

ccs -h localhost --addfailoverdomain VZHA ordered nofailback
ccs -h localhost --addfailoverdomainnode VZHA node1-p.example.com 1
ccs -h localhost --addfailoverdomainnode VZHA node2-p.example.com 2
ccs -h localhost --lsfailoverdomain
ccs -h localhost --sync --activate

Setting up cluster service and resources

# Add cluster service named vz
ccs -h localhost --addservice vz domain=VZHA recovery=relocate autostart=0
# Add shared block device subresource
ccs -h localhost --addsubservice vz fs name=storage device=/dev/sanvg1/storage mountpoint=/storage fstype=ext4 options=noatime,nodiratime
# Add vz init script subresource
ccs -h localhost --addsubservice vz fs:script file=/etc/init.d/vz name=vzctl
# Populate and activate cluster config
ccs -h localhost --sync --activate

# Display cluster status
clustat
# Enable clustered vz service
clusvcadm -e vz
# Do relocation test for vz service - see it migrating between nodes
clusvcadm -r vz
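
If you want to double-check how rgmanager interprets the service definition, the rgmanager package ships an offline test tool that prints the parsed resource tree (read-only, safe to run on one node):

rg_test test /etc/cluster/cluster.conf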

Setup IPMI devices on cluster nodes

# While there are other methods for cluster fencing -
# the most common is to use the servers' IPMI interfaces for power control.
# Here we install the OpenIPMI kernel driver and ipmitool -
# together with the IPMI configuration

# Install OpenIPMI and load kernel module
yum install OpenIPMI OpenIPMI-tools -y
service ipmi start

# We don't recommend leaving the IPMI service running
# as the ipmi kernel driver has been unstable in the long run
#chkconfig ipmi on

# Configure the BMC IPMI LAN device from the OS
ipmitool -I open lan set 1 ipsrc static
ipmitool -I open lan set 1 ipaddr 192.168.1.10
ipmitool -I open lan set 1 netmask 255.255.255.0
ipmitool -I open lan set 1 access on
ipmitool -I open lan set 1 defgw ipaddr 192.168.1.1
# Setup IPMI user
ipmitool -I open user set name 2 admin
ipmitool -I open user set password 2 passwd
ipmitool -I open user enable 2

# Read IPMI configuration
ipmitool -I open lan print 1
ipmitool -I open user list 1

# Testing IPMI power control state
ipmitool -I lan -H ipmi_ip -U ipmi_user chassis power status
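
The same power status check can also be done through the fence agent itself, which is what the cluster will use later - a minimal sketch assuming the fence_ipmilan agent from the fence-agents package (replace the placeholders with your values):

fence_ipmilan -a ipmi_ip -l ipmi_user -p ipmi_passwd -o status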

NB! Setup cluster fencing!

# The most important part is to set up cluster fencing correctly -
# otherwise HA failover won't work

# List available fencing options
ccs -h localhost --lsfenceopts

# We are going to set up server power control through the
# standard IPMI-compliant baseboard management controller -
# which practically all modern servers have nowadays
# NB! Replace ipmi_ip, ipmi_user and ipmi_passwd with yours
ccs -h localhost --addfencedev node1fence agent=fence_ipmilan ipaddr=ipmi_ip login=ipmi_user passwd=ipmi_passwd auth=password action=off
ccs -h localhost --addfencedev node2fence agent=fence_ipmilan ipaddr=ipmi_ip login=ipmi_user passwd=ipmi_passwd auth=password action=off

ccs -h localhost --addmethod IPMI node1-p.example.com
ccs -h localhost --addmethod IPMI node2-p.example.com

ccs -h localhost --addfenceinst node1fence node1-p.example.com IPMI
ccs -h localhost --addfenceinst node2fence node2-p.example.com IPMI

ccs -h localhost --sync --activate
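
To verify that fencing works end-to-end you can ask the cluster to fence a node with fence_node (shipped with cman). NB! This is disruptive - it will power off the target node via its configured fence device - so only do it during a maintenance window:

# Run from node1 - fences node2 through its configured fence device
fence_node node2-p.example.com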

Setup Quorum Disk for avoiding split-brain situations with 2-node cluster

# While not strictly needed - it is highly recommended to set up a
# quorum disk - especially with 2-node clusters

# Create the qdisk on a shared block device (iSCSI or SAN) -
# a 64MB disk size is sufficient.
# NB! Clustered LVM LVs won't work - it has to be a plain shared block device!
# Execute once on a single cluster node
mkqdisk -c /dev/mapper/mpathX -l VZHAQ
# Check on both nodes that you see the quorum device
mkqdisk -L -d

ccs -h localhost --setquorumd interval=2 label=VZHAQ tko=5 votes=1
# The totem token timeout (in ms) should be longer than (tko+1)*interval of the qdisk - here (5+1)*2s = 12s, so 33000 ms is safe
ccs -h localhost --settotem token=33000
ccs -h localhost --sync --activate

# Restart the cluster stack so that qdiskd starts up
ccs -h localhost --stopall
ccs -h localhost --startall
ccs -h localhost --setcman expected_votes="3" two_node="0"
ccs -h localhost --sync --activate
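
Once the nodes are back up, confirm that the quorum disk is registered and that the vote counts match what you configured:

clustat
cman_tool status | grep -i -e quorum -e votes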

Adding simple KVM HA into the mix!

Please give us feedback about your OpenNode installation and register it by dropping us a note to the following email address: info@opennodecloud.com

In return we will provide you with instructions and code on how to add a simple KVM HA service into the mix!