Getting Started¶
This page will help you take your first steps with OpenSVC services setup.
It will guide you through the sequence of tasks to achieve a simple but working dual-node failover cluster.
Prerequisites¶
The demonstration environment is composed of:
- A Suse Linux Enterprise Server 11 SP3 (SLES11SP3) named sles1, which will act as the first cluster node
- A Suse Linux Enterprise Server 11 SP3 (SLES11SP3) named sles2, which will act as the second cluster node
- A storage array capable of exporting block devices to both nodes.
- In this guide, we use iSCSI luns exported from an OpenFiler instance (http://www.openfiler.com)
- FC luns exported from high-end arrays (EMC, HDS, IBM, ...) would also work, as long as the servers share the same logical units
As we plan to create 2 OpenSVC services, we need 2 IP addresses, one for each service:
- p26.opensvc.com <=> 37.59.71.26
- p27.opensvc.com <=> 37.59.71.27
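Both service names must resolve from both nodes. A minimal /etc/hosts sketch, assuming no DNS records exist for these names:
# appended to /etc/hosts on sles1 and sles2
37.59.71.26   p26.opensvc.com
37.59.71.27   p27.opensvc.com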
iSCSI Target Configuration¶
As the OpenFiler configuration is web-based, we can easily create the following objects:
- 2 iSCSI Targets
- 2 x 32 MBytes Logical Units
And finally:
- Map both luns to both iSCSI targets
- Allow nodes sles1 and sles2 access through both iSCSI targets
This setup serves each lun through 2 paths, thus simulating lun access redundancy.
iSCSI Initiator Configuration¶
First, we need the iSCSI initiator software installed on the SLES servers. The open-iscsi package will be used for this setup:
On both nodes:
sles1:/ # zypper install open-iscsi
Then, we have to specify the iSCSI initiator name for each node:
On both nodes:
sles1:/ # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.suse:sles1
sles2:/ # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.suse:sles2
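If the generated initiator name has to be changed to match this convention, it can be written directly to the file. A sketch, to be run on each node so it picks up its own hostname:
# echo "InitiatorName=iqn.1994-05.com.suse:$(hostname)" > /etc/iscsi/initiatorname.iscsi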
We start the iSCSI services and enable the daemon for boot-time start-up:
On both nodes:
# /etc/init.d/open-iscsi restart
Stopping iSCSI initiator service: Closing all iSCSI connections: done
Starting iSCSI initiator service: done
Setting up iSCSI targets: unused
# chkconfig --add open-iscsi
open-iscsi 0:off 1:off 2:off 3:on 4:off 5:on 6:off
It's now time to discover the target ports serving our iSCSI luns:
On both nodes:
# lsscsi
[0:0:0:0] cd/dvd QEMU QEMU DVD-ROM 0.12 /dev/sr0
[1:0:0:0] cd/dvd QEMU QEMU DVD-ROM 0.12 /dev/sr1
# iscsiadm --mode discovery --type sendtargets --portal openfiler.opensvc.com
37.59.71.21:3260,1 iqn.2006-01.com.openfiler:tsn.sles.2
37.59.71.21:3260,1 iqn.2006-01.com.openfiler:tsn.sles.1
# iscsiadm -m node --login
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:tsn.sles.1, portal: 37.59.71.21,3260] (multiple)
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:tsn.sles.2, portal: 37.59.71.21,3260] (multiple)
Login to [iface: default, target: iqn.2006-01.com.openfiler:tsn.sles.1, portal: 37.59.71.21,3260] successful.
Login to [iface: default, target: iqn.2006-01.com.openfiler:tsn.sles.2, portal: 37.59.71.21,3260] successful.
# lsscsi
[0:0:0:0] cd/dvd QEMU QEMU DVD-ROM 0.12 /dev/sr0
[1:0:0:0] cd/dvd QEMU QEMU DVD-ROM 0.12 /dev/sr1
[2:0:0:0] disk OPNFILER VIRTUAL-DISK 0 /dev/sdb
[2:0:0:1] disk OPNFILER VIRTUAL-DISK 0 /dev/sdd
[3:0:0:0] disk OPNFILER VIRTUAL-DISK 0 /dev/sda
[3:0:0:1] disk OPNFILER VIRTUAL-DISK 0 /dev/sdc
As we have multiple paths to the same luns through multiple targets, we have to set up the Linux native multipath software:
On both nodes:
# chkconfig --add multipathd
multipathd 0:off 1:off 2:off 3:on 4:off 5:on 6:off
# multipath -l
Feb 17 13:15:47 | DM multipath kernel driver not loaded
# /etc/init.d/multipathd start
Starting multipathd done
# multipath -l
14f504e46494c45524d46646433322d476348562d33724c44 dm-0 OPNFILER,VIRTUAL-DISK
size=32M features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 2:0:0:0 sdb 8:16 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
`- 3:0:0:0 sda 8:0 active undef running
14f504e46494c45526461484d656c2d5a6f416f2d33596b52 dm-1 OPNFILER,VIRTUAL-DISK
size=32M features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 2:0:0:1 sdd 8:48 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
`- 3:0:0:1 sdc 8:32 active undef running
The shared storage setup is operational.
Storage Configuration¶
We use Linux LVM to manage our storage. As we plan to create 2 services, we assign 1 lun to each OpenSVC service.
On sles1 node
Physical volume creation:
sles1:/ # pvcreate /dev/mapper/14f504e46494c45524d46646433322d476348562d33724c44
Physical volume "/dev/mapper/14f504e46494c45524d46646433322d476348562d33724c44" successfully created
sles1:/ # pvcreate /dev/mapper/14f504e46494c45526461484d656c2d5a6f416f2d33596b52
Physical volume "/dev/mapper/14f504e46494c45526461484d656c2d5a6f416f2d33596b52" successfully created
Volume group creation:
sles1:/ # vgcreate vgsvc1 /dev/mapper/14f504e46494c45524d46646433322d476348562d33724c44
Volume group "vgsvc1" successfully created
sles1:/ # vgcreate vgsvc2 /dev/mapper/14f504e46494c45526461484d656c2d5a6f416f2d33596b52
Volume group "vgsvc2" successfully created
Logical volume creation for the first service:
sles1:/ # lvcreate -L 10M -n lvdatasvc1 vgsvc1
Rounding up size to full physical extent 12,00 MiB
Logical volume "lvdatasvc1" created
sles1:/ # lvcreate -L 10M -n lvappsvc1 vgsvc1
Rounding up size to full physical extent 12,00 MiB
Logical volume "lvappsvc1" created
Logical volume creation for the second service:
sles1:/ # lvcreate -L 10M -n lvdatasvc2 vgsvc2
Rounding up size to full physical extent 12,00 MiB
Logical volume "lvdatasvc2" created
sles1:/ # lvcreate -L 10M -n lvappsvc2 vgsvc2
Rounding up size to full physical extent 12,00 MiB
Logical volume "lvappsvc2" created
Filesystem creation for both services:
sles1:/ # mkfs.ext3 -m 0 /dev/mapper/vgsvc1-lvappsvc1
sles1:/ # mkfs.ext3 -m 0 /dev/mapper/vgsvc1-lvdatasvc1
sles1:/ # mkfs.ext3 -m 0 /dev/mapper/vgsvc2-lvappsvc2
sles1:/ # mkfs.ext3 -m 0 /dev/mapper/vgsvc2-lvdatasvc2
On both nodes
Mountpoint creation for both services:
sles1:/ # mkdir -p /svc1/app /svc1/data
sles1:/ # mkdir -p /svc2/app /svc2/data
OpenSVC Installation¶
We will follow the steps described in Nodeware installation.
Install the OpenSVC Agent on both cluster nodes.
On both nodes:
# wget -O /tmp/opensvc.latest.rpm https://repo.opensvc.com/rpms/current
# rpm -Uvh /tmp/opensvc.latest.rpm
# rpm -qa | grep opensvc
opensvc-1.5-10303
The OpenSVC agent is now operational.
SSH Keys Setup¶
Cluster members communicate through ssh. Each node must trust its peer through key-based authentication to allow these communications.
- sles1 will be able to connect to sles2 as root.
- sles2 will be able to connect to sles1 as root.
Note
It is also possible for the agent to log in on a peer cluster node as an unprivileged user, using the ruser node.conf parameter. In this case, the remote user needs sudo privileges to run the following commands as root: nodemgr, svcmgr and rsync.
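If the root DSA key pair referenced below does not exist yet, generate it on each node first. A sketch; an RSA key would work equally well:
# ssh-keygen -t dsa -N "" -f /root/.ssh/id_dsa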
On sles1:
sles1:/ # scp /root/.ssh/id_dsa.pub root@sles2:/tmp/
On sles2:
sles2:/ # scp /root/.ssh/id_dsa.pub root@sles1:/tmp/
On sles1 AND sles2:
cat /tmp/id_dsa.pub >> /root/.ssh/authorized_keys2
On sles1:
sles1:/ # ssh sles2 hostname
sles2
On sles2:
sles2:/ # ssh sles1 hostname
sles1
Set Host Mode¶
As we are in a lab environment, we do not need to specify the host mode: "TST" is the default value, and is adequate.
For purposes other than testing, we would define the relevant mode on both nodes with the method described here.
Service Creation¶
The OpenSVC service can be created using one of the following methods:
- wizard: svcmgr create with the interactive option (-i)
- manual: build the config file from templates (located in <OSVCDOC>)
- provisioning
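For reference, the wizard method boils down to a single interactive command. The invocation below is only a sketch, assuming the -s option selects the service name as it does for the other svcmgr actions in this guide:
sles1:/ # svcmgr create -s p27.opensvc.com -i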
We will describe the second, manual option, for a better understanding of what happens.
Step 1 : Service configuration file¶
The expected file name is servicename.env
The DEFAULT section in the service file describes the service itself: human readable name, nodes where the service is expected to run on, default node, ...
Every other section defines a resource managed by the service.
The following configuration describes a service named p26.opensvc.com, running on the primary node sles1, failing over to node sles2, using one IP address named p26.opensvc.com (name-to-ip resolution is done by the OpenSVC agent), one LVM volume group vgsvc1, and two filesystems hosted in the logical volumes /dev/mapper/vgsvc1-lvappsvc1 and /dev/mapper/vgsvc1-lvdatasvc1.
On sles1 node:
sles1:/ # cd /etc/opensvc
sles1:/etc/opensvc # cat p26.opensvc.com.env
[DEFAULT] # Global section for service description
app = MyApp # service application friendly name
service_type = TST # specify if the service runs production, test, dev, ...
autostart_node = sles1 # default running node, name returned by « hostname » command
nodes = sles1 sles2 # cluster nodes where the service is able to run on
[ip#0] # Resource Section for ip address
ipname = p26.opensvc.com # specify the ip address on which the service will be bound
disable = False # the ip address will be enabled at service startup
optional = False # mandatory resource, the service can't work without it
ipdev = eth0 # the physical network device on which the ip address will be stacked
[vg#0] # Resource Section for volume group
vgname=vgsvc1 # volume group name
[fs#0] # Resource Section for filesystem
type = ext3 # filesystem type
disable = False # filesystem is enabled at service startup
mnt = /svc1/app # filesystem mountpoint
optional = False # mandatory resource, the service can't work without it
dev = /dev/mapper/vgsvc1-lvappsvc1 # block device where the filesystem is hosted
[fs#1]
type = ext3
disable = False
mnt = /svc1/data
optional = False
dev = /dev/mapper/vgsvc1-lvdatasvc1
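The second service planned in the prerequisites, p27.opensvc.com, would follow exactly the same pattern, using vgsvc2 and the /svc2 mountpoints. A sketch of its configuration file, not used in the rest of this tutorial:
[DEFAULT]
app = MyApp
service_type = TST
autostart_node = sles1
nodes = sles1 sles2
[ip#0]
ipname = p27.opensvc.com
ipdev = eth0
[vg#0]
vgname = vgsvc2
[fs#0]
type = ext3
mnt = /svc2/app
dev = /dev/mapper/vgsvc2-lvappsvc2
[fs#1]
type = ext3
mnt = /svc2/data
dev = /dev/mapper/vgsvc2-lvdatasvc2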
Step 2 : Service startup scripts directory¶
As services are used to manage applications, we need a directory where all application startup scripts can be grouped.
As an example, if we wanted to build a LAMP service, we would use 2 scripts: one for the MySQL database, and another for the Apache webserver. Those scripts have to be located in the service startup scripts directory.
sles1:/etc/opensvc # mkdir p26.opensvc.com.dir
sles1:/etc/opensvc # ln -s p26.opensvc.com.dir p26.opensvc.com.d
We will see later in this tutorial that /etc/opensvc/p26.opensvc.com.dir may not be the best place to host the launchers. In any case, the symlink p26.opensvc.com.d is the only place where OpenSVC actually searches for application launchers referenced by their basenames.
For now, we just create this directory and the symlink. No script is added yet.
Step 3 : Service management facility¶
To make service management easier, we create a symlink to the OpenSVC core service management command:
sles1:/etc/opensvc # ln -s /usr/bin/svcmgr p26.opensvc.com
Without this symlink, we have to use the svcmgr command with arguments to manage our service:
sles1:/ # svcmgr -s p26.opensvc.com print status
With this symlink, we can directly use
sles1:/ # p26.opensvc.com print status
Step 4 : Service configuration check¶
As a final check, we can list all entries that match our p26.opensvc.com service:
sles1:/etc/opensvc # ls -lart | grep p26
total 20
drwxr-xr-x 9 root root 4096 16 févr. 11:14 ..
-rw-r--r-- 1 root root 423 17 févr. 14:12 p26.opensvc.com.env
drwxr-xr-x 2 root root 4096 17 févr. 14:14 p26.opensvc.com.dir
lrwxrwxrwx 1 root root 19 17 févr. 14:15 p26.opensvc.com.d -> p26.opensvc.com.dir
lrwxrwxrwx 1 root root 23 17 févr. 14:15 p26.opensvc.com -> /usr/bin/svcmgr
drwxr-xr-x 3 root root 4096 17 févr. 14:15 .
You should be able to see:
- the service configuration file (service.env)
- the directory where the application launchers are stored (service.dir)
- a symlink to service.dir (service.d)
- a symlink to the /usr/bin/svcmgr command (service)
At this point, we have configured a single service with no application launcher on node sles1.
Service Testing¶
Query service status¶
Our first service is now ready to use. We can query its status.
On sles1:
sles1:/ # p26.opensvc.com print status
p26.opensvc.com
overall warn
|- avail warn
| |- vg#0 .... up vgsvc1
| |- fs#0 .... down /dev/mapper/vgsvc1-lvappsvc1@/svc1/app
| |- fs#1 .... down /dev/mapper/vgsvc1-lvdatasvc1@/svc1/data
| |- ip#0 .... down p26.opensvc.com@eth0
| '- app .... n/a app
|- sync warn
| '- sync#i0 .... warn rsync svc config to drpnodes, nodes
| # passive node needs update
'- hb n/a
This command collects and displays the status of each service resource:
- the overall status is warn because not all resources are up
- the vg#0 resource is up because the volume group is activated (which is the expected state after vgcreate)
- the sync resources are in warn status because no synchronisation has happened yet
- all other resources are down or not applicable (n/a)
Start service¶
Using OpenSVC for service management saves a lot of time and effort. Once the service is described on a node, a single command starts the whole application stack.
Let's start the service:
sles1:/ # p26.opensvc.com start
14:40:06 INFO P26.OPENSVC.COM.IP#0 checking 37.59.71.26 availability
14:40:11 INFO P26.OPENSVC.COM.IP#0 ifconfig eth0:1 37.59.71.26 netmask 255.255.255.224 up
14:40:11 INFO P26.OPENSVC.COM.IP#0 arping -U -c 1 -I eth0 -s 37.59.71.26 0.0.0.0
ARPING 0.0.0.0 from 37.59.71.26 eth0
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
14:40:11 INFO P26.OPENSVC.COM.VG#0 vgsvc1 is already up
14:40:11 INFO P26.OPENSVC.COM.FS#0 create missing mountpoint /svc1/app
14:40:11 INFO P26.OPENSVC.COM.FS#0 e2fsck -p /dev/mapper/vgsvc1-lvappsvc1
14:40:11 INFO P26.OPENSVC.COM.FS#0 output:
/dev/mapper/vgsvc1-lvappsvc1: clean, 11/3072 files, 1530/12288 blocks
14:40:11 INFO P26.OPENSVC.COM.FS#0 mount -t ext3 /dev/mapper/vgsvc1-lvappsvc1 /svc1/app
14:40:11 INFO P26.OPENSVC.COM.FS#1 create missing mountpoint /svc1/data
14:40:11 INFO P26.OPENSVC.COM.FS#1 e2fsck -p /dev/mapper/vgsvc1-lvdatasvc1
14:40:11 INFO P26.OPENSVC.COM.FS#1 output:
/dev/mapper/vgsvc1-lvdatasvc1: clean, 11/3072 files, 1530/12288 blocks
14:40:11 INFO P26.OPENSVC.COM.FS#1 mount -t ext3 /dev/mapper/vgsvc1-lvdatasvc1 /svc1/data
The startup sequence reads as:
- check that the service IP address is not already in use somewhere
- bring up the service IP address
- activate the volume group (if not already in the correct state)
- fsck and mount each filesystem
Manual filesystem mount check:
sles1:/ # mount | grep svc1
/dev/mapper/vgsvc1-lvappsvc1 on /svc1/app type ext3 (rw)
/dev/mapper/vgsvc1-lvdatasvc1 on /svc1/data type ext3 (rw)
Manual ip address plumbing check on eth0 (p26.opensvc.com is 37.59.71.26):
sles1:/ # ip addr list eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 52:54:00:db:db:29 brd ff:ff:ff:ff:ff:ff
inet 37.59.71.22/27 brd 37.59.71.31 scope global eth0
inet 37.59.71.26/27 brd 37.59.71.31 scope global secondary eth0:1
inet6 fe80::5054:ff:fedb:db29/64 scope link
valid_lft forever preferred_lft forever
We can confirm everything is OK with the service's print status
command:
sles1:/ # p26.opensvc.com print status
p26.opensvc.com
overall warn
|- avail up
| |- vg#0 .... up vgsvc1
| |- fs#0 .... up /dev/mapper/vgsvc1-lvappsvc1@/svc1/app
| |- fs#1 .... up /dev/mapper/vgsvc1-lvdatasvc1@/svc1/data
| |- ip#0 .... up p26.opensvc.com@eth0
| '- app .... n/a app
| # no checkup scripts
|- sync down
| '- sync#i0 .... down rsync svc config to drpnodes, nodes
| # sles2 need update
'- hb n/a
At this point, we have a running service, configured to run on sles1 node.
Application Integration¶
We have gone through the setup of a single service, but it does not start applications yet. Let's add an application to our service now.
We will use a very simple example: a tiny webserver with a single index.html file to serve.
Applications launcher directory¶
The OpenSVC service integration enables service relocation amongst nodes. The per-service launcher hosting directory layout is a consequence of this relocation feature. The service has an implicit synchronisation resource to replicate the <OSVCETC>/<service>* files using rsync.
As a refinement, for services with dedicated shared disks, we can relocate the application launchers directory to a filesystem resource hosted on such a disk. The original location was <OSVCETC>/p26.opensvc.com.dir. Let's move it to /svc1/app/init.d:
sles1:/etc/opensvc # ls -lart | grep p26
total 20
drwxr-xr-x 9 root root 4096 16 févr. 11:14 ..
-rw-r--r-- 1 root root 423 17 févr. 14:12 p26.opensvc.com.env
drwxr-xr-x 2 root root 4096 17 févr. 14:14 p26.opensvc.com.dir
lrwxrwxrwx 1 root root 19 17 févr. 14:15 p26.opensvc.com.d -> p26.opensvc.com.dir
lrwxrwxrwx 1 root root 23 17 févr. 14:15 p26.opensvc.com -> /usr/bin/svcmgr
drwxr-xr-x 3 root root 4096 17 févr. 14:15 .
sles1:/etc/opensvc # rm -f p26.opensvc.com.d
sles1:/etc/opensvc # rmdir p26.opensvc.com.dir
sles1:/etc/opensvc # mkdir /svc1/app/init.d
sles1:/etc/opensvc # ln -s /svc1/app/init.d p26.opensvc.com.d
sles1:/etc/opensvc # ls -lart | grep p26
total 12
lrwxrwxrwx 1 root root 23 17 févr. 14:15 p26.opensvc.com -> /usr/bin/svcmgr
lrwxrwxrwx 1 root root 16 17 févr. 16:48 p26.opensvc.com.d -> /svc1/app/init.d
-rw-r--r-- 1 root root 396 17 févr. 14:21 p26.opensvc.com.env
Application Binary¶
In the service directory structure, we put a standalone binary of the Mongoose web server (https://code.google.com/p/mongoose/):
sles1:/ # cd /svc1/app
sles1:/svc1/app # wget -O /svc1/app/webserver http://cesanta.com/downloads/mongoose-lua-sqlite-ssl-static-x86_64-5.2
--2014-02-18 14:35:12-- http://cesanta.com/downloads/mongoose-lua-sqlite-ssl-static-x86_64-5.2
Resolving cesanta.com... 54.194.65.250
Connecting to cesanta.com|54.194.65.250|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1063420 (1.0M) [text/plain]
Saving to: `/svc1/app/webserver'
100%[================================================================================================>] 1,063,420 210K/s in 5.3s
2014-02-18 14:35:18 (197 KB/s) - `/svc1/app/webserver' saved [1063420/1063420]
sles1:/svc1/app # ls -l /svc1/app/webserver
-rwxr-xr-x 1 root root 1063420 Feb 1 18:11 /svc1/app/webserver
And create a dummy web page in /svc1/data/, to be served by our webserver:
sles1:/svc1/app # cd /svc1/data/
sles1:/svc1/data # cat index.html
<html><body>It Works !</body></html>
Applications launcher script¶
We have to create a management script for our web application. At minimum, this script must support the start argument.
As a best practice, the script should also support the additional arguments:
- stop
- status
- info
Of course, we will store our script, named weblauncher, in the directory previously created for this purpose:
sles1:/ # cd /svc1/app/init.d
sles1:/svc1/app/init.d # cat weblauncher
#!/bin/bash
SVCROOT=/svc1
APPROOT=${SVCROOT}/app
DAEMON=${APPROOT}/webserver
DAEMON_BASE=$(basename $DAEMON)
DAEMONOPTS="-document_root ${SVCROOT}/data -index_files index.html -listening_port 8080"

function status {
    pgrep $DAEMON_BASE >/dev/null 2>&1
}

case $1 in
restart)
    killall $DAEMON_BASE
    nohup $DAEMON $DAEMONOPTS >> /dev/null 2>&1 &
    ;;
start)
    status && {
        echo "already started"
        exit 0
    }
    nohup $DAEMON $DAEMONOPTS >> /dev/null 2>&1 &
    ;;
stop)
    killall $DAEMON_BASE
    ;;
info)
    echo "Name: webserver"
    ;;
status)
    status
    exit $?
    ;;
*)
    echo "unsupported action: $1" >&2
    exit 1
    ;;
esac
Make sure the script is working fine outside of the OpenSVC context:
sles1:/svc1/app # ./weblauncher status
sles1:/svc1/app # echo $?
1
sles1:/svc1/app # ./weblauncher start
sles1:/svc1/app # ./weblauncher status
sles1:/svc1/app # echo $?
0
sles1:/svc1/app # ./weblauncher stop
sles1:/svc1/app # ./weblauncher status
sles1:/svc1/app # echo $?
1
Now we can instruct OpenSVC to handle this script for service application management:
sles1:/svc1/app/init.d # ln -s weblauncher S10weblauncher
sles1:/svc1/app/init.d # ln -s weblauncher K90weblauncher
sles1:/svc1/app/init.d # ln -s weblauncher C10weblauncher
sles1:/svc1/app/init.d # ls -l
total 1
lrwxrwxrwx 1 root root 11 Feb 17 16:49 C10weblauncher -> weblauncher
lrwxrwxrwx 1 root root 11 Feb 17 16:48 K90weblauncher -> weblauncher
lrwxrwxrwx 1 root root 11 Feb 17 16:47 S10weblauncher -> weblauncher
-rwxr-xr-x 1 root root 570 Feb 17 16:45 weblauncher
This configuration tells OpenSVC to call the weblauncher script with:
- the start argument when the OpenSVC service starts (symlink S10weblauncher)
- the stop argument when the OpenSVC service stops (symlink K90weblauncher)
- the status argument when the OpenSVC service needs the application status (symlink C10weblauncher)
When integrating multiple pieces of software into an OpenSVC service, you can use the digits after [SKC] in the symlink names to specify the script execution order for the start/stop/check actions.
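For instance, with a hypothetical second launcher named dblauncher that must start before the webserver and stop after it, the symlinks could be named as follows (a sketch, not part of this tutorial's service):
S10dblauncher -> dblauncher      # database starts first
S20weblauncher -> weblauncher    # webserver starts second
K10weblauncher -> weblauncher    # webserver stops first
K90dblauncher -> dblauncher      # database stops last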
Now we can try our launcher script, using the OpenSVC commands:
sles1:~ # p26.opensvc.com start
16:52:31 INFO P26.OPENSVC.COM.IP#0 checking 37.59.71.26 availability
16:52:36 INFO P26.OPENSVC.COM.IP#0 ifconfig eth0:1 37.59.71.26 netmask 255.255.255.224 up
16:52:36 INFO P26.OPENSVC.COM.IP#0 arping -U -c 1 -I eth0 -s 37.59.71.26 0.0.0.0
ARPING 0.0.0.0 from 37.59.71.26 eth0
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
16:52:36 INFO P26.OPENSVC.COM.VG#0 vgchange --addtag @sles1 vgsvc1
16:52:37 INFO P26.OPENSVC.COM.VG#0 output:
Volume group "vgsvc1" successfully changed
16:52:37 INFO P26.OPENSVC.COM.VG#0 vgchange -a y vgsvc1
16:52:37 INFO P26.OPENSVC.COM.VG#0 output:
2 logical volume(s) in volume group "vgsvc1" now active
16:52:37 INFO P26.OPENSVC.COM.FS#0 e2fsck -p /dev/mapper/vgsvc1-lvappsvc1
16:52:37 INFO P26.OPENSVC.COM.FS#0 output:
/dev/mapper/vgsvc1-lvappsvc1: clean, 19/3072 files, 2579/12288 blocks
16:52:37 INFO P26.OPENSVC.COM.FS#0 mount -t ext3 /dev/mapper/vgsvc1-lvappsvc1 /svc1/app
16:52:37 INFO P26.OPENSVC.COM.FS#1 e2fsck -p /dev/mapper/vgsvc1-lvdatasvc1
16:52:37 INFO P26.OPENSVC.COM.FS#1 output:
/dev/mapper/vgsvc1-lvdatasvc1: clean, 13/3072 files, 1532/12288 blocks
16:52:37 INFO P26.OPENSVC.COM.FS#1 mount -t ext3 /dev/mapper/vgsvc1-lvdatasvc1 /svc1/data
16:52:37 INFO P26.OPENSVC.COM.APP spawn: /etc/opensvc/p26.opensvc.com.d/S10weblauncher start
16:52:37 INFO P26.OPENSVC.COM.APP start done in 0:00:00.007657 - ret 0 - logs in /var/tmp/opensvc/svc_p26.opensvc.com_S10weblauncher.log
We can see that OpenSVC is now calling our startup script after mounting filesystems.
Querying the service status, the app resource is now reporting up:
sles1:~ # p26.opensvc.com print status
p26.opensvc.com
overall warn
|- avail up
| |- vg#0 .... up vgsvc1
| |- fs#0 .... up /dev/mapper/vgsvc1-lvappsvc1@/svc1/app
| |- fs#1 .... up /dev/mapper/vgsvc1-lvdatasvc1@/svc1/data
| |- ip#0 .... up p26.opensvc.com@eth0
| '- app .... up app
|- sync down
| '- sync#i0 .... down rsync svc config to drpnodes, nodes
| # sles2 need update
'- hb n/a
Let's check if that is really the case:
sles1:/ # ps auxww|grep web
root 5902 0.0 0.1 4596 2304 pts/2 S 16:52 0:00 /svc1/app/webserver -document_root /svc1/data -index_files index.html -listening_port 8080
root 5958 0.0 0.0 7780 888 pts/2 S+ 16:53 0:00 grep web
sles1:~ # wget -qO - http://p26.opensvc.com:8080/
<html><body>It Works !</body></html>
Now we can stop our service:
sles1:/ # p26.opensvc.com stop
15:32:31 INFO P26.OPENSVC.COM.APP spawn: /etc/opensvc/p26.opensvc.com.d/K90weblauncher stop
15:32:31 INFO P26.OPENSVC.COM.APP stop done in 0:00:00.004676 - ret 0 - logs in /var/tmp/opensvc/svc_p26.opensvc.com_K90weblauncher.log
15:32:32 INFO P26.OPENSVC.COM.FS#1 umount /svc1/data
15:32:32 INFO P26.OPENSVC.COM.FS#0 umount /svc1/app
15:32:32 INFO P26.OPENSVC.COM.VG#0 vgchange --deltag @sles1 vgsvc1
15:32:32 INFO P26.OPENSVC.COM.VG#0 output:
Volume group "vgsvc1" successfully changed
15:32:32 INFO P26.OPENSVC.COM.VG#0 kpartx -d /dev/vgsvc1/lvappsvc1
15:32:32 INFO P26.OPENSVC.COM.VG#0 kpartx -d /dev/vgsvc1/lvdatasvc1
15:32:32 INFO P26.OPENSVC.COM.VG#0 vgchange -a n vgsvc1
15:32:32 INFO P26.OPENSVC.COM.VG#0 output:
0 logical volume(s) in volume group "vgsvc1" now active
15:32:32 INFO P26.OPENSVC.COM.IP#0 ifconfig eth0:1 down
15:32:32 INFO P26.OPENSVC.COM.IP#0 checking 37.59.71.26 availability
Once again, a single command:
- brings down the application
- unmounts filesystems
- deactivates the volume group
- disables the service ip address
The overall status is now reported as down:
sles1:/ # p26.opensvc.com print status
p26.opensvc.com
overall down
|- avail down
| |- vg#0 .... down vgsvc1
| |- fs#0 .... down /dev/mapper/vgsvc1-lvappsvc1@/svc1/app
| |- fs#1 .... down /dev/mapper/vgsvc1-lvdatasvc1@/svc1/data
| |- ip#0 .... down p26.opensvc.com@eth0
| '- app .... n/a app
|- sync down
| '- sync#i0 .... down rsync svc config to drpnodes, nodes
| # sles2 need update
'- hb n/a
Let's restart the service to continue this tutorial:
sles1:/ # p26.opensvc.com start
15:53:44 INFO P26.OPENSVC.COM.IP#0 checking 37.59.71.26 availability
15:53:48 INFO P26.OPENSVC.COM.IP#0 ifconfig eth0:1 37.59.71.26 netmask 255.255.255.224 up
15:53:48 INFO P26.OPENSVC.COM.IP#0 arping -U -c 1 -I eth0 -s 37.59.71.26 0.0.0.0
ARPING 0.0.0.0 from 37.59.71.26 eth0
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
15:53:49 INFO P26.OPENSVC.COM.VG#0 vgchange --addtag @sles1 vgsvc1
15:53:49 INFO P26.OPENSVC.COM.VG#0 output:
Volume group "vgsvc1" successfully changed
15:53:49 INFO P26.OPENSVC.COM.VG#0 vgchange -a y vgsvc1
15:53:49 INFO P26.OPENSVC.COM.VG#0 output:
2 logical volume(s) in volume group "vgsvc1" now active
15:53:49 INFO P26.OPENSVC.COM.FS#0 e2fsck -p /dev/mapper/vgsvc1-lvappsvc1
15:53:49 INFO P26.OPENSVC.COM.FS#0 output:
/dev/mapper/vgsvc1-lvappsvc1: clean, 19/3072 files, 2579/12288 blocks
15:53:49 INFO P26.OPENSVC.COM.FS#0 mount -t ext3 /dev/mapper/vgsvc1-lvappsvc1 /svc1/app
15:53:49 INFO P26.OPENSVC.COM.FS#1 e2fsck -p /dev/mapper/vgsvc1-lvdatasvc1
15:53:49 INFO P26.OPENSVC.COM.FS#1 output:
/dev/mapper/vgsvc1-lvdatasvc1: clean, 13/3072 files, 1532/12288 blocks
15:53:49 INFO P26.OPENSVC.COM.FS#1 mount -t ext3 /dev/mapper/vgsvc1-lvdatasvc1 /svc1/data
15:53:49 INFO P26.OPENSVC.COM.APP spawn: /etc/opensvc/p26.opensvc.com.d/S10weblauncher start
15:53:49 INFO P26.OPENSVC.COM.APP start done in 0:00:00.008936 - ret 0 - logs in /var/tmp/opensvc/svc_p26.opensvc.com_S10weblauncher.log
At this point, we have a running service on node sles1, with a webserver application embedded.
Service Failover¶
Our service is running fine, but what happens if the sles1 node fails? Our p26.opensvc.com service will also fail.
That's why we want to extend the service configuration to declare sles2 as a failover node for this service.
After this change, the service configuration needs to be replicated to the sles2 node.
First we check <OSVCETC> on sles2; it should be empty because we did a fresh install:
sles1:/etc/opensvc # ssh sles2 ls /etc/opensvc/ | grep p26.opensvc.com
sles1:/etc/opensvc #
The configuration replication will be possible if the following conditions are met:
- the new node is declared in the service configuration file <OSVCETC>/p26.opensvc.com.env (parameter "nodes" in the .env file)
- the node sending the config files (sles1) is trusted on the new node (sles2), as described in a previous chapter of this tutorial
- the node sending the config files (sles1) must be running the service (the service availability status, apps excluded, is up)
- the previous synchronisation is older than the configured minimum delay, or the --force option is set to bypass the delay check
Let's replicate the configuration files:
sles1:/ # svcmgr -s p26.opensvc.com syncnodes
17:20:37 INFO P26.OPENSVC.COM.SYNC#I0 skip sync: not in allowed period (['03:59', '05:59'])
sles1:/ # svcmgr -s p26.opensvc.com syncnodes --force
17:20:41 INFO P26.OPENSVC.COM exec 'svcmgr -s p26.opensvc.com --waitlock 3600 postsync' on node sles2
sles1:/ # ssh sles2 ls -l /etc/opensvc | grep p26.opensvc.com
total 8
lrwxrwxrwx 1 root root 23 17 févr. 14:15 p26.opensvc.com -> /usr/bin/svcmgr
lrwxrwxrwx 1 root root 16 17 févr. 16:48 p26.opensvc.com.d -> /svc1/app/init.d
-rw-r--r-- 1 root root 396 17 févr. 14:21 p26.opensvc.com.env
We can see that the sles2
node is now ready to start our service.
On sles1:
sles1:/ # svcmgr -s p26.opensvc.com print status
p26.opensvc.com
overall up
|- avail up
| |- vg#0 .... up vgsvc1
| |- fs#0 .... up /dev/mapper/vgsvc1-lvappsvc1@/svc1/app
| |- fs#1 .... up /dev/mapper/vgsvc1-lvdatasvc1@/svc1/data
| |- ip#0 .... up p26.opensvc.com@eth0
| '- app .... up app
|- sync up
| '- sync#i0 .... up rsync svc config to drpnodes, nodes
'- hb n/a
Note that the sync#i0 resource is now up, as both nodes are now in sync from a service configuration point of view.
We can now try to start the service on sles2
, after stopping it on sles1
:
sles1:/ # svcmgr -s p26.opensvc.com stop
16:07:40 INFO P26.OPENSVC.COM.APP spawn: /etc/opensvc/p26.opensvc.com.d/K90weblauncher stop
16:07:40 INFO P26.OPENSVC.COM.APP stop done in 0:00:00.004513 - ret 0 - logs in /var/tmp/opensvc/svc_p26.opensvc.com_K90weblauncher.log
16:07:40 INFO P26.OPENSVC.COM.FS#1 umount /svc1/data
16:07:40 INFO P26.OPENSVC.COM.FS#0 umount /svc1/app
16:07:40 INFO P26.OPENSVC.COM.VG#0 vgchange --deltag @sles1 vgsvc1
16:07:41 INFO P26.OPENSVC.COM.VG#0 output:
Volume group "vgsvc1" successfully changed
16:07:41 INFO P26.OPENSVC.COM.VG#0 kpartx -d /dev/vgsvc1/lvappsvc1
16:07:41 INFO P26.OPENSVC.COM.VG#0 kpartx -d /dev/vgsvc1/lvdatasvc1
16:07:41 INFO P26.OPENSVC.COM.VG#0 vgchange -a n vgsvc1
16:07:41 INFO P26.OPENSVC.COM.VG#0 output:
0 logical volume(s) in volume group "vgsvc1" now active
16:07:41 INFO P26.OPENSVC.COM.IP#0 ifconfig eth0:1 down
16:07:41 INFO P26.OPENSVC.COM.IP#0 checking 37.59.71.26 availability
On sles2:
sles2:~ # p26.opensvc.com start
16:08:38 INFO P26.OPENSVC.COM.IP#0 checking 37.59.71.26 availability
16:08:41 INFO P26.OPENSVC.COM.IP#0 ifconfig eth0:1 37.59.71.26 netmask 255.255.255.224 up
16:08:41 INFO P26.OPENSVC.COM.IP#0 arping -U -c 1 -I eth0 -s 37.59.71.26 0.0.0.0
ARPING 0.0.0.0 from 37.59.71.26 eth0
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
16:08:42 INFO P26.OPENSVC.COM.VG#0 vgchange --addtag @sles2 vgsvc1
16:08:43 INFO P26.OPENSVC.COM.VG#0 output:
Volume group "vgsvc1" successfully changed
16:08:43 INFO P26.OPENSVC.COM.VG#0 vgchange -a y vgsvc1
16:08:43 INFO P26.OPENSVC.COM.VG#0 output:
2 logical volume(s) in volume group "vgsvc1" now active
16:08:43 INFO P26.OPENSVC.COM.FS#0 e2fsck -p /dev/mapper/vgsvc1-lvappsvc1
16:08:43 INFO P26.OPENSVC.COM.FS#0 output:
/dev/mapper/vgsvc1-lvappsvc1: clean, 19/3072 files, 2579/12288 blocks
16:08:43 INFO P26.OPENSVC.COM.FS#0 mount -t ext3 /dev/mapper/vgsvc1-lvappsvc1 /svc1/app
16:08:43 INFO P26.OPENSVC.COM.FS#1 e2fsck -p /dev/mapper/vgsvc1-lvdatasvc1
16:08:43 INFO P26.OPENSVC.COM.FS#1 output:
/dev/mapper/vgsvc1-lvdatasvc1: clean, 13/3072 files, 1532/12288 blocks
16:08:43 INFO P26.OPENSVC.COM.FS#1 mount -t ext3 /dev/mapper/vgsvc1-lvdatasvc1 /svc1/data
16:08:43 INFO P26.OPENSVC.COM.APP spawn: /etc/opensvc/p26.opensvc.com.d/S10weblauncher start
16:08:43 INFO P26.OPENSVC.COM.APP start done in 0:00:00.009601 - ret 0 - logs in /var/tmp/opensvc/svc_p26.opensvc.com_S10weblauncher.log
sles2:~ # p26.opensvc.com print status
p26.opensvc.com
overall up
|- avail up
| |- vg#0 .... up vgsvc1
| |- fs#0 .... up /dev/mapper/vgsvc1-lvappsvc1@/svc1/app
| |- fs#1 .... up /dev/mapper/vgsvc1-lvdatasvc1@/svc1/data
| |- ip#0 .... up p26.opensvc.com@eth0
| '- app .... up app
|- sync up
| '- sync#i0 .... up rsync svc config to drpnodes, nodes
'- hb n/a
Service p26.opensvc.com is now running on node sles2. Service relocation is operational, as easy as that.
Now, what happens if we try to start the service on sles1 while it is already running on sles2?
sles1:/ # p26.opensvc.com start
16:19:39 INFO P26.OPENSVC.COM.IP#0 checking 37.59.71.26 availability
16:19:39 ERROR P26.OPENSVC.COM 'start' action stopped on execution error: start aborted due to resource ip#0 conflict
16:19:39 INFO P26.OPENSVC.COM skip rollback start: no resource activated
Fortunately, the OpenSVC IP address check prevents the service from starting on sles1.
Note
At this point, we have a 2-node failover cluster. Although this setup meets most needs, the failover is _manual_, so it does not qualify as a high availability cluster.
To learn how to meet HA requirements with OpenSVC, we will now describe the OpenHA heartbeat setup.
OpenHA Integration¶
This chapter presents the steps to upgrade a service from "manual failover" to "automated failover". It follows the instructions from High Availability setup.
OpenSVC Heartbeat Resource¶
An HA OpenSVC service handles a special resource: the heartbeat resource, which reports the service status from the point of view of the heartbeat. No action is handled by this resource type.
The following section is appended to the p26.opensvc.com.env
file on node sles1
:
[hb#0]
type = OpenHA
The name parameter can be set if the OpenSVC service name differs from the OpenHA service name. In this example, we use the same service name, so we omit this parameter.
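If the OpenHA service had been declared under a different name, the section would instead read, for example (a hypothetical sketch, websvc being an arbitrary OpenHA service name):
[hb#0]
type = OpenHA
name = websvc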
The next svcmon or print status action will automatically complete the <OSVCETC> directory with 2 new symlinks:
sles1:/ # svcmgr -s p26.opensvc.com print status
send /etc/opensvc/p26.opensvc.com.env to collector ... OK
update /var/lib/opensvc/p26.opensvc.com.push timestamp ... OK
p26.opensvc.com
11:19:37 INFO P26.OPENSVC.COM.HB#0 /etc/opensvc/p26.opensvc.com.cluster: not regular file nor symlink. fix.
11:19:37 INFO P26.OPENSVC.COM.HB#0 /etc/opensvc/p26.opensvc.com.stonith: not regular file nor symlink. fix.
overall warn
|- avail up
| |- vg#0 .... up vgsvc1
| |- fs#0 .... up /dev/mapper/vgsvc1-lvappsvc1@/svc1/app
| |- fs#1 .... up /dev/mapper/vgsvc1-lvdatasvc1@/svc1/data
| |- ip#0 .... up p26.opensvc.com@eth0
| '- app .... up app
|- sync up
| '- sync#i0 .... up rsync svc config to drpnodes, nodes
'- hb warn
'- hb#0 .... warn hb.openha
# open-ha daemons are not running
sles1:/etc/opensvc # ls -lart | grep p26
lrwxrwxrwx 1 root root 23 17 févr. 14:15 p26.opensvc.com -> /usr/bin/svcmgr
lrwxrwxrwx 1 root root 16 17 févr. 16:48 p26.opensvc.com.d -> /svc1/app/init.d
-rw-r--r-- 1 root root 428 19 févr. 08:29 p26.opensvc.com.env.before.openha
-rw-r--r-- 1 root root 450 19 févr. 08:30 p26.opensvc.com.env
lrwxrwxrwx 1 root root 13 19 févr. 11:19 p26.opensvc.com.stonith -> /usr/bin/svcmgr
lrwxrwxrwx 1 root root 13 19 févr. 11:19 p26.opensvc.com.cluster -> /usr/bin/svcmgr
The new service configuration must now be pushed to the peer node sles2
:
sles1:/ # svcmgr -s p26.opensvc.com syncnodes --force
11:55:50 INFO P26.OPENSVC.COM exec '/etc/opensvc/p26.opensvc.com --waitlock 3600 postsync' on node sles2
sles1:/ # ssh sles2 svcmgr -s p26.opensvc.com print status
18:18:56 INFO P26.OPENSVC.COM.HB#0 /etc/opensvc/p26.opensvc.com.cluster: not regular file nor symlink. fix.
18:18:56 INFO P26.OPENSVC.COM.HB#0 /etc/opensvc/p26.opensvc.com.stonith: not regular file nor symlink. fix.
p26.opensvc.com
overall down
|- avail down
| |- vg#0 .... down vgsvc1
| |- fs#0 .... down /dev/mapper/vgsvc1-lvappsvc1@/svc1/app
| |- fs#1 .... down /dev/mapper/vgsvc1-lvdatasvc1@/svc1/data
| |- ip#0 .... down p26.opensvc.com@eth0
| '- app .... n/a app
|- sync up
| '- sync#i0 .... up rsync svc config to drpnodes, nodes
'- hb warn
'- hb#0 .... warn hb.openha
# open-ha daemons are not running
OpenHA Installation¶
Install the OpenHA package on both cluster nodes.
On both nodes:
# wget -O /tmp/openha.latest.rpm https://repo.opensvc.com/rpms/deps/el6/openha-0.3.6.osvc2-0.x86_64.rpm
# rpm -Uvh /tmp/openha.latest.rpm
# rpm -qa | grep openha
openha-0.3.6.osvc2-0
# ls /usr/local/cluster
bin conf doc env.sh ezha.init log services
As specified in the documentation, we have to set environment variables to be able to use OpenHA commands. You can either set them system-wide (/etc/profile), or just set them when needed:
# export EZ=/usr/local/cluster
# . /usr/local/cluster/env.sh
OpenHA Configuration¶
First, we describe the cluster nodes in the file /usr/local/cluster/conf/nodes
On both nodes:
# cat /usr/local/cluster/conf/nodes
sles1
sles2
In this example, we implement two heartbeats:
- A network multicast ip heartbeat
- A shared disk heartbeat (a new lun has been provisioned from the OpenFiler host: /dev/mapper/14f504e46494c45526967724d32682d553243692d4f336a4c)
The heartbeat configuration file /usr/local/cluster/conf/monitor contains the following lines on both nodes:
On both nodes:
# cat /usr/local/cluster/conf/monitor
sles1 net eth0 239.131.50.10 1234 10
sles1 dio /dev/mapper/14f504e46494c45526967724d32682d553243692d4f336a4c 0 10
sles2 net eth0 239.131.50.10 4321 10
sles2 dio /dev/mapper/14f504e46494c45526967724d32682d553243692d4f336a4c 2 10
These lines mean:
- the sles1 node will send its heartbeat through eth0 on multicast IP 239.131.50.10, port 1234, with a 10-second timeout
- the sles1 node will write its heartbeat on the first block of disk /dev/mapper/14f504e46494c45526967724d32682d553243692d4f336a4c, with a 10-second timeout
- the sles1 node will listen for the peer heartbeat through eth0 on multicast IP 239.131.50.10, port 4321, with a 10-second timeout
- the sles1 node will read the peer heartbeat on the third block of disk /dev/mapper/14f504e46494c45526967724d32682d553243692d4f336a4c, with a 10-second timeout
OpenHA also requires monitored services to be declared:
On both nodes:
# $EZ_BIN/service -a p26.opensvc.com /etc/opensvc/p26.opensvc.com.cluster sles1 sles2 /bin/true
Creating service p26.opensvc.com :
Make of services directory done
Done.
Please note that the configuration applied does not include any stonith callout, as the stonith is best handled through OpenSVC.
The last setup step concerns OpenHA start/stop scripts.
On both nodes:
# ln -s /usr/local/cluster/ezha /etc/rc.d/rc3.d/S99cluster
# ln -s /usr/local/cluster/ezha /etc/rc.d/rc0.d/K01cluster
# ln -s /usr/local/cluster/ezha /etc/rc.d/rcS.d/K01cluster
OpenHA Testing¶
Once this setup is in place, OpenHA takes over the OpenSVC service management.
Warning
In this example the service p26.opensvc.com was stopped when the OpenHA daemons were started. It's also possible to install-configure-start or stop-upgrade-start OpenHA while keeping the service operational, but these procedures are not covered in this tutorial.
We start the OpenHA agents:
On both nodes:
# /usr/local/cluster/ezha.init start
You can query the OpenHA service configuration and states with the $EZ_BIN/service -s
command:
On both nodes:
# $EZ_BIN/service -s
1 service(s) defined:
Service: p26.opensvc.com
Primary : sles1, FROZEN_STOP
Secondary: sles2, FROZEN_STOP
The double FROZEN_STOP status means that neither sles1 nor sles2 is allowed to take over the service.
We can also check the heartbeat status with the $EZ_BIN/hb -s command:
On both nodes:
# $EZ_BIN/hb -s
interface eth0:239.131.50.10:1234 pid 25633 status UP, updated at Feb 19 20:59:57
interface /dev/mapper/14f504e46494c45526967724d32682d553243692d4f336a4c:0 pid 25636 status UP, updated at Feb 19 20:59:57
interface eth0:239.131.50.10:4321 pid 23801 status UP, updated at Feb 19 20:59:57
interface /dev/mapper/14f504e46494c45526967724d32682d553243692d4f336a4c:2 pid 23804 status UP, updated at Feb 19 20:59:55
Everything is working as expected. We can now allow the sles1 node to take over the service, using the unfreeze command:
On sles1 node:
sles1:/usr/local/cluster/conf # /usr/local/cluster/bin/service -A p26.opensvc.com unfreeze
Querying the OpenHA service status at a ~1 second interval, we can see the status transitions:
On sles1 node:
sles1:/usr/local/cluster/conf # /usr/local/cluster/bin/service -s
1 service(s) defined:
Service: p26.opensvc.com
Primary : sles1, START_READY
Secondary: sles2, FROZEN_STOP
=> The START_READY state means that the sles1 node is ready to start the service, but waits a couple of seconds to see if its peer node also transitions to this same START_READY state. In that case, OpenHA would start the service where it was previously running. In our case, we keep sles2 in the FROZEN_STOP state, and a couple of seconds later we observe:
On sles1 node:
sles1:/usr/local/cluster/conf # /usr/local/cluster/bin/service -s
1 service(s) defined:
Service: p26.opensvc.com
Primary : sles1, STARTING
Secondary: sles2, FROZEN_STOP
=> The STARTING state means that the sles1 node has initiated the service startup by calling the script <OSVCETC>/p26.opensvc.com.cluster specified in the OpenHA service configuration with the start parameter.
On sles1 node:
sles1:/usr/local/cluster/conf # /usr/local/cluster/bin/service -s
1 service(s) defined:
Service: p26.opensvc.com
Primary : sles1, STARTED
Secondary: sles2, FROZEN_STOP
=> The STARTED state means that the sles1 node has finished the startup of the service, and the script return code was 0.
We can confirm that the service is running by querying its state through OpenSVC commands:
On sles1 node:
sles1:/ # p26.opensvc.com print status
p26.opensvc.com
overall up
|- avail up
| |- vg#0 .... up vgsvc1
| |- fs#0 .... up /dev/mapper/vgsvc1-lvappsvc1@/svc1/app
| |- fs#1 .... up /dev/mapper/vgsvc1-lvdatasvc1@/svc1/data
| |- ip#0 .... up p26.opensvc.com@eth0
| '- app .... up app
|- sync up
| '- sync#i0 .... up rsync svc config to drpnodes, nodes
'- hb up
'- hb#0 .... up hb.openha
The second node, sles2, is still in the FROZEN_STOP state. We have to allow it to take over the service, if need be.
On sles2 node:
sles2:/ # /usr/local/cluster/bin/service -A p26.opensvc.com unfreeze
sles2:/usr/local/cluster/log # /usr/local/cluster/bin/service -s
1 service(s) defined:
Service: p26.opensvc.com
Primary : sles1, STARTED
Secondary: sles2, STOPPED
=> The sles2 node is now ready to take over the service if needed; its state is accurately reported as STOPPED.
The OpenSVC service management is now delegated to the OpenHA agents. OpenSVC makes sure administrators cannot bypass the heartbeat daemon to submit actions directly to the OpenSVC service:
On sles1 node:
sles1:/ # p26.opensvc.com stop
21:34:10 INFO P26.OPENSVC.COM this service is managed by a clusterware, thus direct service manipulation is disabled. the --cluster option circumvent this safety net.
sles1:/ #
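As the log message suggests, an administrator who really needs to act on the service directly (for a maintenance operation, for instance) can add the --cluster option to bypass this safety net. A sketch, assuming the option can simply be appended to the action as with other svcmgr options; use it with care, as OpenHA is then unaware of the resulting state change:
sles1:/ # p26.opensvc.com stop --cluster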