Getting Started¶
This page will help you take your first steps with OpenSVC service setup.
It will guide you through the sequence of tasks needed to build a simple but working dual-node failover cluster.
Prerequisites¶
The demonstration environment is composed of:
- A pair of CentOS 7 servers named node1 and node2, respectively acting as first and second cluster node
- Both nodes are connected to the same network segment 192.168.121.0/24
- root access is mandatory
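If DNS does not resolve the node names in your lab, each node should at least be able to resolve its peer. A minimal /etc/hosts sketch, with illustrative addresses on the 192.168.121.0/24 segment (adjust to your own setup):
192.168.121.10   node1
192.168.121.11   node2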
OpenSVC Installation¶
Following the steps described in Nodeware installation, install the OpenSVC agent on both cluster nodes.
On both nodes:
$ wget -O /tmp/opensvc.latest.rpm https://repo.opensvc.com/rpms/current
$ sudo yum -y install /tmp/opensvc.latest.rpm
$ sudo rpm -qa | grep opensvc
opensvc-1.9-906.noarch
$ sudo systemctl start opensvc-agent.service
$ sudo systemctl is-active opensvc-agent.service
active
We can also check for proper daemon behaviour:
$ sudo svcmon
Threads                       node1
 listener     running  0.0.0.0:1214
 monitor      running
 scheduler    running

Nodes                         node1
 15m                        | 0.1
 state                      |

Services                      node1
The Threads section is explained here: Daemon.
The OpenSVC agent is now operational.
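Optionally, the agent can also be enabled at boot with the standard systemd mechanism:
$ sudo systemctl enable opensvc-agent.service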
SSH Keys Setup¶
Cluster members need mutual ssh authentication to exchange some OpenSVC configuration files. Each node must trust its peer through key-based authentication to allow these communications.
- node1 will be able to connect to node2 as root.
- node2 will be able to connect to node1 as root.
Note
It is also possible for the agent to log in to a peer cluster node as an unprivileged user, using the ruser node.conf parameter. In this case, the remote user needs sudo privileges to run the following commands as root: nodemgr, svcmgr and rsync.
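As an illustration only, such a sudo delegation could look like the fragment below (the opensvc user name and command paths are assumptions, to be adapted to your installation):
# /etc/sudoers.d/opensvc (sketch)
opensvc ALL=(root) NOPASSWD: /usr/bin/nodemgr, /usr/bin/svcmgr, /usr/bin/rsync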
On node1:
node1:/ # ssh-copy-id root@node2
On node2:
node2:/ # ssh-copy-id root@node1
On node1:
node1:/ # ssh node2 hostname
node2
On node2:
node2:/ # ssh node1 hostname
node1
Set Host Environment¶
As we are in a lab environment, we do not need to specify the host environment: "TST" is the default value, and is adequate.
For any purpose other than testing, we would define the relevant mode on both nodes, using the method described in Set Node Environment.
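For reference, such a setting lives in the node configuration file. A minimal sketch, assuming an env keyword in the [node] section of <OSVCETC>/node.conf (the exact keyword and accepted values are described in Set Node Environment):
[node]
env = PRD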
Cluster Build¶
As our first setup consists of a dual-node cluster, we have to follow the steps described in Cluster Configuration.
On node1:
$ om cluster set --param hb#1.type --value unicast
$ om mon
Threads                       node1
 hb#1.rx      running  0.0.0.0:10000 | /
 hb#1.tx      running                | /
 listener     running  0.0.0.0:1214
 monitor      running
 scheduler    running

Nodes                         node1
 15m                        | 0.1
 state                      |

Services                      node1
Service Creation¶
The OpenSVC service can be created using one of the following two methods:
- provisioning
- manual: build the config file from templates (located in <OSVCDOC>)
We will describe the second, manual option, for a better understanding of what happens.
Step 1: Service creation¶
A simple command is needed to create an empty service named svc1:
$ om svc1 create
The expected file name is svc1.conf, located in <OSVCETC>.
At this time, the configuration file should be empty, except for its unique id. You have to edit it in order to define your service.
We are going to define a service running on the primary node node1, failing over to node node2, using one IP address named svc1.opensvc.com (name to ip resolution is done by the OpenSVC agent), one LVM volume group vgsvc1, and two filesystems hosted in the logical volumes /dev/mapper/vgsvc1-lvappsvc1 and /dev/mapper/vgsvc1-lvdatasvc1.
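The volume group and logical volumes are assumed to already exist. If you need to create them for this lab, a possible sketch, using a loopback-backed physical volume to match the pvs = /dev/loop0 keyword in the configuration below (sizes and paths are illustrative):
node1:/ # dd if=/dev/zero of=/var/tmp/pv0.img bs=1M count=512
node1:/ # losetup /dev/loop0 /var/tmp/pv0.img
node1:/ # pvcreate /dev/loop0
node1:/ # vgcreate vgsvc1 /dev/loop0
node1:/ # lvcreate -n lvappsvc1 -L 100M vgsvc1
node1:/ # lvcreate -n lvdatasvc1 -L 92M vgsvc1
node1:/ # mkfs.ext4 /dev/mapper/vgsvc1-lvappsvc1
node1:/ # mkfs.ext4 /dev/mapper/vgsvc1-lvdatasvc1
node1:/ # mkdir -p /svc1/app /svc1/data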
On node1:
$ om svc1 edit config
[DEFAULT]
app = MyApp
nodes = node1 node2
id = c450ecfa-2c02-4d4b-95a3-c543d4389ec0
[ip#0]
ipname = svc1.opensvc.com
ipdev = eth0
[disk#0]
type = vg
name = vgsvc1
pvs = /dev/loop0
[fs#app]
type = ext4
dev = /dev/mapper/vgsvc1-lvappsvc1
mnt = /svc1/app
[fs#data]
type = ext4
dev = /dev/mapper/vgsvc1-lvdatasvc1
mnt = /svc1/data
The DEFAULT section in the service file describes the service itself: human-readable name, nodes the service is expected to run on, and so on.
Every other section defines a resource managed by the service.
Step 4: Service configuration check¶
As a final check, we can list all entries that match our svc1 service:
node1:/etc/opensvc # ls -lart | grep svc1
-rw-r--r--. 1 root root 287 janv. 2 15:15 svc1.conf
You should be able to see:
- the service configuration file (svc1.conf)
At this point, we have configured a single service with no application launcher on node node1.
Service Testing¶
Query service status¶
Our first service is now ready to use. We can query its status.
On node1:
$ om svc1 print status
svc1 down
`- instances
|- node2 undef daemon down
`- node1 warn warn, frozen, idle
|- ip#0 ...... down svc1.opensvc.com@eth0
|- disk#0 ...... up vg vgsvc1
|- fs#app ...... down ext4 /dev/mapper/vgsvc1-lvappsvc1@/svc1/app
|- fs#data ...... down ext4 /dev/mapper/vgsvc1-lvdatasvc1@/svc1/data
`- sync#i0 ..O./. n/a rsync svc config to drpnodes, nodes
info: paused, service not up
This command collects and displays the status of each service resource:
- the overall status is warn because not all resources are in up status
- resource disk#0 is up because the volume group is activated (which is the expected status after vgcreate)
- all other resources are down or not applicable (n/a)
Start service¶
Using OpenSVC to manage your services saves a lot of time and effort. Once the service is described on a node, a single command starts the whole application stack.
Let's start the service on the local node:
node1:/ # om svc1 start --local
node1.svc1.ip#0 checking 192.168.121.42 availability
node1.svc1.ip#0 ifconfig eth0:1 192.168.121.42 netmask 255.255.255.0 up
node1.svc1.ip#0 arping -U -c 1 -I eth0 -s 192.168.121.42 192.168.121.42
node1.svc1.disk#0 vgchange --addtag @node1 vgsvc1
node1.svc1.disk#0 output:
node1.svc1.disk#0 Volume group "vgsvc1" successfully changed
node1.svc1.disk#0
node1.svc1.disk#0 vg vgsvc1 is already up
node1.svc1.fs#app e2fsck -p /dev/mapper/vgsvc1-lvappsvc1
node1.svc1.fs#app output:
node1.svc1.fs#app /dev/mapper/vgsvc1-lvappsvc1: clean, 12/25688 files, 8898/102400 blocks
node1.svc1.fs#app
node1.svc1.fs#app mount -t ext4 /dev/mapper/vgsvc1-lvappsvc1 /svc1/app
node1.svc1.fs#data e2fsck -p /dev/mapper/vgsvc1-lvdatasvc1
node1.svc1.fs#data output:
node1.svc1.fs#data /dev/mapper/vgsvc1-lvdatasvc1: clean, 12/23616 files, 8637/94208 blocks
node1.svc1.fs#data
node1.svc1.fs#data mount -t ext4 /dev/mapper/vgsvc1-lvdatasvc1 /svc1/data
The startup sequence reads as:
- check that the service IP address is not already in use elsewhere
- bring up the service ip address
- activate the volume group (if not already in the correct state)
- fsck + mount each filesystem
Manual filesystem mount check:
node1:/ # mount | grep svc1
/dev/mapper/vgsvc1-lvappsvc1 on /svc1/app type ext4 (rw,relatime,seclabel,data=ordered)
/dev/mapper/vgsvc1-lvdatasvc1 on /svc1/data type ext4 (rw,relatime,seclabel,data=ordered)
Manual ip address plumbing check on eth0 (svc1.opensvc.com is 192.168.121.42):
node1:/ # ip addr list eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 52:54:00:a6:c3:d7 brd ff:ff:ff:ff:ff:ff
inet 192.168.121.249/24 brd 192.168.121.255 scope global dynamic eth0
valid_lft 2205sec preferred_lft 2205sec
inet 192.168.121.42/24 brd 192.168.121.255 scope global secondary eth0:1
valid_lft forever preferred_lft forever
We can confirm everything is OK with the service's print status
command:
node1:/ # om svc1 print status
svc1 up
`- instances
|- node2 undef daemon down
`- node1 up frozen, idle,
| started
|- ip#0 ...... up 192.168.121.42@eth0
|- disk#0 ...... up vg vgsvc1
|- fs#app ...... up ext4 /dev/mapper/vgsvc1-lvappsvc1@/svc1/app
|- fs#data ...... up ext4 /dev/mapper/vgsvc1-lvdatasvc1@/svc1/data
`- sync#i0 ..O./. n/a rsync svc config to drpnodes, nodes
info: paused, service not up
At this point, we have a running service, configured to run on node node1.
Application Integration¶
We have gone through the setup of a single service, but it does not start any application yet. Let's add an application to our service now.
We will use a very simple example: a tiny webserver with a single index.html file to serve.
Application Binary¶
In the service directory structure, we put a standalone binary of the Mongoose web server (https://code.google.com/p/mongoose/)
node1:/ # cd /svc1/app
node1:/svc1/app # wget -O /svc1/app/webserver https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/mongoose/mongoose-lua-sqlite-ssl-static-x86_64-5.1
--2018-01-02 15:56:28-- https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/mongoose/mongoose-lua-sqlite-ssl-static-x86_64-5.1
Resolving storage.googleapis.com (storage.googleapis.com)... 216.58.204.112, 2a00:1450:4007:80a::2010
Connecting to storage.googleapis.com (storage.googleapis.com)|216.58.204.112|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2527016 (2,4M) [application/octet-stream]
Saving to: ‘/svc1/app/webserver’
100%[=====================================================================>] 2 527 016 8,96MB/s in 0,3s
2018-01-02 15:56:29 (8,96 MB/s) - ‘/svc1/app/webserver’ saved [2527016/2527016]
node1:/svc1/app # ls -l /svc1/app/webserver
-rwxr-xr-x 1 root root 1063420 Feb 1 18:11 /svc1/app/webserver
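Depending on how the binary was fetched, you may need to make it executable before use (the listing above already shows execute permissions):
node1:/svc1/app # chmod 755 /svc1/app/webserver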
And create a dummy web page in /svc1/data/
, to be served by our webserver:
node1:/svc1/app # cd /svc1/data/
node1:/svc1/data # cat index.html
<html><body>It Works !</body></html>
Application launcher script¶
We have to create a management script for our web application. At minimum, this script must support the start argument.
As a best practice, the script should also support the additional arguments:
- stop
- status
- info
Of course, we will store our script, named weblauncher, in the directory previously created for this purpose.
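If that directory does not exist yet in your lab, create it first (the path simply follows the service layout used in this tutorial):
node1:/ # mkdir -p /svc1/app/init.d
Then store the script there: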
node1:/ # cd /svc1/app/init.d
node1:/svc1/app/init.d # cat weblauncher
#!/bin/bash

SVCROOT=/svc1
APPROOT=${SVCROOT}/app
DAEMON=${APPROOT}/webserver
DAEMON_BASE=$(basename $DAEMON)
DAEMONOPTS="-document_root ${SVCROOT}/data -index_files index.html -listening_port 8080"

# return 0 if the webserver process is running, non-zero otherwise
function status {
    pgrep $DAEMON_BASE >/dev/null 2>&1
}

case "$1" in
restart)
    killall $DAEMON_BASE
    nohup $DAEMON $DAEMONOPTS >>/dev/null 2>&1 &
    ;;
start)
    status && {
        echo "already started"
        exit 0
    }
    nohup $DAEMON $DAEMONOPTS >>/dev/null 2>&1 &
    ;;
stop)
    killall $DAEMON_BASE
    ;;
info)
    echo "Name: webserver"
    ;;
status)
    status
    exit $?
    ;;
*)
    echo "unsupported action: $1" >&2
    exit 1
    ;;
esac
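Also make the script executable, so that both your shell and the OpenSVC agent can run it:
node1:/svc1/app/init.d # chmod 755 weblauncher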
Make sure the script is working fine outside of the OpenSVC context:
node1:/svc1/app # ./weblauncher status
node1:/svc1/app # echo $?
1
node1:/svc1/app # ./weblauncher start
node1:/svc1/app # ./weblauncher status
node1:/svc1/app # echo $?
0
node1:/svc1/app # ./weblauncher stop
node1:/svc1/app # ./weblauncher status
node1:/svc1/app # echo $?
1
Now we need to instruct OpenSVC to handle this script for service application management:
# om svc1 edit config
(...)
[app#web]
script = weblauncher
start = 10
check = 10
stop = 90
This configuration tells OpenSVC to call the weblauncher script with:
- the start argument when the OpenSVC service starts
- the stop argument when the OpenSVC service stops
- the status argument when OpenSVC needs the application status
Now we can give a try to our launcher script, using OpenSVC commands:
node1:~ # om svc1 start --local
node1.svc1.ip#0 192.168.121.42 is already up on eth0
node1.svc1.disk#0 vg vgsvc1 is already up
node1.svc1.fs#app ext4 /dev/mapper/vgsvc1-lvappsvc1@/svc1/app is already mounted
node1.svc1.fs#data ext4 /dev/mapper/vgsvc1-lvdatasvc1@/svc1/data is already mounted
node1.svc1.app#web exec /svc1/app/init.d/weblauncher start as user root
node1.svc1.app#web start done in 0:00:00.009874 ret 0
We can see that OpenSVC is now calling our startup script after mounting the filesystems.
Querying the service status, the app resource is now reporting up:
node1:~ # om svc1 print status
svc1 up
`- instances
|- node2 undef daemon down
`- node1 up frozen, idle,
| started
|- ip#0 ...... up 192.168.121.42@eth0
|- disk#0 ...... up vg vgsvc1
|- fs#app ...... up ext4 /dev/mapper/vgsvc1-lvappsvc1@/svc1/app
|- fs#data ...... up ext4 /dev/mapper/vgsvc1-lvdatasvc1@/svc1/data
|- app#web ..../. up weblauncher
`- sync#i0 ..O./. n/a rsync svc config to drpnodes, nodes
info: paused, service not up
Let's check if that is really the case:
node1:/ # ps auxww | grep web
root 18643 0.0 0.0 2540 320 pts/0 S 16:13 0:00 /svc1/app/webserver -document_root /svc1/data -index_files index.html -listening_port 8080
node1:~ # wget -qO - http://svc1.opensvc.com:8080/
<html><body>It Works !</body></html>
Now we can stop our service:
node1:/ # om svc1 stop --local
node1.svc1.app#web exec /svc1/app/init.d/weblauncher stop as user root
node1.svc1.app#web stop done in 0:00:00.010940 ret 0
node1.svc1.fs#data umount /svc1/data
node1.svc1.fs#app umount /svc1/app
node1.svc1.disk#0 vgchange --deltag @node1 vgsvc1
node1.svc1.disk#0 output:
node1.svc1.disk#0 Volume group "vgsvc1" successfully changed
node1.svc1.disk#0
node1.svc1.disk#0 vgchange -a n vgsvc1
node1.svc1.disk#0 output:
node1.svc1.disk#0 0 logical volume(s) in volume group "vgsvc1" now active
node1.svc1.disk#0
node1.svc1.ip#0 ifconfig eth0:1 down
node1.svc1.ip#0 checking 192.168.121.42 availability
Once again, a single command:
- brings down the application
- unmounts filesystems
- deactivates the volume group
- disables the service ip address
The overall status is now reported as down:
node1:/ # om svc1 print status
svc1 down warn
`- instances
|- node2 undef daemon down
`- node1 down warn, frozen, idle
|- ip#0 ...... down 192.168.121.42@eth0
|- disk#0 ...... down vg vgsvc1
|- fs#app ...... down ext4 /dev/mapper/vgsvc1-lvappsvc1@/svc1/app
|- fs#data ...... down ext4 /dev/mapper/vgsvc1-lvdatasvc1@/svc1/data
|- app#web ..../. down weblauncher
`- sync#i0 ..O./. warn rsync svc config to drpnodes, nodes
warn: passive node needs update
Let's restart the service to continue this tutorial:
node1:/ # om svc1 start --local
At this point, we have a running service on node node1, with a webserver application embedded.
Service Failover¶
Our service is running fine, but what happens if the node1 node fails? Our svc1 service will also fail.
That's why we want to extend the service configuration to declare node2 as a failover node for this service.
After this change, the service configuration needs to be replicated to the node2 node.
First, we are going to add node2 to the same cluster as node1.
On node1:
$ om cluster get --kw cluster.secret
7e801abaefc611e780a2525400a6c3d7
On node2:
$ om daemon join --secret 7e801abaefc611e780a2525400a6c3d7 --node node1
node2 freeze local node
node2 add heartbeat hb#1
node2 join node node1
node2 thaw local node
$ om mon
Threads node1 node2
hb#1.rx running 0.0.0.0:10000 | O /
hb#1.tx running | O /
listener running 0.0.0.0:1214
monitor running
scheduler running
Nodes node1 node2
15m | 0.1 0.1
state | *
Services node1 node2
svc1 up! failover | O!*
OpenSVC will synchronize the service configuration files, since the service should be able to run on either node1 or node2.
To force a synchronization now, run on node1:
# om svc1 sync nodes
The configuration replication will happen if the following conditions are met:
- the new node is declared in the service configuration file <OSVCETC>/svc1.conf (the "nodes" parameter in the .conf file)
- the node sending the config files (node1) is trusted on the new node (node2), as described in a previous chapter of this tutorial
- the node sending the config files (node1) is running the service (the service availability status, apps excluded, is up)
- the previous synchronisation is older than the configured minimum delay, or the --force option is set to bypass the delay check
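A quick way to verify the replication is to check that the configuration file is now present on the peer node (the path is the one shown earlier in this tutorial):
node2:~ # ls -l /etc/opensvc/svc1.conf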
We can now try to start the service on node2, after stopping it on node1.
On node1:
node1:/ # om svc1 stop
On node2:
node2:~ # om svc1 start --local
Service svc1 is now running on node node2. Service relocation is as easy as that.
Now, what happens if I try to start my service on node1 while it is already running on node2?
node1:/ # om svc1 start --local
node1.svc1.ip#0 checking 192.168.121.42 availability
node1.svc1 E start aborted due to resource ip#0 conflict
node1.svc1 skip rollback start: no resource activated
Fortunately, the OpenSVC IP address check prevents the service from starting on node1.
Note
At this point, we have a 2-node failover cluster. Although this setup meets most needs, the failover is manual, so it does not qualify as a high availability cluster.
High Availability¶
Now, we have to configure the service to fail over without any human intervention.
You only have to change the orchestration mode to ha. For more information about orchestration, see Orchestration.
On the node currently running your service, add orchestrate = ha
in the DEFAULT
section:
# om svc1 edit config
[DEFAULT]
app = MyApp
nodes = node1 node2
orchestrate = ha
(...)
Once this setup is in place, OpenSVC will be able to failover your service.
The last needed step is to define the resources whose failure will trigger relocation. Those resources have to be flagged with monitor = True in the service configuration file.
For example:
# om svc1 edit config
(...)
[app#web]
monitor = True
script = weblauncher
start = 10
check = 10
stop = 90
Unfreeze the service to allow the daemon to orchestrate it:
# om svc1 thaw
Now, if the webserver resource fails, OpenSVC will relocate the service to the other node without any human intervention.
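To see this in action, you can simulate an application failure on whichever node currently runs the service and watch the daemon react. The commands below are a sketch: the pkill pattern is assumed from the webserver command line shown earlier, and the exact detection and relocation delay depends on the monitoring schedule.
node2:~ # pkill -f '/svc1/app/webserver'
node2:~ # om mon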