SlapOS Comp1 HA

Purpose

Nexedi SlapOS is a distributed, service-oriented operating system. It is composed of two kinds of components: master nodes and slave nodes. SlapOS can tolerate the loss of the master node for a few hours, but the loss of a slave worker node causes a service outage.

This survival guide describes how to deploy and manage a highly available SlapOS slave node, using an OpenSVC cluster and Linbit DRBD data replication.

The software stack installation is automated with an Ansible playbook, which configures the OpenSVC cluster, deploys the Re6st component (an IPv6 mesh network required by SlapOS), and then deploys SlapOS.

The SlapOS comp1 component is embedded in an OpenSVC service, allowing the SlapOS administrator to move the service from one cluster node to another, and to survive a server issue (crash, power off, …).

Prerequisites

Cluster

Note

for a better understanding, the example outputs hereafter show cluster nodes with hostnames demo1 and demo2

Ansible control node

  • Any OS supported by Ansible

  • Ansible >= 2.13
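
The Ansible version available on the control node can be checked with:

user@ansible $ ansible --version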

Note

for a better understanding, the example outputs hereafter show an Ansible control node with hostname ansible

Installing

Download example playbook

On the Ansible control node:

user@ansible $ wget https://raw.githubusercontent.com/opensvc/ansible-collection-osvc/master/examples/slapos/playbook-slap-comp-1.yml

Warning

change the playbook tunables to suit your needs (tokens, hostnames, sizes, …). Check the Ansible role documentation.
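
If you prefer not to edit the playbook file, most tunables can usually also be overridden at run time with the standard --extra-vars option of ansible-playbook; the variable name below is only a placeholder, the actual names are listed in the role documentation:

user@ansible $ ansible-playbook -i inventory playbook-slap-comp-1.yml -e "some_tunable=some_value"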

Install Ansible prerequisites

The opensvc.app collection has to be installed on the control node.

On the Ansible control node:

user@ansible $ sudo ansible-galaxy collection install opensvc.app
Process install dependency map
Starting collection install process
Installing 'opensvc.app:1.0.0' to '/root/.ansible/collections/ansible_collections/opensvc/app'
Installing 'ansible.posix:1.5.4' to '/root/.ansible/collections/ansible_collections/ansible/posix'
Installing 'opensvc.cluster:1.2.1' to '/root/.ansible/collections/ansible_collections/opensvc/cluster'

Prepare your Ansible inventory file to match the target cluster nodes (see the example below).

On the Ansible control node:

[clusternodes]
demo1.acme.com ansible_host="5.200.201.202" ansible_ssh_private_key_file="ssh.private.key" ansible_user=ubuntu ansible_become=true
demo2.acme.com ansible_host="5.200.201.203" ansible_ssh_private_key_file="ssh.private.key" ansible_user=ubuntu ansible_become=true
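
Before running the playbook, SSH connectivity to both cluster nodes can be verified with the standard Ansible ping module:

user@ansible $ ansible -i inventory clusternodes -m ping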

Running playbook

On the Ansible control node:

user@ansible $ sudo ansible-playbook -i inventory playbook-slap-comp-1.yml

The control node executes the main playbook, which performs the following high-level tasks:

  • assemble the two nodes into an OpenSVC cluster

  • configure DRBD on both nodes (a verification sketch follows this list)

  • create the OpenSVC service for the SlapOS comp1 feature

  • execute the re6st playbook on the first node

  • move the OpenSVC service to the second node

  • execute the re6st playbook on the second node

  • move the service back to the first node

  • execute the SlapOS playbook on the first node

  • move the OpenSVC service to the second node

  • execute the SlapOS playbook on the second node

  • move the service back to the first node
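
Once the playbook has completed, the DRBD replication it configured can be verified directly on either cluster node, as root. Assuming a DRBD 9 / recent drbd-utils installation, the following standard command lists the resources and their replication state:

drbdadm status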

Note

a tarball containing playbook execution logs can be downloaded here

At the end of the playbook execution, you should have an operational service, as shown in the status checks below.

Checking Status

Cluster

Cluster status can be checked with the command om mon:

Threads                                 demo1        demo2
daemon           running             |
hb#1.rx          running  [::]:10000 | /            O
hb#1.tx          running             | /            O
listener         running       :1214
monitor          running
scheduler        running

Nodes                                   demo1        demo2
 score                                | 69           70
  load 15m                            | 0.0          0.0
  mem                                 | 15/98%:3.82g 9/98%:3.82g
  swap                                | -            -
 state                                |

*/svc/*                                 demo1        demo2
slapos/svc/comp1 up      ha    1/1   | O^           S

Service

Service status can be checked with the command om slapos/svc/comp1 print status:

slapos/svc/comp1                      up        
`- instances
|- demo2                           stdby up   idle
`- demo1                           up         idle, started
    |- volume#0            ........ up         comp1-cfg
    |- disk#0              ......S. stdby up   loop /opt/comp1.slapos.svc.hyperopenx.img
    |- disk#1              ......S. stdby up   vg comp1.slapos.svc.hyperopenx
    |- disk#2              ......S. stdby up   lv comp1.slapos.svc.hyperopenx/comp1
    |- disk#3              ......S. stdby up   drbd comp1.slapos.svc.hyperopenx
    |                                          info: Primary
    |- fs#0                ........ up         ext4 /dev/drbd0@/srv/comp1.slapos.svc.hyperopenx
    |- fs#flag             ........ up         fs.flag
    |- fs:binds
    |  |- fs#1             ........ up         bind /srv/comp1.slapos.svc.hyperopenx/re6st/etc/re6stnet@/etc/re6stnet
    |  |- fs#2             ........ up         bind /srv/comp1.slapos.svc.hyperopenx/re6st/var/log/re6stnet@/var/log/re6stnet
    |  |- fs#3             ........ up         bind /srv/comp1.slapos.svc.hyperopenx/re6st/var/lib/re6stnet@/var/lib/re6stnet
    |  |- fs#4             ........ up         bind /srv/comp1.slapos.svc.hyperopenx/slapos/srv/slapgrid@/srv/slapgrid
    |  `- fs#5             ........ up         bind /srv/comp1.slapos.svc.hyperopenx/slapos/etc/opt@/etc/opt
    |- app:re6st
    |  `- app#0            ...../.. up         forking: re6st
    |- app:slapos
    |  `- app#1            ...../.. up         forking: slapos
    |- sync#i0             ...O./.. up         rsync svc config to nodes
    `- task:admin                              //
        |- task#addpart     ...O./.. up         task.host
        |- task#chkaddip    ...O./.. up         task.host
        |- task#collect     ...O./.. up         task.host
        |- task#delpart     ...O./.. up         task.host
        `- task#software    ...O./.. up         task.host

Note

add option -r to force an immediate resource status evaluation (om slapos/svc/comp1 print status -r)

Tasks

The SlapOS component needs cron jobs to be executed; they have been integrated as OpenSVC tasks. The task schedules can be displayed with the command om slapos/svc/comp1 print schedule:

Action              Last Run             Next Run             Config Parameter          Schedule Definition
|- compliance_auto  -                    2023-11-10 03:48:52  DEFAULT.comp_schedule     ~00:00-06:00
|- push_resinfo     -                    2023-11-09 14:34:16  DEFAULT.resinfo_schedule  @60
|- status           2023-11-09 14:25:36  2023-11-09 14:35:36  DEFAULT.status_schedule   @10
|- run              2023-11-09 14:34:10  2023-11-09 14:35:10  task#addpart.schedule     @1m
|- run              2023-11-09 14:28:10  2023-11-09 15:28:10  task#chkaddip.schedule    @60m
|- run              2023-11-09 14:34:10  2023-11-09 14:35:10  task#collect.schedule     @1m
|- run              2023-11-09 14:28:10  2023-11-09 15:28:10  task#delpart.schedule     @60m
|- run              2023-11-09 14:34:10  2023-11-09 14:35:10  task#software.schedule    @1m
`- sync_all         2023-11-09 14:05:58  2023-11-09 15:05:58  sync#i0.schedule          @60
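
Each task can also be triggered on demand with the standard OpenSVC run action, selecting the task with --rid (depending on the task definition, a --confirm flag may additionally be required). For example, to run task#software immediately:

om slapos/svc/comp1 run --rid task#software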

Management commands

Starting service

om slapos/svc/comp1 start

Relocating service

om slapos/svc/comp1 switch

Stopping service

om slapos/svc/comp1 stop

Fetching service config

om slapos/svc/comp1 print config

Editing service config

om slapos/svc/comp1 edit config
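
Freezing and unfreezing service

During planned maintenance it can be useful to suspend orchestration, so the cluster daemon does not restart or relocate the service while you work. freeze and unfreeze are standard OpenSVC actions:

om slapos/svc/comp1 freeze

om slapos/svc/comp1 unfreeze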

Notes

  • This deployment is still a work in progress and needs to be reworked:

    • add more storage options

    • check the IPv6 routes prerequisite for the SlapOS installer

    • container implementation (LXC? Docker?)

    • configure an API for external management

    • add more heartbeats