Sozu HA

Purpose

Sozu is an HTTP reverse proxy from Clever Cloud, used by cloud and PaaS providers to front their constantly changing web infrastructure.

This survival guide describes how to deploy and manage a highly available Sozu setup using an OpenSVC cluster.

Prerequisites

Cluster

  • 2 Linux nodes running Ubuntu 22.04 LTS

  • docker/podman

  • inotify-tools

Note

OpenSVC cluster installation is not covered in this howto.

Installing

Deploy the sozu config map

A config map stores the Sozu-related files, such as the configuration and the admin script.

om igw/cfg/sozu create
om igw/cfg/sozu add --key config.toml --from https://raw.githubusercontent.com/opensvc/opensvc_templates/main/sozu/config.toml
om igw/cfg/sozu add --key watch_directory.sh --from https://raw.githubusercontent.com/opensvc/opensvc_templates/main/sozu/watch_directory.sh
om igw/cfg/sozu add --key state.json
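
Before deploying the service, it can be worth checking that the three keys are present. The sketch below assumes that om igw/cfg/sozu keys prints one key name per line; missing_keys is a hypothetical helper written for this guide, not an OpenSVC command.

```shell
#!/bin/sh
# Hypothetical helper: reads key names on stdin (one per line, as assumed
# for the output of `om igw/cfg/sozu keys`) and prints every expected key
# that is absent from that list.
missing_keys() {
    present=$(cat)
    for key in "$@"; do
        # -x matches whole lines only, -F keeps the dots literal
        printf '%s\n' "$present" | grep -qxF -- "$key" || printf '%s\n' "$key"
    done
}

# Usage, on a cluster node:
#   om igw/cfg/sozu keys | missing_keys config.toml watch_directory.sh state.json
```

An empty output means that all the expected keys were found.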

Deploy the sozu secret map

A secret hosts the SSL certificates served by the Sozu deployment.

om igw/sec/sozu create
om igw/sec/sozu add --key cert.pem --from /path/to/cert.pem
om igw/sec/sozu add --key key.pem --from /path/to/key.pem
om igw/sec/sozu add --key chain.pem --from /path/to/chain.pem

Note

Several certificates can be added to this secret object. They are available in the /certs folder inside the container.
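
When several certificates have to be loaded, the add command shown above can be wrapped in a loop. The following is a minimal sketch: add_pems and the DRY_RUN variable are hypothetical names, and the choice of using each file name as the key name is an assumption made for illustration.

```shell
#!/bin/sh
# Sketch: store every *.pem file of a directory in the igw/sec/sozu secret,
# using the file name as the key name (an assumption for this example).
# With DRY_RUN=1 the commands are only printed, not executed.
add_pems() {
    dir=$1
    for pem in "$dir"/*.pem; do
        [ -e "$pem" ] || continue   # no match: the glob stays unexpanded
        key=$(basename "$pem")
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "om igw/sec/sozu add --key $key --from $pem"
        else
            om igw/sec/sozu add --key "$key" --from "$pem"
        fi
    done
}

# Usage: DRY_RUN=1 add_pems /path/to/certs
```

Running it once with DRY_RUN=1 shows the exact commands before anything is written to the secret.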

Deploy the sozu service

om igw/svc/sozu deploy --config https://raw.githubusercontent.com/opensvc/opensvc_templates/main/sozu/sozu.conf

How it works

The Sozu HA deployment is an active/active OpenSVC service, with one Sozu instance per cluster node.

The initial configuration is loaded from the config map key config.toml. Subsequent Sozu configuration changes are saved to the state.json file on disk (automatic_state_save = true in config.toml). This file is copied back into the config map object, which is then replicated to the other cluster nodes. Finally, the remote Sozu instances load the new state.json file to apply the changed configuration. This mechanism allows both instances to stay in sync.
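
The "copy back into the config map" step described above can be pictured with the sketch below. It is not the watch_directory.sh script from the template repository. The function name push_state_if_changed and the watched path are hypothetical, and the use of om igw/cfg/sozu change --key ... --from ... to update an existing key is an assumption to verify against your OpenSVC version.

```shell
#!/bin/sh
# Illustrative sketch only, NOT the template's watch_directory.sh.
# Pushes a local state file into the config map when its content differs
# from the copy currently stored in the cluster.
# Assumption: `om igw/cfg/sozu change --key K --from F` updates key K.
push_state_if_changed() {
    state_file=$1
    local_sum=$(cksum < "$state_file")
    stored_sum=$(om igw/cfg/sozu decode --key state.json | cksum)
    if [ "$local_sum" != "$stored_sum" ]; then
        om igw/cfg/sozu change --key state.json --from "$state_file"
    fi
}

# Possible driver using inotifywait from inotify-tools
# (the /var/lib/sozu path is hypothetical):
#   inotifywait -m -e close_write /var/lib/sozu |
#   while read -r dir events file; do
#       [ "$file" = state.json ] && push_state_if_changed "$dir$file"
#   done
```

Comparing checksums before pushing avoids rewriting the config map when a write event did not change the file content.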

Warning

This HA setup works but has not been tested on large-scale deployments.

Checking Status

Cluster

Cluster status can be checked with the om mon command:

Threads                                      demo1       demo2
daemon         running                    | O
collector      running                    | O
dns            running
hb#1.rx        running         [::]:10000 | /           O
hb#1.tx        running                    | /           O
hb#2.rx        running relay2.opensvc.com | /           O
hb#2.tx        running                    | /           O
listener       running              :1214
monitor        running
scheduler      running

Nodes                                        demo1       demo2
 score                                     | 70          70
  load 15m                                 | 0.0         0.0
  mem                                      | 9/98%:3.82g 9/98%:3.82g
  swap                                     | -           -
 state                                     |

*/svc/*                                      demo1       demo2
igw/svc/sozu   up             ha    2/2   | O^          O^
system/svc/vip up             ha    1/1   | O^          X
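
For a quick scripted check, the services section of this output can be filtered. The sketch below assumes the plain-text layout shown above, with the object path in the first column and the availability in the second; services_not_up is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: list the services whose availability is not "up", reading the
# text layout of `om mon` on stdin. Assumes the object path is the first
# column and the availability the second. The section header row, which
# starts with "*", is skipped.
services_not_up() {
    awk '$1 ~ /\/svc\// && $1 !~ /^\*/ && $2 != "up" { print $1, $2 }'
}

# Usage: om mon | services_not_up
```

With the output shown above, nothing is printed because both services are up.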

Service

Service status can be checked with the om igw/svc/sozu print status command:

    igw/svc/sozu                     up
    `- instances
       |- demo2                      up         idle, started
       `- demo1                      up         idle, started
          |- volume#cfg     ........ up         sozu-cfg
          |- volume#scripts ........ up         sozu-scripts
          |- fs#flag        ........ up         fs.flag
          |- container#0    ...../.. up         docker google/pause
          |- container#1    ...../.3 up         docker clevercloud/sozu:d7b23c9fe877394cc3f2130d2fe5e76274dbf6c0
          |- app#watch      ...../.. up         forking: watch_directory.sh
          |- sync#i0        ..DO./.. n/a        rsync svc config to nodes
          `- task#stateload ...O./.. n/a        docker clevercloud/sozu:d7b23c9fe877394cc3f2130d2fe5e76274dbf6c0

Note

Add the -r option to force an immediate resource status evaluation (om igw/svc/sozu print status -r).
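
In a deployment script, the same command can be polled until the service reports up. This is a minimal sketch: wait_for_up and WAIT_DELAY are hypothetical names, and it assumes that the first line of the output carries the service path followed by its availability, as in the output above.

```shell
#!/bin/sh
# Sketch: poll `om <path> print status -r` until the service is "up".
# Assumes the first output line is "<path> <availability>".
# Arguments: service path, then the number of tries (default 10).
# WAIT_DELAY sets the pause between tries in seconds (default 2).
wait_for_up() {
    path=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        avail=$(om "$path" print status -r | awk 'NR == 1 { print $2 }')
        [ "$avail" = up ] && return 0
        tries=$((tries - 1))
        sleep "${WAIT_DELAY:-2}"
    done
    return 1
}

# Usage: wait_for_up igw/svc/sozu 30 || echo "sozu is still not up" >&2
```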

Management commands

Starting service

om igw/svc/sozu start

Stopping service (all instances)

om igw/svc/sozu stop

Stopping service (local instance only)

om igw/svc/sozu stop --local

Fetching service config

om igw/svc/sozu print config

Editing service config

om igw/svc/sozu edit config

Listing config map keys

om igw/cfg/sozu keys

Editing config map key

om igw/cfg/sozu edit --key config.toml

Example

In the logs below, a configuration change made on the first node is replicated to the other Sozu instance within a few seconds.

On node demo1

root@demo1:~# om igw/cfg/sozu decode --key state.json | wc -l
0

root@demo1:~# om igw/svc/sozu enter --rid container#1
OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown
/ # sozu -c /etc/sozu/config.toml cluster list
2024-07-01T14:07:24.203948Z 1719842844203948235 24 CTL INFO     Ran the query successfully
Success: Ran the query successfully
┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬────────────────┐
│ cluster id │ worker 0            │ worker 1            │ worker main         │ desynchronized │
├────────────┼─────────────────────┼─────────────────────┼─────────────────────┼────────────────┤
│ MyCluster  │ 7980951202874738186 │ 7980951202874738186 │ 7980951202874738186 │                │
├────────────┼─────────────────────┼─────────────────────┼─────────────────────┼────────────────┤
│ TcpTest    │ 4135859621253794451 │ 4135859621253794451 │ 4135859621253794451 │                │
└────────────┴─────────────────────┴─────────────────────┴─────────────────────┴────────────────┘

/ # sozu -c /etc/sozu/config.toml cluster add --id NEW_CLUSTER --load-balancing-policy round_robin
2024-07-01T14:07:55.387784Z 1719842875387784851 25 CTL INFO     Successfully executed the request on all workers
Success: Successfully executed the request on all workers
No content

/ # sozu -c /etc/sozu/config.toml cluster list
2024-07-01T14:07:59.218130Z 1719842879218130199 26 CTL INFO     Ran the query successfully
Success: Ran the query successfully
┌─────────────┬──────────────────────┬──────────────────────┬──────────────────────┬────────────────┐
│ cluster id  │ worker 0             │ worker 1             │ worker main          │ desynchronized │
├─────────────┼──────────────────────┼──────────────────────┼──────────────────────┼────────────────┤
│ NEW_CLUSTER │ 17225215009394938232 │ 17225215009394938232 │ 17225215009394938232 │                │
├─────────────┼──────────────────────┼──────────────────────┼──────────────────────┼────────────────┤
│ MyCluster   │ 7980951202874738186  │ 7980951202874738186  │ 7980951202874738186  │                │
├─────────────┼──────────────────────┼──────────────────────┼──────────────────────┼────────────────┤
│ TcpTest     │ 4135859621253794451  │ 4135859621253794451  │ 4135859621253794451  │                │
└─────────────┴──────────────────────┴──────────────────────┴──────────────────────┴────────────────┘
/ # exit

root@demo1:~# om igw/cfg/sozu decode --key state.json | wc -l
16
root@demo1:~#

On node demo2

root@demo2:~# om igw/svc/sozu enter --rid container#1
OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown
/ #
/ # sozu -c /etc/sozu/config.toml cluster list
2024-07-01T14:08:09.812256Z 1719842889812256391 24 CTL INFO     Ran the query successfully
Success: Ran the query successfully
┌─────────────┬──────────────────────┬──────────────────────┬──────────────────────┬────────────────┐
│ cluster id  │ worker 0             │ worker 1             │ worker main          │ desynchronized │
├─────────────┼──────────────────────┼──────────────────────┼──────────────────────┼────────────────┤
│ NEW_CLUSTER │ 17225215009394938232 │ 17225215009394938232 │ 17225215009394938232 │                │
├─────────────┼──────────────────────┼──────────────────────┼──────────────────────┼────────────────┤
│ TcpTest     │ 4135859621253794451  │ 4135859621253794451  │ 4135859621253794451  │                │
├─────────────┼──────────────────────┼──────────────────────┼──────────────────────┼────────────────┤
│ MyCluster   │ 7980951202874738186  │ 7980951202874738186  │ 7980951202874738186  │                │
└─────────────┴──────────────────────┴──────────────────────┴──────────────────────┴────────────────┘
/ #
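
The two nodes list the same clusters but not in the same row order, so a direct diff of the two outputs would report differences. The sketch below extracts and sorts the cluster ids so that the lists can be compared. cluster_ids is a hypothetical helper, and it assumes the table layout printed by sozu cluster list in the logs above, saved to one file per node.

```shell
#!/bin/sh
# Sketch: extract the cluster ids from the table printed by
# `sozu cluster list` (read on stdin) and print them sorted, so that the
# output of two nodes can be compared regardless of row order.
cluster_ids() {
    awk -F'│' 'NF > 2 {
        id = $2
        gsub(/^ +| +$/, "", id)          # trim the cell padding
        if (id != "cluster id") print id # skip the header row
    }' | LC_ALL=C sort
}

# Usage, with the output of each node saved to a file (hypothetical names):
#   cluster_ids < demo1.txt > demo1.ids
#   cluster_ids < demo2.txt > demo2.ids
#   cmp demo1.ids demo2.ids && echo "cluster lists are in sync"
```

Only data rows contain the vertical bar used as field separator, so the log lines and the table borders are ignored.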