Events¶

Event Types¶

The OpenSVC daemon generates two types of events.

Event kind “patch”¶

Changes in the cluster monitors data, presented as json patches.

Those are generated upon merging cluster nodes data. They are the most frequent kind of payload exchanged through the heartbeats.

Example:

{
  "nodename": "aubergine",                            # cluster node
  "kind": "patch",
  "data": [
    [
      ["services", "status", "tstscaler", "monitor"], # key
      {                                               # new value
        "status": "scaling",
        "status_updated": 1539074932.4393582,
        "global_expect_updated": 1539074931.857869,
        "global_expect": null,
        "placement": "leader"
      }
    ]
  ]
}

Event kind “event”¶

These events are not cluster-wide. They are generated by the daemon threads on critical state changes and orchestration decisions on its local objects.

These events have a dictionnary in the “data” key, with the following sub-keys:

id: The event id
reason: An event id can be triggered for different reasons, in which case the reason key might be provided to explain the situation.
svcname: if specified, the event concerns a service, in which case a snapshot of the service data is also provided in the “service” and “instance” keys.
monitor: a snapshot of the node monitor states

Example:

{
  "nodename": "aubergine",
  "kind": "event",
  "data": {
    "id": "instance_thaw",                       # event id
    "reason": "target",                          # event reason
    "svcname": "ha1",
    "monitor": {                                 # node monitor states
      "status": "idle",
      "status_updated": 1539074255.1265483
    },
    "service": {                                 # service aggregated states
      "avail": "up",
      "frozen": "frozen",
      "overall": "warn",
      "placement": "optimal",
      "provisioned": true
    },
    "instance": {                                # service instance states
      "updated": "2018-10-09T08:59:00.317291Z",
      "mtime": 1539075540.317291,
      "app": "default",
      "env": "DEV",
      "placement": "spread",
      "topology": "flex",
      "provisioned": true,
      "running": [],
      "flex_min_nodes": 1,
      "flex_max_nodes": 2,
      "frozen": true,
      "orchestrate": "ha",
      "status_group": {
        "fs": "n/a",
        "ip": "up",
        "task": "n/a",
        "app": "n/a",
        "sync": "n/a",
        "disk": "n/a",
        "container": "n/a",
        "share": "n/a"
      },
      "overall": "warn",
      "avail": "up",
      "optional": "n/a",
      "csum": "95b8b5a953d16be504999612d0159949",
      "monitor": {                               # service instance monitor states
        "status": "idle",
        "status_updated": 1539074254.7616527,
        "global_expect_updated": 1539075568.6204853,
        "local_expect": "started",
        "global_expect": "thawed",
        "placement": ""
      }
    }
  }
}

Daemon Events¶

Id `arbitrator_up`¶

arbitrator {arbitrator} is now reachable

Id `arbitrator_down`¶

arbitrator {arbitrator} is no longer reachable

Id `blacklist_add`¶

sender {sender} blacklisted

Id `hb_beating`¶

node {nodename} hb status stale => beating

Id `hb_stale`¶

node {nodename} hb status beating => stale

Id `node_config_change`¶

node config change

Id `node_thaw`¶

thaw node

Id `max_resource_restart`¶

max restart ({restart}) reached for resource {rid} ({resource.label})

Id `max_stdby_resource_restart`¶

max restart ({restart}) reached for standby resource {rid} ({resource.label})

Id `monitor_started`¶

monitor started

Id `resource_toc`¶

toc for resource {rid} ({resource.label}) {resource.status} {resource.log}

Id `resource_degraded`¶

resource {rid} ({resource.label}) degraded to {resource.status} {resource.log}

Id `resource_restart`¶

restart resource {rid} ({resource.label}) {resource.status} {resource.log}, try {try}/{restart}

Id `stdby_resource_restart`¶

start standby resource {rid} ({resource.label}) {resource.status} {resource.log}, try {try}/{restart}

Id `service_config_installed`¶

config fetched from node {from} is now installed

Id `scale_up`¶

misses {delta} instance to reach scale target {instance.scale}

Id `scale_down`¶

exceeds {delta} instance to reach scale target {instance.scale}

Id `crash`, Reason `split`¶

cluster is split, we don’t have quorum: {node_votes}+{arbitrator_votes}/{voting} votes {pro_voters}

Id `forget_peer`, Reason `no_rx`¶

no rx thread still receive from node {peer} and maintenance grace period expired. flush its data

Id `instance_abort`, Reason `target`¶

abort {instance.topology} {instance.avail} instance {instance.monitor.local_expect} action to satisfy the {instance.monitor.global_expect} target

Id `instance_delete`, Reason `target`¶

delete {instance.topology} {instance.avail} instance to satisfy the {instance.monitor.global_expect} target

Id `instance_freeze`, Reason `install`¶

freeze instance on install

Id `instance_freeze`, Reason `merge_frozen`¶

freeze instance on rejoin because instance on {peer} is frozen

Id `instance_freeze`, Reason `target`¶

freeze instance to satisfy the {instance.monitor.global_expect} target

Id `instance_provision`, Reason `target`¶

provision {instance.topology} {instance.avail} instance to satisfy the {instance.monitor.global_expect} target

Id `instance_purge`, Reason `target`¶

purge {instance.topology} {instance.avail} instance to satisfy the {instance.monitor.global_expect} target

Id `instance_start`, Reason `from_ready`¶

start {instance.topology} {instance.avail} instance ready for {since} seconds

Id `instance_start`, Reason `single_node`¶

start idle single node {instance.avail} instance

Id `instance_start`, Reason `target`¶

start {instance.topology} {instance.avail} instance to satisfy the {instance.monitor.global_expect} target

Id `instance_stop`, Reason `flex_threshold`¶

stop {instance.topology} {instance.avail} instance to meet threshold constraints: {up}/{instance.flex_target}

Id `instance_stop`, Reason `target`¶

stop {instance.topology} {instance.avail} instance to satisfy the {instance.monitor.global_expect} target

Id `instance_thaw`, Reason `target`¶

thaw instance to satisfy the {instance.monitor.global_expect} target

Id `instance_unprovision`, Reason `target`¶

unprovision {instance.topology} {instance.avail} instance to satisfy the {instance.monitor.global_expect} target

Id `node_freeze`, Reason `kern_freeze`¶

freeze node due to kernel cmdline flag.

Id `node_freeze`, Reason `merge_frozen`¶

freeze node, node {peer} was frozen while we were down

Id `node_freeze`, Reason `rejoin_expire`¶

freeze node, the cluster is not complete on rejoin grace period expiration

Id `node_freeze`, Reason `target`¶

freeze node

Id `node_freeze`, Reason `upgrade`¶

freeze node for upgrade until the cluster is complete

Id `node_thaw`, Reason `upgrade`¶

thaw node after upgrade, the cluster is complete

Id `reboot`, Reason `split`¶

cluster is split, we don’t have quorum: {node_votes}+{arbitrator_votes}/{voting} votes {pro_voters}

Id `resource_would_toc`, Reason `no_candidate`¶

would toc for resource {rid} ({resource.label}) {resource.status} {resource.log}, but no node is candidate for takeover.

Hooks¶

Custom scripts can be executed on events. These hooks are defined in the node configuration file.

Example:

[hook#1]
events = all
command = /root/on_event

Events are specified by id only. The keyword accepts multiple ids formatted as a comma-separated list, or the all special value.

The script referenced by the command keyword can get the whole event data on stdin.

Hooks executions are logged in the node log.

Watching Events¶

In human readable format:

om node events

In machine readable format:

om node events --format json

Waiting for data change events¶

# test the filter
$ om daemon status --filter "monitor.nodes.nuc-cva.frozen"
0

# already on target => return immediately
$ om node wait --filter "monitor.nuc-cva.frozen=0" --duration 1s

$ echo $?
0

# not going on target => timeout
$ om node wait --filter "monitor.nuc-cva.frozen" --duration 1s
timeout

$ echo $?
1

Events¶

Event Types¶

Event kind “patch”¶

Event kind “event”¶

Daemon Events¶

Id arbitrator_up¶

Id arbitrator_down¶

Id blacklist_add¶

Id hb_beating¶

Id hb_stale¶

Id node_config_change¶

Id node_thaw¶

Id max_resource_restart¶

Id max_stdby_resource_restart¶

Id monitor_started¶

Id resource_toc¶

Id resource_degraded¶

Id resource_restart¶

Id stdby_resource_restart¶

Id service_config_installed¶

Id scale_up¶

Id scale_down¶

Id crash, Reason split¶

Id forget_peer, Reason no_rx¶

Id instance_abort, Reason target¶

Id instance_delete, Reason target¶

Id instance_freeze, Reason install¶

Id instance_freeze, Reason merge_frozen¶

Id instance_freeze, Reason target¶

Id instance_provision, Reason target¶

Id instance_purge, Reason target¶

Id instance_start, Reason from_ready¶

Id instance_start, Reason single_node¶

Id instance_start, Reason target¶

Id instance_stop, Reason flex_threshold¶

Id instance_stop, Reason target¶

Id instance_thaw, Reason target¶

Id instance_unprovision, Reason target¶

Id node_freeze, Reason kern_freeze¶

Id node_freeze, Reason merge_frozen¶

Id node_freeze, Reason rejoin_expire¶

Id node_freeze, Reason target¶

Id node_freeze, Reason upgrade¶

Id node_thaw, Reason upgrade¶

Id reboot, Reason split¶

Id resource_would_toc, Reason no_candidate¶

Hooks¶

Watching Events¶

Waiting for data change events¶

Id `arbitrator_up`¶

Id `arbitrator_down`¶

Id `blacklist_add`¶

Id `hb_beating`¶

Id `hb_stale`¶

Id `node_config_change`¶

Id `node_thaw`¶

Id `max_resource_restart`¶

Id `max_stdby_resource_restart`¶

Id `monitor_started`¶

Id `resource_toc`¶

Id `resource_degraded`¶

Id `resource_restart`¶

Id `stdby_resource_restart`¶

Id `service_config_installed`¶

Id `scale_up`¶

Id `scale_down`¶

Id `crash`, Reason `split`¶

Id `forget_peer`, Reason `no_rx`¶

Id `instance_abort`, Reason `target`¶

Id `instance_delete`, Reason `target`¶

Id `instance_freeze`, Reason `install`¶

Id `instance_freeze`, Reason `merge_frozen`¶

Id `instance_freeze`, Reason `target`¶

Id `instance_provision`, Reason `target`¶

Id `instance_purge`, Reason `target`¶

Id `instance_start`, Reason `from_ready`¶

Id `instance_start`, Reason `single_node`¶

Id `instance_start`, Reason `target`¶

Id `instance_stop`, Reason `flex_threshold`¶

Id `instance_stop`, Reason `target`¶

Id `instance_thaw`, Reason `target`¶

Id `instance_unprovision`, Reason `target`¶

Id `node_freeze`, Reason `kern_freeze`¶

Id `node_freeze`, Reason `merge_frozen`¶

Id `node_freeze`, Reason `rejoin_expire`¶

Id `node_freeze`, Reason `target`¶

Id `node_freeze`, Reason `upgrade`¶

Id `node_thaw`, Reason `upgrade`¶

Id `reboot`, Reason `split`¶

Id `resource_would_toc`, Reason `no_candidate`¶