Events¶
Event Types¶
The OpenSVC daemon generates two kinds of events, identified by the kind key of the event payload.
Event kind "patch"¶
Changes in the cluster monitor data, presented as JSON patches.
These patches are generated when merging cluster node data. They are the most frequent kind of payload exchanged through the heartbeats.
Example:
{
    "nodename": "aubergine",    # cluster node
    "kind": "patch",
    "data": [
        [
            ["services", "status", "tstscaler", "monitor"],    # key
            {                                                   # new value
                "status": "scaling",
                "status_updated": 1539074932.4393582,
                "global_expect_updated": 1539074931.857869,
                "global_expect": null,
                "placement": "leader"
            }
        ]
    ]
}
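As a rough illustration of how such a patch entry could be consumed, here is a minimal Python sketch that applies a (key path, new value) pair to a local nested-dict copy of the cluster data. The apply_patch helper is hypothetical, not part of the OpenSVC API:
# Minimal sketch: apply a "patch" event entry (key path + new value) to a
# local nested-dict copy of the cluster data. apply_patch is a hypothetical
# helper, not an OpenSVC API.
def apply_patch(data, key_path, value):
    node = data
    for key in key_path[:-1]:
        node = node.setdefault(key, {})
    node[key_path[-1]] = value

cluster_data = {}
apply_patch(
    cluster_data,
    ["services", "status", "tstscaler", "monitor"],
    {"status": "scaling", "global_expect": None, "placement": "leader"},
)
print(cluster_data["services"]["status"]["tstscaler"]["monitor"]["status"])  # scaling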
Event kind "event"¶
These events are not cluster-wide. They are generated by the daemon threads on critical state changes and orchestration decisions concerning their local objects.
These events carry a dictionary in the "data" key, with the following sub-keys:
- id: the event id.
- reason: an event id can be triggered for different reasons, in which case the reason key may be provided to qualify the situation.
- svcname: if set, the event concerns a service, and a snapshot of the service data is also provided in the "service" and "instance" keys.
- monitor: a snapshot of the node monitor states.
Example:
{
    "nodename": "aubergine",
    "kind": "event",
    "data": {
        "id": "instance_thaw",    # event id
        "reason": "target",       # event reason
        "svcname": "ha1",
        "monitor": {              # node monitor states
            "status": "idle",
            "status_updated": 1539074255.1265483
        },
        "service": {              # service aggregated states
            "avail": "up",
            "frozen": "frozen",
            "overall": "warn",
            "placement": "optimal",
            "provisioned": true
        },
        "instance": {             # service instance states
            "updated": "2018-10-09T08:59:00.317291Z",
            "mtime": 1539075540.317291,
            "app": "default",
            "env": "DEV",
            "placement": "spread",
            "topology": "flex",
            "provisioned": true,
            "running": [],
            "flex_min_nodes": 1,
            "flex_max_nodes": 2,
            "frozen": true,
            "orchestrate": "ha",
            "status_group": {
                "fs": "n/a",
                "ip": "up",
                "task": "n/a",
                "app": "n/a",
                "sync": "n/a",
                "disk": "n/a",
                "container": "n/a",
                "share": "n/a"
            },
            "overall": "warn",
            "avail": "up",
            "optional": "n/a",
            "csum": "95b8b5a953d16be504999612d0159949",
            "monitor": {          # service instance monitor states
                "status": "idle",
                "status_updated": 1539074254.7616527,
                "global_expect_updated": 1539075568.6204853,
                "local_expect": "started",
                "global_expect": "thawed",
                "placement": ""
            }
        }
    }
}
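As an illustration of the structure above, here is a minimal Python sketch that routes a decoded event payload on its id, reason and svcname sub-keys. The handle_event helper is hypothetical, not part of the OpenSVC API:
# Minimal sketch: route a decoded "event" payload on its data sub-keys.
# handle_event is a hypothetical helper, not an OpenSVC API.
def handle_event(event):
    if event.get("kind") != "event":
        return
    data = event["data"]
    eid = data["id"]
    reason = data.get("reason")
    svcname = data.get("svcname")
    if svcname:
        # service-related event: aggregated and instance snapshots are available
        avail = data["service"]["avail"]
        print(f"{eid} ({reason}) on service {svcname}, avail={avail}")
    else:
        print(f"{eid} ({reason}) on node {event['nodename']}")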
Daemon Events¶
- Id blacklist_add: sender {sender} blacklisted
- Id hb_stale: node {nodename} hb status beating => stale
- Id max_stdby_resource_restart: max restart ({restart}) reached for standby resource {rid} ({resource.label})
- Id resource_degraded: resource {rid} ({resource.label}) degraded to {resource.status} {resource.log}
- Id resource_restart: restart resource {rid} ({resource.label}) {resource.status} {resource.log}, try {try}/{restart}
- Id arbitrator_up: arbitrator {arbitrator} is now reachable
- Id monitor_started: monitor started
- Id scale_up: misses {delta} instance to reach scale target {instance.scale}
- Id scale_down: exceeds {delta} instance to reach scale target {instance.scale}
- Id service_config_installed: config fetched from node {from} is now installed
- Id stdby_resource_restart: start standby resource {rid} ({resource.label}) {resource.status} {resource.log}, try {try}/{restart}
- Id arbitrator_down: arbitrator {arbitrator} is no longer reachable
- Id max_resource_restart: max restart ({restart}) reached for resource {rid} ({resource.label})
- Id node_thaw: thaw node
- Id node_config_change: node config change
- Id hb_beating: node {nodename} hb status stale => beating
- Id resource_toc: toc for resource {rid} ({resource.label}) {resource.status} {resource.log}
- Id crash, Reason split: cluster is split, we don't have quorum: {node_votes}+{arbitrator_votes}/{voting} votes {pro_voters}
- Id forget_peer, Reason no_rx: no rx thread still receive from node {peer} and maintenance grace period expired. flush its data
- Id instance_abort, Reason target: abort {instance.topology} {instance.avail} instance {instance.monitor.local_expect} action to satisfy the {instance.monitor.global_expect} target
- Id instance_delete, Reason target: delete {instance.topology} {instance.avail} instance to satisfy the {instance.monitor.global_expect} target
- Id instance_freeze, Reason install: freeze instance on install
- Id instance_freeze, Reason merge_frozen: freeze instance on rejoin because instance on {peer} is frozen
- Id instance_freeze, Reason target: freeze instance to satisfy the {instance.monitor.global_expect} target
- Id instance_provision, Reason target: provision {instance.topology} {instance.avail} instance to satisfy the {instance.monitor.global_expect} target
- Id instance_purge, Reason target: purge {instance.topology} {instance.avail} instance to satisfy the {instance.monitor.global_expect} target
- Id instance_start, Reason from_ready: start {instance.topology} {instance.avail} instance ready for {since} seconds
- Id instance_start, Reason single_node: start idle single node {instance.avail} instance
- Id instance_start, Reason target: start {instance.topology} {instance.avail} instance to satisfy the {instance.monitor.global_expect} target
- Id instance_stop, Reason flex_threshold: stop {instance.topology} {instance.avail} instance to meet threshold constraints: {up}/{instance.flex_target}
- Id instance_stop, Reason target: stop {instance.topology} {instance.avail} instance to satisfy the {instance.monitor.global_expect} target
- Id instance_thaw, Reason target: thaw instance to satisfy the {instance.monitor.global_expect} target
- Id instance_unprovision, Reason target: unprovision {instance.topology} {instance.avail} instance to satisfy the {instance.monitor.global_expect} target
- Id node_freeze, Reason kern_freeze: freeze node due to kernel cmdline flag.
- Id node_freeze, Reason merge_frozen: freeze node, node {peer} was frozen while we were down
- Id node_freeze, Reason rejoin_expire: freeze node, the cluster is not complete on rejoin grace period expiration
- Id node_freeze, Reason target: freeze node
- Id node_freeze, Reason upgrade: freeze node for upgrade until the cluster is complete
- Id node_thaw, Reason upgrade: thaw node after upgrade, the cluster is complete
- Id resource_would_toc, Reason no_candidate: would toc for resource {rid} ({resource.label}) {resource.status} {resource.log}, but no node is candidate for takeover.
Hooks¶
Custom scripts can be executed on events. These hooks are defined in the node configuration file.
Example:
[hook#1]
events = all
command = /root/on_event
Events are specified by id only. The events keyword accepts multiple ids formatted as a comma-separated list, or the special value all.
The script referenced by the command keyword can read the whole event data from stdin.
Hook executions are logged in the node log.
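For example, the command could point to a small Python script that reads the event from stdin and appends a one-line summary to a log file. This is a minimal sketch, assuming the full event document shown in the examples above is written to stdin; the log path is an arbitrary choice:
#!/usr/bin/env python3
# Minimal hook sketch: read the event payload from stdin and append a
# one-line summary to a log file. The log path is an arbitrary example,
# and the payload layout is assumed to match the examples above.
import json
import sys

event = json.load(sys.stdin)
data = event.get("data", {})
line = "%s %s id=%s reason=%s svcname=%s\n" % (
    event.get("nodename"),
    event.get("kind"),
    data.get("id"),
    data.get("reason"),
    data.get("svcname"),
)
with open("/var/tmp/opensvc_events.log", "a") as f:
    f.write(line)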
Watching Events¶
In human-readable format:
om node events
In machine-readable format:
om node events --format json
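To consume the stream programmatically, a rough Python sketch along these lines could work, assuming om node events --format json writes one JSON document per event to stdout (check the exact framing on your agent version):
# Rough sketch: follow "om node events --format json" and print event ids.
# Assumes one JSON document per line on stdout; adjust if your agent
# version frames the stream differently.
import json
import subprocess

proc = subprocess.Popen(
    ["om", "node", "events", "--format", "json"],
    stdout=subprocess.PIPE,
    text=True,
)
for line in proc.stdout:
    line = line.strip()
    if not line:
        continue
    try:
        event = json.loads(line)
    except ValueError:
        continue  # ignore partial or non-JSON lines
    if event.get("kind") == "event":
        print(event["data"]["id"], event["data"].get("reason"))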
Waiting for data change events¶
The om node wait command returns as soon as the daemon data matches the --filter expression, or prints timeout and exits 1 when the --duration delay expires first.
# test the filter
$ om daemon status --filter "monitor.nodes.nuc-cva.frozen"
0
# already on target => return immediately
$ om node wait --filter "monitor.nuc-cva.frozen=0" --duration 1s
$ echo $?
0
# not going on target => timeout
$ om node wait --filter "monitor.nuc-cva.frozen" --duration 1s
timeout
$ echo $?
1