.. _agent.daemon.heartbeats: Agent Heartbeats **************** Hearbeats serve the following purposes: * Exchange data between cluster nodes * Detect stale nodes * Execute the quorum race when a peer becomes stale for its last heartbeat OpenSVC supports multiple parallel running heartbeats. Exercizing different code paths and infrastructure data paths (network and storage switchs and site interconnects) helps limiting split-brain situation occurences. Heartbeats are declared in ``/node.conf`` or ``/cluster.conf``, each in a dedicated section named :kw:`[hb#]`. A heartbeat definition should work on all nodes, using scoped keywords if necessary, as the definition are served by the joined node to the joining nodes. Each heartbeat runs two threads janitored by the agent daemon: **tx** and **rx**. The :cmd:`om daemon status` and :cmd:`om mon` commands display the heartbeats status, statistics and each peer state .. raw:: html
	$ sudo svcmon
	Threads                                 aubergine clementine nuc 
	 hb#1.rx    running 224.3.29.71:10001 | /         X          X   
	 hb#1.tx    running 224.3.29.71:10001 | /         O          O   
	 hb#2.rx    running 0.0.0.0:10004     | /         X          O   
	 hb#2.tx    running                   | /         X          O   
	...

	# X: Stale
	# O: Sent/Received
	# /: Not Applicable
	
Maintenance ----------- The heartbeat threads are restarted by the agent daemon if they exit uncleanly. .. container:: lvl1 Any command causing a timestamp change on ``/node.conf`` or ``/cluster.conf`` triggers heartbeats reconfiguration: * Modified parameters are applied * Heartbeats removed in the configuration are stopped * Heartbeats added in the configuration are started Stop a heartbeat thread +++++++++++++++++++++++ :: om daemon stop --thread-id hb#1.tx Stop the ``hb#1`` tx thread. The thread state transitions from ``running`` to ``stopped``. Start a heartbeat thread ++++++++++++++++++++++++ :: om daemon start --thread-id hb#1.tx Start the ``hb#1`` tx thread. The thread state transitions from ``stopped`` to ``running``. Set a heartbeat timeout +++++++++++++++++++++++ :: om cluster set --kw hb#1.timeout=20 Drop a heartbeat ++++++++++++++++ :: om cluster delete --rid hb#1 .. rst-class:: lvl1 Threads ------- Tx ++ * Transmit data prepared by the monitor thread, regularly or as soon as possible after a data change * The data is pre-encrypted by the monitor thread Rx ++ * Receive data regularly (disk) or in response to a send (unicast and multicast) * Update the received peer data in the cluster data * Wake the monitor thread * Merge the updated peer data to update the cluster data attributes produced by node data aggregation * Run the node orchestrator: handle freeze/thaw requests propagated through the heartbeats * Run each service orchestrator: handle service instance start/stop/freeze/thaw transitions, and monitored resources restart * Fetch service configuration files from node node with the most recent version * If no heartbeat is received from a peer during :kw:`.timeout`, the peer is marked stale in this rx thread. The default timeout is 15 seconds. * If all rx threads have a peer marked stale, the peer data is purged from the cluster data * immediately if the stale peer has not advertized a maintenance * after :kw:`node.maintenance grace_period` if the stale peer has advertized a maintenance .. seealso:: :ref:`agent.cluster.data` Heartbeat Drivers ----------------- Unicast +++++++ This driver sends and receives using TCP unicast packets. :: [hb#1] type = unicast With this simplest definition: * The rx thread listens on 0.0.0.0:10000 * The tx thread sends to :10000 A most precise definition would be :: [hb#1] type = unicast intf@node1 = eth0 intf@node2 = eth2 addr@node1 = 1.2.3.4 addr@node2 = 1.2.3.5 port@node1 = 10001 port@node2 = 10002 timeout = 15 Multicast +++++++++ This driver sends and receive using UDP multicast packets. :: [hb#2] type = multicast With this simplest definition: * The rx thread listens on all interfaces on port 10000 * The tx thread sends to 224.3.29.71:10000 A most precise definition would be :: [hb#2] type = multicast intf@node1 = eth0 intf@node2 = eth2 addr = 224.3.29.71 port = 10001 timeout = 15 Addr and port are not scopable. Disk ++++ This driver reads and writes on a dedicated disk, using O_DIRECT|O_SYNC|O_DSYNC on a block device on Linux. Other operating systems must use raw char device. * The rx thread loops over peer nodes and for each reads its heartbeat data at its reserved slot device offset * The tx thread writes to its reserved slot offset on the device :: [hb#2] type = disk dev = /dev/mapper/3123412312412414214 timeout = 15 .. container:: lvl2 When the tx and rx threads are started or reconfigured, they parse a metadata segment at the head of the device and prepare a : hash. The metadata zone maximum size is 4MB. A node metadata slot size is 4k, and contains the cluster node name. Limits: * 1000 nodes (metadata zone size/slot meta data size) * nodenames are limited to 4k characters (slot meta data size) * A -nodes cluster requires a (+1)*4MB device * The heartbeat data (which is gziped) must not exceed 4MB (slot size). A 10 services cluster usually produces ~3k messages. If a the local nodename is not found in any slot, the thread allocates one. Relay +++++ This driver reads and writes on a remote opensvc agent memory. The relay listener
: must be reachable from all cluster nodes in normal operations. A relay should be located in a site hosting no other node of the cluster, so this heartbeat can avoid a split when the sites hosting cluster nodes are isolated, but can still reach the relay's site. The same relay can be used as heartbeat in different clusters. The relay host can also be used as an arbitrator. * The rx thread loops over peer nodes and for each requests its heartbeat data from the relay * The tx thread sends to the relay :: [hb#2] type = relay relay = relay1 timeout = 15 secret = 1023102310230123123