Forest Bus Administration

Configuration and monitoring of Forest Bus

Topics

The following topics are covered by this administration guide:

forest-bus-server startup options
forest-admin usage
Clustering considerations
Monitoring

forest-bus-server

The forest-bus-server command supports the following options:

-name	The network name and port (e.g. :3000) for the server to listen for incoming connections on. Both other server connections (when configured as a cluster) and Go client connections are served on this port. Required.
-path	The location of the configuration and data storage path to be used by this forest-bus-server. Currently all data for all topics is stored under this path. Future releases will support distributing the topic data across different paths. Required.
-id	The Cluster ID that this server is part of. Once set this cannot be easily changed. If the server is started up with a different Cluster ID to the one recorded under -path from a previous run, forest-bus-server will not start. The use of Cluster ID is optional but recommended.
-cbor	This is the network name and port (e.g. :5000) that forest-bus-server will listen on for incoming CBOR RPC connections. Java and Python clients connect to forest-bus-server using CBOR RPC. If you are only using Go clients and / or the forest-send and forest-get clients then this is not required.
-http	The network name and port to publish server metrics on (e.g. :8000). A single URL, `/debug/vars`, is published at this address and contains a JSON object with server variable and performance information. This is used by the Forest Bus Monitor and can be used for other monitoring tool integration.
-gob	An optional alternative network name and port (e.g. :6000) for Go clients to connect to using GOB RPC. This is only needed if you want to separate forest-bus-server peer connections from client connections.
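
As an illustration of integrating the -http endpoint with another monitoring tool, the following Go sketch reads `/debug/vars` and lists the variables it finds. It assumes a server started with `-http :8000` on the local machine. The function names are hypothetical, and because the variable names are not listed in this guide the values are kept as raw JSON rather than decoded into fixed fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"sort"
)

// parseVars decodes the JSON object served at /debug/vars. Values are kept as
// raw JSON because the variable names are not documented in this guide.
func parseVars(data []byte) (map[string]json.RawMessage, error) {
	var vars map[string]json.RawMessage
	if err := json.Unmarshal(data, &vars); err != nil {
		return nil, err
	}
	return vars, nil
}

// fetchVars requests /debug/vars from the address given to -http.
func fetchVars(baseURL string) (map[string]json.RawMessage, error) {
	resp, err := http.Get(baseURL + "/debug/vars")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseVars(body)
}

func main() {
	// Assumption: a forest-bus-server is running locally with -http :8000.
	vars, err := fetchVars("http://localhost:8000")
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetch failed:", err)
		return
	}
	names := make([]string, 0, len(vars))
	for name := range vars {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s = %s\n", name, vars[name])
	}
}
```

Polling this on a fixed interval and comparing successive values is one way to derive rates similar to those shown by the Forest Bus Monitor.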

forest-admin

The forest-admin command is used to distribute Forest Bus configuration information to one or more servers.

General Usage

The overall usage of the forest-admin command is: forest-admin [-id <clusterID>] [-node node1[,...]] command [arguments]

The general usage of the -id and -node flags is:

-id	The Cluster ID of the server(s) that the configuration change is going to be made on. This needs to match the Cluster ID given on the command line to the forest-bus-server.
-node	A comma separated list of forest-bus-server nodes (e.g. localhost:3000,localhost:3001) that the sub-command should be sent to.

peers

forest-admin [-id <clusterID>] [-node node1[,...]] peers peer1[,...]

The peers subcommand is used to tell a server the list of nodes that makes up a cluster. It takes one parameter, a comma separated list of peers. Even if a cluster only has one member, it needs to have the forest-admin peers command run against it. If the -id option is used with peers, a file will be created called ClusterID.nodelist containing the list of peers passed to forest-admin.

There are two special cases of using forest-admin peers:

Removing a server from a cluster. To do this it is important that the server being removed receives the updated list of peers, not just those that remain. If -node is used to specify the servers then only those listed with -node will receive the updated peer list. If a cluster ID has been used and a .nodelist file exists, forest-admin will send the new peer list to all nodes in the .nodelist file as well as those listed in the new peer list. Once forest-admin has been used to remove a server from a cluster, that forest-bus-server can be stopped.
Adding a server to a cluster. In this case both the new server and existing nodes need to know that the new server is being added. If adding servers to an existing cluster it is recommended to do this one at a time. See the next section on Clustering for other considerations to take into account.
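
The recipient rule for the removal case can be sketched in Go as the union of the previous node list and the new peer list. This is only an illustration of the behaviour described above, not code from forest-admin; the function name and the example addresses are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// recipients returns every node that should receive a new peer list: all
// nodes in the previous node list plus all nodes in the new peer list.
func recipients(previous, updated []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, list := range [][]string{previous, updated} {
		for _, node := range list {
			if !seen[node] {
				seen[node] = true
				out = append(out, node)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// localhost:3002 is being removed, so it is absent from the new list.
	previous := []string{"localhost:3000", "localhost:3001", "localhost:3002"}
	updated := []string{"localhost:3000", "localhost:3001"}
	fmt.Println(recipients(previous, updated))
}
```

Because the union includes localhost:3002, the server being removed is also told that it is no longer part of the cluster.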

topic

forest-admin [-id <clusterID>] [-node node1[,...]] topic <topicName> [-segmentSize=<target size in MB>] [-segmentCleanupAge=<age in hours>]

The topic subcommand allows the administrator to:

Create a new topic. If the topicName doesn't already exist on any of the given nodes, this command will create it.
Change the target segmentSize for a topic. Forest Bus stores topic data in a series of segment files. As these files grow, Forest Bus will close off one file and add another. The default size for triggering the creation of a new file is 1024MB (1GB). If a high volume of large messages is being added to a topic, it is beneficial to increase this value to reduce the number of segment files required.
Set a cleanup policy. The default policy is that all messages sent to a topic are retained indefinitely. If a -segmentCleanupAge is set, then messages older than the number of hours specified will be removed. Messages are removed one segment at a time, and the two latest segments are never removed. Set -segmentCleanupAge to -1 to disable cleanup.

removetopic

Introduced in version 1.1

forest-admin [-id <clusterID>] [-node node1[,...]] removetopic <topicName>

The removetopic subcommand allows the administrator to remove topics from the nodes. The topic will be shut down gracefully and the configuration removed. The data can then be deleted from the file system.

If the data has not been deleted, the topic can be restored using the forest-admin topic command.

Clustering

Message Distribution

Each topic in Forest Bus elects a leader node from within the cluster. This leader node accepts messages from clients, writes them to disk and then sends them to all the other nodes in the cluster. When a follower node has received a message, it also writes the message to disk and then sends back an acknowledgment to the leader node.

The Raft algorithm depends on a majority of the nodes in a cluster acknowledging the receipt of a new message for the message to be deemed committed. This means that for a cluster of two nodes, both need to be up and running for writes to be committed. Reads of already committed messages can be supported by any number of nodes.

As a consequence of this, three nodes are recommended as the minimum set for clustering. This allows any one node to fail while the remaining two provide write service. Four nodes extends the number of replicas of data made, but doesn't increase write resilience as three nodes need to be working for a message to be committed. Five nodes allows for up to two failures while still providing write service, but at the cost of the leader node having to send messages to four other servers and needing two acknowledgments for a message to be committed.
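
The arithmetic behind these recommendations is the usual majority rule. The Go sketch below (function names are hypothetical) computes, for a given cluster size, how many nodes must be working for a write to commit, how many node failures can be tolerated, and how many follower acknowledgments the leader needs given that it counts its own write.

```go
package main

import "fmt"

// quorum is the number of nodes that must hold a message for it to commit.
func quorum(nodes int) int {
	return nodes/2 + 1
}

// tolerated is the number of nodes that can fail while writes still commit.
func tolerated(nodes int) int {
	return nodes - quorum(nodes)
}

// followerAcks is the number of acknowledgments the leader needs from
// followers, since the leader's own write counts towards the quorum.
func followerAcks(nodes int) int {
	return quorum(nodes) - 1
}

func main() {
	for nodes := 1; nodes <= 5; nodes++ {
		fmt.Printf("nodes=%d quorum=%d tolerated failures=%d follower acks=%d\n",
			nodes, quorum(nodes), tolerated(nodes), followerAcks(nodes))
	}
}
```

For three and four nodes the tolerated failure count is one in both cases, and for five nodes it is two with two follower acknowledgments required, which matches the figures given above.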

Restarts

When all nodes in a Forest Bus cluster are restarted, the cluster needs to determine which messages are committed across the nodes. If all of the nodes in the cluster are running and are up to date, the commit index will be determined automatically. If one or more of the nodes is down at startup, the commit index can only be determined once the first message has been sent to each topic.

Monitoring

Forest Bus Monitor connects to a forest-bus-server's http interface and presents the internal state as a web page. The rest of this section will describe the key elements displayed and their meaning.

Server

The Server section on the right displays overall statistics for this server.

Peer Connection Status

This section shows the status of this forest-bus-server's connections to the other peers in the cluster. For each peer the forest-bus-server establishes a connection pool of 10 connections which are shared by all of the topics on the server. The table shows for each peer:

Connections Spare - How many connections were in the pool when the snapshot was taken. 10 is normal for a lightly loaded server, fewer are expected as the number of topics grows.
Broken Connections - How many connections are broken and are in the process of being retried. If most or all of the 10 connections are in this state then either the target peer is down, or the connectivity between them is broken.

Server Performance

The Current value shows the absolute figure at the time of the snapshot, and the TPS value is the transactions per second over the last 5s interval.

Total Messages - This is the total number of messages across all topics on this server. The TPS for this represents the total message through-put on this node.
Total Committed Messages - This is the total number of messages across all topics that are committed.
Commit Lag - This is the difference between the total messages the server has and the total committed, across all topics. Taken with the TPS for the total committed messages this gives a sense for the total latency for messages across the cluster.

Topics

Each topic on the server has its own section that can be expanded by clicking on the name of the topic. A number of tabs are shown depending on whether this server is currently the leader for a topic or a follower.

Info

High level information regarding the topic is shown:

Topic State - Whether this node is a leader or follower.
Election Term - If this is increasing it means that a series of elections is being held to determine a leader. If connectivity to other nodes is down this will continue to increase until it is restored.
Voted For - The name of the node that this server tried to elect as leader.
Previously Seen Leader - The name of the last leader seen by this server.
First Available Index - Starts at 1 and will increase if older messages have been cleaned up.

Leader Stats

If this server is the leader of this topic, additional stats are shown here.

Messages Received

This section shows how many messages are being received for this topic and how effectively the server is able to batch them together for maximum through-put.

Batches Received - This is the number of batches of messages sent to this topic. If a client connects and sends one set of messages (whether one or more) it will be counted once against this measure.
Messages Received - This is the total number of messages being sent to this topic.
Aggregated Enqueues - This is the number of times the server has aggregated messages together and written them to disk. When multiple different clients are sending to the same topic, this reflects the effectiveness of the aggregation.
Average Batch Size - This shows the average of how many messages are received from clients in one batch over the last 5s. If each client is sending one message at a time this will be 1, otherwise it will reflect how effectively client side batching is being done.
Average Aggregation - This shows the average number of aggregated messages included in a batch that is written to disk.
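
One plausible reading of the two averages is as simple ratios of the counters above, taken over the same interval. The Go sketch below works through an example on that assumption; the exact calculation used by the monitor is not documented here, and the names and sample numbers are hypothetical.

```go
package main

import "fmt"

// ratio divides a message count by an event count, returning 0 when there
// were no events in the interval.
func ratio(messages, events int64) float64 {
	if events == 0 {
		return 0
	}
	return float64(messages) / float64(events)
}

func main() {
	// Hypothetical counter changes over one 5s interval.
	var messages, batches, enqueues int64 = 600, 200, 50

	// Messages per client batch: reflects client side batching.
	fmt.Printf("average batch size: %.1f\n", ratio(messages, batches))
	// Messages per disk write: reflects server side aggregation.
	fmt.Printf("average aggregation: %.1f\n", ratio(messages, enqueues))
}
```

In this example clients send three messages per batch on average, while the server combines twelve messages into each write to disk, so most of the batching benefit comes from server side aggregation across clients.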

Follower Status

This section shows how the follower nodes are performing from the leader's perspective.

Last Index - This is the last index that the leader knows the peer has written to disk.
TPS - The rate at which this peer is taking messages. For followers that are keeping up, this should be close to the leader's TPS for Messages Received.
Lag - The number of messages that this peer is behind in replicating messages that the leader has.

Performance

This section shows how well the server is performing for a given topic.

Head Cache

The Head Cache is a small cache of messages that have been recently appended to a topic. It is used to service internal requests for information about the latest messages, leaders sending recent messages to followers and clients requesting messages.

Total Cache Size - The number of messages in the cache. This grows and shrinks as the overall through-put on the topic changes.
Hits - The number of cache hits.
Misses - The number of cache misses. If both Hits and Misses are increasing then it means that one or more followers or clients are reading older messages at the same time as others are keeping up and reading the latest messages in the topic.

Topic Appends

Highest Index - The greatest index number appended to this topic.
Commit Index - The greatest index known to be committed.
Commit Lag - How far behind the commit is from the greatest index known.

Storage

This section shows details of how the topic is being stored to disk.

Storage Info

Target Segment Size - This is the currently configured target size for segments to be capped at.
Segment Cleanup Age - If this is 0 then no cleanup has been configured. Otherwise it is the age after which messages are candidates to be deleted.
Total Segments - The number of segments in the topic. If this becomes large, consider increasing the Target Segment Size and / or setting a Segment Cleanup Age.

Segment Stats

Hover the mouse over each row to see the path to the segment file.

Open - Whether this segment is currently open by the server. Older segments are closed if they are not used for a while and then re-opened on demand.
First Index - The first index stored in this segment.
Last Modified - The time that any messages were last written to this segment.
Get Messages - How many requests to get one or more messages from this segment have been made since it was opened. This is the count of the number of requests to the O/S for file data that have been made.
Append Messages - How many times one or more messages have been appended to this segment since it was opened.
Seek Count - The number of times that the location within the file has changed since it was opened. Large and rapidly increasing numbers indicate that gets are missing the head cache and hitting the segment.
Seek Reads - How many messages have had to be read to align to a known index boundary when serving a read request.
