Scalability

How Forest Bus scales with load

Forest Bus is designed to enable scaling in a number of ways, starting with the behaviour of the client.

All of the client libraries provide an interface that takes an array or slice of messages to be sent to the cluster. By increasing the number of messages included in a single batch, the client can significantly increase the throughput of messages on a given topic. Sending a batch of 100 messages takes about as long as sending a single message, so doubling the number sent in a batch can (almost) double throughput.
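
As a rough illustration of client-side batching, the sketch below splits a slice of messages into fixed-size chunks and makes one send call per chunk. The BatchSender type and the SendInBatches helper are hypothetical names invented for this example; they stand in for whichever batch call the client library provides and are not the actual Forest Bus API.

```go
package main

import "fmt"

// BatchSender is a hypothetical stand-in for a client library call that
// accepts a slice of messages. It is NOT the real Forest Bus API.
type BatchSender func(topic string, batch [][]byte) error

// SendInBatches splits msgs into chunks of at most batchSize and makes one
// send call per chunk. It returns the number of send calls that succeeded.
func SendInBatches(send BatchSender, topic string, msgs [][]byte, batchSize int) (int, error) {
	if batchSize < 1 {
		return 0, fmt.Errorf("batchSize must be at least 1, got %d", batchSize)
	}
	calls := 0
	for start := 0; start < len(msgs); start += batchSize {
		end := start + batchSize
		if end > len(msgs) {
			end = len(msgs)
		}
		if err := send(topic, msgs[start:end]); err != nil {
			return calls, err
		}
		calls++
	}
	return calls, nil
}

func main() {
	msgs := make([][]byte, 250)
	for i := range msgs {
		msgs[i] = []byte(fmt.Sprintf("message %d", i))
	}

	// A fake sender that only counts the messages it is handed.
	sent := 0
	fake := func(topic string, batch [][]byte) error {
		sent += len(batch)
		return nil
	}

	calls, err := SendInBatches(fake, "test", msgs, 100)
	fmt.Println(calls, sent, err) // prints: 3 250 <nil>
}
```

With a batch size of 100, the 250 messages go out in three calls rather than 250, which is where the throughput gain comes from.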

The server is also designed to aggregate messages coming from multiple clients into even larger batches that are written to disk. Throughput from this aggregation also grows close to linearly at first, so doubling the number of clients from 100 to 200 roughly doubles throughput.
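
One common way to aggregate writes from many clients is to block for the first pending message and then drain whatever else is already queued into the same batch. The sketch below shows that general pattern only; drainBatch, the channel and the batch limit are assumptions made for illustration and do not describe how the Forest Bus server is actually implemented.

```go
package main

import "fmt"

// drainBatch blocks for the first message, then collects whatever else is
// already queued (up to max messages in total) without waiting.
// ok is false once the channel is closed and empty.
// This is an illustrative pattern, not Forest Bus server code.
func drainBatch(in <-chan []byte, max int) (batch [][]byte, ok bool) {
	first, open := <-in
	if !open {
		return nil, false
	}
	batch = append(batch, first)
	for len(batch) < max {
		select {
		case m, open := <-in:
			if !open {
				return batch, true
			}
			batch = append(batch, m)
		default:
			// Nothing else is queued right now: write what we have.
			return batch, true
		}
	}
	return batch, true
}

func main() {
	in := make(chan []byte, 1024)

	// Simulate four clients that queued 50 messages each before the
	// writer got a chance to run.
	for client := 0; client < 4; client++ {
		for i := 0; i < 50; i++ {
			in <- []byte(fmt.Sprintf("client %d message %d", client, i))
		}
	}
	close(in)

	for {
		batch, ok := drainBatch(in, 128)
		if !ok {
			break
		}
		fmt.Println("one disk write for", len(batch), "messages")
	}
}
```

The more clients are writing concurrently, the more messages are waiting each time the writer drains the queue, so each disk write covers more messages.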

To reach the highest levels of throughput on a topic, the two batching techniques combine: a large number of messages sent at once by each client, together with a large number of clients, results in a high number of messages per second being written.

Reading messages from a Forest Bus cluster can be done from any node. By spreading their connections randomly across the peers, multiple clients can spread the read load and avoid hot servers.
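
A minimal sketch of random peer selection for readers follows. The PeerPicker type, its methods and the node addresses are hypothetical and exist only for this example; a real client would pass the chosen address to whatever connect call its library provides.

```go
package main

import (
	"fmt"
	"math/rand"
)

// PeerPicker chooses a peer uniformly at random for each new read
// connection. It is a hypothetical helper, not part of any client library.
type PeerPicker struct {
	peers []string
	rng   *rand.Rand
}

// NewPeerPicker builds a picker over the given peer addresses. The seed is
// explicit so that runs can be reproduced in tests.
func NewPeerPicker(peers []string, seed int64) *PeerPicker {
	return &PeerPicker{peers: peers, rng: rand.New(rand.NewSource(seed))}
}

// Pick returns a random peer address, or "" if no peers are configured.
func (p *PeerPicker) Pick() string {
	if len(p.peers) == 0 {
		return ""
	}
	return p.peers[p.rng.Intn(len(p.peers))]
}

func main() {
	// Example addresses only.
	picker := NewPeerPicker([]string{"node-a:3000", "node-b:3000", "node-c:3000"}, 42)

	// Count how 3000 simulated reader connections are spread out.
	counts := map[string]int{}
	for i := 0; i < 3000; i++ {
		counts[picker.Pick()]++
	}
	fmt.Println(counts)
}
```

Over many connections each node receives roughly a third of the readers, so no single server carries the whole read load.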

Finally, for very busy topics it may be worth partitioning the topic. In this case multiple topics are created (e.g. test-1, test-2, test-3) and the clients writing to them distribute messages across them. As topic leadership election is randomised, the leaders of these topics will tend to be distributed across the cluster, spreading the load of leadership across nodes.
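
How clients distribute messages across the partitioned topics is left to the application. One possible approach, sketched here under that assumption, is to hash a message key and use the result to pick the topic name. The partitionTopic function and the order keys are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionTopic maps a message key to one of n topics named
// base-1 .. base-n. The same key always maps to the same topic.
// This is an illustrative helper, not part of Forest Bus.
func partitionTopic(base string, key []byte, n int) string {
	if n < 1 {
		n = 1
	}
	h := fnv.New32a()
	h.Write(key) // Write on an FNV hash never returns an error.
	return fmt.Sprintf("%s-%d", base, int(h.Sum32()%uint32(n))+1)
}

func main() {
	for _, key := range []string{"order-17", "order-18", "order-19"} {
		fmt.Println(key, "->", partitionTopic("test", []byte(key), 3))
	}
}
```

Hashing on a key keeps all messages for that key in one topic, which preserves their relative order. If ordering across messages does not matter, simple round-robin over the topic names spreads load just as well.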

Future Development

There are a number of future feature developments that will also contribute to the scalability of Forest Bus:

Data Paths - being able to assign different topics to different paths to allocate topics across disks.
Non-voting peers - allowing new nodes to catch up without being counted in elections or in the commit index.
