What is Solace Cluster?

PubSub+ event broker services can be deployed in High-Availability (HA) redundancy groups. HA redundancy uses 1:1 event broker sparing to provide fault tolerance and increase overall service availability. If one of the event brokers fails or is taken out of service, the other event broker automatically takes over and serves the clients that were previously served by the out-of-service event broker. An HA failover causes a brief service interruption of less than one minute. In comparison, Developer and standalone event broker services experience outages of 15-30 minutes because they do not have HA redundancy.

PubSub+ implements HA using an Active/Standby model with an arbiter node (the monitoring node) that guards against split-brain conditions. This requires three nodes, each running the event broker:

Primary node
Backup node
Monitoring node

Figure 1: Solace Cluster architecture

The primary and backup nodes both run the software event broker under the messaging node role, while the monitoring node runs it under the monitoring node role. Each node's role is fixed by the configuration and never changes. The HA group is fronted by a network load balancer that routes traffic to and from the active node in the HA group (either the primary or the backup).
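The routing rule described here can be sketched as a small function. This is a minimal illustration only, not the load balancer's actual logic: the `Node` type, its `config_role` and `is_active` fields, and `pick_target` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    name: str
    config_role: str  # fixed by configuration: "primary", "backup", or "monitoring"
    is_active: bool   # current activity state; only a messaging node can be active


def pick_target(nodes: Sequence[Node]) -> Optional[Node]:
    """Return the one active messaging node, or None if there is not exactly one."""
    active = [
        n for n in nodes
        if n.config_role in ("primary", "backup") and n.is_active
    ]
    return active[0] if len(active) == 1 else None


# Hypothetical HA group in which the backup is currently the active node.
group = [
    Node("broker-a", "primary", is_active=False),
    Node("broker-b", "backup", is_active=True),
    Node("broker-c", "monitoring", is_active=False),
]
target = pick_target(group)
print(target.name if target else "no active node")
```

The monitoring node is never a routing target because it does not provide messaging service.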

When in operation, each messaging node assumes one of two activity states: Active or Standby. At any one time, one messaging node is active and the other is on standby.

With this model, a primary event broker provides messaging services to clients, while a backup event broker waits in standby mode and provides service only if the primary event broker fails. A third event broker, the monitoring node, acts as a tie-breaker to prevent split-brain scenarios, in which both the primary and backup event brokers would otherwise become active simultaneously.
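The tie-breaker idea can be illustrated with a generic majority-vote rule. This is a simplified sketch of quorum-based arbitration in general, not a description of the PubSub+ redundancy protocol; `may_become_active` and its parameters are hypothetical names.

```python
def may_become_active(reachable_peers: int, group_size: int = 3) -> bool:
    """A node may be active only if it and the peers it can reach form a strict majority."""
    votes = 1 + reachable_peers  # the node counts itself
    return votes > group_size // 2


# Network partition: the backup can still reach the monitoring node,
# while the primary is isolated from both other nodes.
print(may_become_active(reachable_peers=1))  # True: backup side holds 2 of 3 votes
print(may_become_active(reachable_peers=0))  # False: isolated primary holds 1 of 3
```

Under this rule, at most one side of a partition can hold a majority, so two messaging nodes cannot both be active. With only two nodes, an isolated node could not tell a failed peer from a broken link, which is why the third node is needed.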

When a failover occurs, client connections to the broker are automatically switched from the primary node to the backup node.

A failover proceeds in the following sequence:

The backup event broker takes over messaging activity.
Once the failed primary event broker comes back on-line, it resynchronizes to match the currently active backup event broker.
The primary event broker takes on the “Standby” role, or, if auto-revert is enabled, messaging activity automatically switches back to the primary event broker.
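The failover sequence above can be modelled as a small state machine. This is a toy sketch under stated assumptions: `HAPair` and its methods are hypothetical names, resynchronization is not modelled, and only a failure of the primary is covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HAPair:
    """Toy model of the two messaging nodes in an HA group (hypothetical)."""
    auto_revert: bool = False
    active: str = "primary"   # node currently serving clients
    primary_up: bool = True

    def primary_fails(self) -> None:
        # Step 1: the backup event broker takes over messaging activity.
        self.primary_up = False
        self.active = "backup"

    def primary_recovers(self) -> None:
        # Step 2: the primary comes back online and resynchronizes
        # (the resynchronization itself is not modelled here).
        self.primary_up = True
        # Step 3: stay on standby, or switch back if auto-revert is enabled.
        if self.auto_revert:
            self.active = "primary"

    def standby(self) -> Optional[str]:
        """Return the node waiting on standby, or None if the primary is down."""
        if not self.primary_up:
            return None
        return "backup" if self.active == "primary" else "primary"


for auto_revert in (False, True):
    pair = HAPair(auto_revert=auto_revert)
    pair.primary_fails()
    pair.primary_recovers()
    print(f"auto_revert={auto_revert}: active={pair.active}, standby={pair.standby()}")
```

Without auto-revert, the recovered primary stays on standby and the backup keeps serving clients; with auto-revert, activity returns to the primary.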