Ignite Always Failover SPI Test
Apache Ignite offers a high degree of fault tolerance and supports automatic job failover. If a node crashes or a job fails on a given node, the job is automatically transferred to another available node for re-execution. The Always Failover SPI (Service Provider Interface) ensures that when a job from a compute task fails, an attempt is made to reroute the failed job to a node that has not executed any other job from the same task. If no such node is available, an attempt is made to reroute the failed job to one of the nodes that may be running other jobs from the same task. If neither attempt succeeds, the job is not failed over.
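The rerouting order above can be modelled as a simple two-level preference. The Java sketch below is a simplified illustration, not Ignite's implementation. The class `FailoverOrderSketch` and the method `pickNode` are hypothetical, nodes are plain strings, and the first eligible node in topology order wins. The real `AlwaysFailoverSpi` instead passes the candidate nodes to the configured load-balancing SPI and also caps the number of failover attempts per job.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical, simplified model of the Always Failover SPI rerouting order.
 * Not Ignite's code: the real SPI delegates the final choice to a load balancer.
 */
public class FailoverOrderSketch {

    /**
     * @param topology          all nodes currently in the cluster, in a fixed order
     * @param failedNode        node on which the job just failed
     * @param nodesWithTaskJobs nodes that have executed or are running jobs of the same task
     * @return the node to reroute to, or empty if the job is not failed over
     */
    public static Optional<String> pickNode(List<String> topology, String failedNode,
                                            Set<String> nodesWithTaskJobs) {
        // First preference: a node that has not executed any job from this task.
        for (String node : topology) {
            if (!node.equals(failedNode) && !nodesWithTaskJobs.contains(node)) {
                return Optional.of(node);
            }
        }
        // Second preference: any other node, even one running jobs from the same task.
        for (String node : topology) {
            if (!node.equals(failedNode)) {
                return Optional.of(node);
            }
        }
        // No candidate left: the job is not failed over.
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> topology = List.of("node-1", "node-2", "node-3");
        // The job failed on node-1; node-2 already runs another job of the same task.
        System.out.println(pickNode(topology, "node-1", Set.of("node-1", "node-2")));
        // prints Optional[node-3]
    }
}
```

If every other node already runs jobs from the task, the second loop still returns one of them; only a single-node topology yields an empty result.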
The Always Failover SPI is responsible for automatic job failover, so it needs to be monitored to make sure it is working as expected.
This test monitors the Always Failover SPI to verify that failed jobs are rerouted to other nodes. If jobs are failing but no jobs are being failed over, administrators may need to investigate whether something is wrong with the SPI.
Target of the test: Apache Ignite Server
Agent deploying the test: An internal or external agent
Outputs of the test: One set of results for each Apache Ignite Server

Parameter | Description |
---|---|
Test period | How often should the test be executed. |
Host | Enter the IP address of the Apache Ignite server. |
Port | Enter the port number on which the JMX connector listens for incoming connection requests. |
JMX Remote Port | Enter the port on which the JMX connector listens for remote connections. The JMX connector listens on port 8686 by default; if it listens on a different port in your environment, specify that port here. |
JMX User | Specify the name of a user who is authorized to access JMX. |
JMX Password | Specify the password of this user. |
Confirm Password | Confirm the password by retyping it here. |
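
The Host, port and JMX credential parameters correspond to a standard JMX remote connection. The Java sketch below uses only the JDK's `javax.management` API to show how such a connection could read the failover counter. It rests on several assumptions: the default `service:jmx:rmi:///jndi/rmi://host:port/jmxrmi` URL form, an Ignite MBean in the `org.apache` domain whose `name` key is `AlwaysFailoverSpi`, and an attribute named `TotalFailoverJobsCount` (derived from `AlwaysFailoverSpiMBean.getTotalFailoverJobsCount()`). The class `FailoverJmxProbe` is hypothetical; verify the actual ObjectName for your Ignite version in a tool such as JConsole.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical probe that reads the Always Failover SPI counter over JMX. */
public class FailoverJmxProbe {

    // Default URL form for a JDK JMX agent exposed over RMI.
    public static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Connection environment carrying the JMX user and password, if any.
    public static Map<String, Object> environment(String user, String password) {
        Map<String, Object> env = new HashMap<>();
        if (user != null && !user.isEmpty()) {
            env.put(JMXConnector.CREDENTIALS, new String[] {user, password});
        }
        return env;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java FailoverJmxProbe <host> <port> [user] [password]
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        String user = args.length > 2 ? args[2] : null;
        String password = args.length > 3 ? args[3] : null;

        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url, environment(user, password))) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Assumed ObjectName pattern; adjust it to the names your Ignite version registers.
            Set<ObjectName> names = mbs.queryNames(new ObjectName("org.apache:name=AlwaysFailoverSpi,*"), null);
            for (ObjectName name : names) {
                Object total = mbs.getAttribute(name, "TotalFailoverJobsCount");
                System.out.println(name + " -> TotalFailoverJobsCount = " + total);
            }
        }
    }
}
```

If the query returns no names, the pattern does not match the registered MBean, or JMX is not enabled on the Ignite node.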

Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Total failover jobs | Indicates the total number of jobs that were failed over to nodes other than the node where they were originally executed. | Number | If jobs are failing but no jobs are being failed over, administrators may need to investigate whether something is wrong with the SPI. |
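
The interpretation above compares job failures with failovers. A minimal Java sketch of that rule, assuming two successive polls, is shown below. The class `FailoverHealthCheck` and its `Sample` type are hypothetical, and the failed-job count is assumed to come from a source of your choice, since this test reports only the failover total. The sketch also assumes the counters reset when a node restarts, so a negative change is ignored rather than reported.

```java
/** Hypothetical alerting rule for the "Total failover jobs" measure. */
public class FailoverHealthCheck {

    /** One poll of the counters; failedJobs comes from an external source (assumption). */
    public static final class Sample {
        final long failedJobs;
        final long totalFailoverJobs;

        public Sample(long failedJobs, long totalFailoverJobs) {
            this.failedJobs = failedJobs;
            this.totalFailoverJobs = totalFailoverJobs;
        }
    }

    /** True when jobs failed between two polls but none of them were failed over. */
    public static boolean suspicious(Sample previous, Sample current) {
        long newFailures = current.failedJobs - previous.failedJobs;
        long newFailovers = current.totalFailoverJobs - previous.totalFailoverJobs;
        // A negative change suggests the counters were reset (e.g. a restart); skip this interval.
        if (newFailures < 0 || newFailovers < 0) {
            return false;
        }
        return newFailures > 0 && newFailovers == 0;
    }

    public static void main(String[] args) {
        Sample t0 = new Sample(10, 7);
        Sample t1 = new Sample(14, 7); // 4 new failures, no new failovers
        Sample t2 = new Sample(16, 9); // new failures are being failed over
        System.out.println(suspicious(t0, t1)); // true
        System.out.println(suspicious(t1, t2)); // false
    }
}
```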