This topic describes how high availability works in Dremio clusters.
Dremio clusters can be made highly available by configuring multiple coordinator nodes with the master-coordinator role: one node is active and the others act as standbys.
- The HA implementation supports automatic recovery. However, it does not guarantee that users see no interruption or that no queries fail.
- The HA model is a hot/cold model: one node acts as master and a secondary node stays on standby until the current master disappears.
- Coordination and election are done through ZooKeeper. When a master fails, its entry disappears from ZooKeeper once its session is closed or fails. At that point, one of the standby nodes is elected and becomes the new master.
- The metadata store (kvstore) is not distributed. It must be located on a shared volume visible from all master candidates.
- A shared network drive is used to ensure that all nodes can access system metadata. The locking support on the network drive as well as on Dremio's metadata store ensures there is only one active Dremio coordinator process.
[info] Kubernetes Deployments
How HA Failover Works
When the active coordinator node fails:
- Dremio processes detect the failure, based on a set ZooKeeper timeout, and elect a new Dremio coordinator node.
- The new coordinator node, already on standby, completes the startup using the metadata on the network drive.
- The other cluster nodes then re-connect to the new coordinator node.
[info] Note: When there is a failure, Dremio processes are responsible for killing themselves.
Example: HA Failover
Two coordinator nodes, NodeA and NodeB, are configured with the master-coordinator role and started.
- NodeA starts and NodeB remains waiting on standby until the current master disappears.
- NodeB is passive and not available until NodeA goes down.
- When NodeA goes down, NodeB completes the startup process and the other cluster nodes switch their master-coordinator node interaction from NodeA to NodeB.
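The NodeA/NodeB sequence above can be modeled with a small in-memory simulation. This is a toy sketch only: `ToyCluster` and its methods are hypothetical names, the real election runs through ZooKeeper sessions, and the choice of "first live candidate wins" is an assumption made to keep the example deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyCluster:
    """In-memory stand-in for the ZooKeeper-based election (not Dremio code)."""

    candidates: List[str]  # nodes configured with the master-coordinator role
    sessions: Dict[str, bool] = field(default_factory=dict)
    master: Optional[str] = None

    def start(self, node: str) -> None:
        # A started node opens a session; the first one to do so becomes master,
        # every later one waits on standby.
        self.sessions[node] = True
        if self.master is None:
            self.master = node

    def fail(self, node: str) -> None:
        # A closed or failed session removes the node's entry.
        self.sessions.pop(node, None)
        if self.master == node:
            self.master = None
            self._elect()

    def _elect(self) -> None:
        # Assumption: pick the first candidate that still has a live session.
        for node in self.candidates:
            if self.sessions.get(node):
                self.master = node
                return


cluster = ToyCluster(candidates=["NodeA", "NodeB"])
cluster.start("NodeA")   # NodeA becomes master
cluster.start("NodeB")   # NodeB waits on standby
before = cluster.master
cluster.fail("NodeA")    # NodeA's entry disappears, NodeB is elected
after = cluster.master
print(before, "->", after)  # prints "NodeA -> NodeB"
```

If NodeA is started again after the failure, it joins as a standby because NodeB already holds the master role.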
After HA failover is complete:
- You need to restart queries that were being processed at the time of the failure. This is because the Dremio cluster can't execute new queries until the other cluster nodes are re-connected to the new coordinator node.
- You need to manually restart the failed coordinator node (after ensuring that it is usable). When it is restarted, it is brought back as a standby.
[info] To see whether a coordinator node is active or not, use the GET /server_status REST API endpoint or, alternatively, ping the node.
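As a rough illustration, the status check could be scripted as follows. This is a sketch under assumptions: the function names are hypothetical, the base URL (host, port, and any API path prefix your deployment needs) is a placeholder, and an HTTP 200 response is assumed to mean the node is active. A standby or failed node is expected to refuse the connection or return an error, which the sketch reports as not active.

```python
import urllib.request
from typing import Callable


def http_get(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def coordinator_is_active(base_url: str, get: Callable[[str], int] = http_get) -> bool:
    """Return True if GET <base_url>/server_status answers with HTTP 200."""
    try:
        return get(base_url.rstrip("/") + "/server_status") == 200
    except OSError:
        # Connection refused, timeout, DNS failure, or HTTP error:
        # treat the node as not active.
        return False
```

A call might look like `coordinator_is_active("http://nodea.example.com:9047")`, where the host name and port are placeholders. The `get` parameter exists so the check can be exercised without a running cluster.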
Web Application HA & Load Balancing
Dremio's web application can be made highly available by leveraging more than one coordinator node and a reverse proxy/load balancer.
All web clients connect to a single endpoint rather than directly connecting to an individual coordinator node. These connections are then distributed across available coordinator nodes.
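The distribution step can be pictured as a round-robin choice that skips unavailable nodes. The sketch below is illustrative only: `RoundRobin` and `is_available` are hypothetical names, and a real deployment would rely on the reverse proxy or load balancer's own health checks and balancing policy rather than custom code.

```python
from typing import Callable, Iterable, List


class RoundRobin:
    """Hand out coordinator nodes in turn, skipping unavailable ones."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._nodes: List[str] = list(nodes)
        self._next = 0

    def pick(self, is_available: Callable[[str], bool]) -> str:
        # Try each node at most once, starting where the last pick stopped.
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if is_available(node):
                return node
        raise RuntimeError("no coordinator node is available")


balancer = RoundRobin(["coord1", "coord2", "coord3"])
healthy = {"coord1", "coord3"}  # pretend coord2 is down
picks = [balancer.pick(lambda n: n in healthy) for _ in range(4)]
print(picks)  # ['coord1', 'coord3', 'coord1', 'coord3']
```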
ODBC/JDBC HA & Load Balancing
When using ODBC and JDBC, Dremio recommends connecting to the ZooKeeper quorum rather than directly to a specific node. Each query is then routed to and planned by one of the available coordinator nodes.
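As an illustration, a JDBC URL that targets the ZooKeeper quorum could be assembled as shown below. The URL shape is an assumption: it is taken to follow the `jdbc:dremio:zk=host:port,...` form used by Dremio's legacy JDBC driver, so confirm it against the driver documentation for your version. The helper name and host names are hypothetical, and 2181 is ZooKeeper's conventional client port.

```python
from typing import Sequence


def zk_jdbc_url(zk_hosts: Sequence[str], port: int = 2181) -> str:
    """Build a JDBC URL that points at a ZooKeeper quorum (assumed format)."""
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    quorum = ",".join(f"{host}:{port}" for host in zk_hosts)
    return f"jdbc:dremio:zk={quorum}"


url = zk_jdbc_url(["zk1.example.com", "zk2.example.com", "zk3.example.com"])
print(url)
```

Listing every quorum member lets the client keep resolving an active coordinator when a single ZooKeeper host is unreachable.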
Distributed File Systems
Dremio requires a distributed file system where the NFS server supports file range locking through the fcntl POSIX API.
This usually requires an NFSv4 server, or an NFS server that supports the NLM protocol (NFSv3).
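One way to get a first indication of whether a mounted volume accepts byte-range locks is a small probe like the one below. It is a sketch, not a Dremio tool: the function and probe file names are hypothetical, it runs on Unix-like systems only, and it uses Python's `fcntl.lockf`, which wraps the `fcntl()` locking calls. A successful run on one host shows only that the lock call is accepted there; it does not prove that locks are honored across hosts.

```python
import fcntl
import os


def supports_range_lock(directory: str) -> bool:
    """Try to take and release a POSIX byte-range lock on a scratch file."""
    probe = os.path.join(directory, ".range_lock_probe")  # hypothetical file name
    fd = os.open(probe, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Lock bytes 0..15 exclusively without blocking, then release them.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 16, 0, os.SEEK_SET)
        fcntl.lockf(fd, fcntl.LOCK_UN, 16, 0, os.SEEK_SET)
        return True
    except OSError:
        # The file system or NFS server rejected the lock request.
        return False
    finally:
        os.close(fd)
        os.unlink(probe)
```

Run it against the shared directory itself, for example `supports_range_lock("/mnt/shared")` (a placeholder path), rather than against a local disk.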