This topic describes how high availability works in Dremio clusters.
A Dremio cluster can be made highly available by configuring multiple coordinator nodes with the master-coordinator role: one node is active and the others act as standbys.
- The HA implementation supports automatic recovery. However, it does not guarantee that users see no interruption or that no queries fail.
- The HA model is a hot/cold model: one node acts as master and a secondary node remains on standby until the current master disappears.
- Coordination and election are done through ZooKeeper. When a master fails, its entry disappears from ZooKeeper once its session is closed or fails. At that point, one of the standby nodes is elected and becomes the new master.
- The metadata store (kvstore) is not distributed. It must be located on a shared volume visible from all master candidates.
- A shared network drive is used to ensure that all nodes can access system metadata. The locking support on the network drive as well as on Dremio’s metadata store ensures there is only one active Dremio coordinator process.
How HA Failover Works
When the active coordinator node fails:
- Dremio processes detect the failure, based on the configured ZooKeeper timeout, and elect a new Dremio coordinator node.
- The new coordinator node, already on standby, completes the startup using the metadata on the network drive.
- The other cluster nodes then re-connect to the new coordinator node.
In the case of a failure, Dremio processes should terminate themselves automatically.
Example: HA Failover
Two (2) coordinator nodes (NodeA and NodeB) are configured (with master-coordinator roles) and started.
- NodeA starts and NodeB remains waiting on standby until the current master disappears.
- NodeB is passive and not available until NodeA goes down.
- When NodeA goes down, NodeB completes the startup process and the other cluster nodes switch their master-coordinator node interaction from NodeA to NodeB.
After HA failover is complete:
- You need to restart queries that were being processed at the time of the failure. This is because the Dremio cluster can’t execute new queries until the other cluster nodes are re-connected to the new coordinator node.
- You need to manually restart the failed coordinator node (after ensuring that it is usable). When it is restarted, it is brought back as a standby.
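The sequence in this example can be sketched as a toy state model. It only illustrates the hot/cold behavior described above and is not Dremio's implementation: the `ToyCluster` class and its methods are hypothetical, and promoting the longest-waiting standby is an assumption (the real choice is made by the ZooKeeper-based election).

```python
class ToyCluster:
    """Toy model of hot/cold coordinator failover (illustrative only)."""

    def __init__(self):
        self.active = None   # name of the active master-coordinator, if any
        self.standbys = []   # master candidates waiting on standby

    def start(self, node):
        """The first node started becomes active; later nodes wait on standby."""
        if self.active is None:
            self.active = node
        else:
            self.standbys.append(node)

    def fail_active(self):
        """Drop the active node and promote a standby (assumed: oldest first)."""
        failed = self.active
        self.active = self.standbys.pop(0) if self.standbys else None
        return failed


cluster = ToyCluster()
cluster.start("NodeA")          # NodeA becomes the active coordinator
cluster.start("NodeB")          # NodeB waits on standby
failed = cluster.fail_active()  # NodeA goes down, NodeB takes over
cluster.start(failed)           # NodeA is restarted manually and rejoins as a standby
print(cluster.active, cluster.standbys)  # prints: NodeB ['NodeA']
```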
To see whether a coordinator node is active or not, use the GET /server_status REST API endpoint or, alternatively, ping the node.
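A status check along these lines could be scripted as follows. This is a minimal sketch rather than an official client: the function names, the default port 9047, and the `prefix` parameter are assumptions, because the text above gives only the `/server_status` path. Adjust the base URL, and add TLS or authentication, as your deployment requires.

```python
import urllib.request


def is_coordinator_active(host, port=9047, prefix="", timeout=5.0,
                          opener=urllib.request.urlopen):
    """Return True if GET <prefix>/server_status on the node answers HTTP 200.

    port and prefix are assumptions; opener is injectable so the check
    can be exercised without a live cluster.
    """
    url = f"http://{host}:{port}{prefix}/server_status"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib's URLError/HTTPError.
        return False


def active_coordinators(hosts, **kwargs):
    """Return the subset of hosts whose status endpoint responds successfully."""
    return [host for host in hosts if is_coordinator_active(host, **kwargs)]


if __name__ == "__main__":
    # Hypothetical host names; replace with your master-coordinator candidates.
    print(active_coordinators(["nodeA.example.com", "nodeB.example.com"]))
```

Because a standby node is passive until it takes over, only the active coordinator is expected to pass this check.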
Web Application HA & Load Balancing
Dremio’s web application can be made highly available by leveraging more than one coordinator node and a reverse proxy/load balancer.
All web clients connect to a single endpoint rather than directly connecting to an individual coordinator node. These connections are then distributed across available coordinator nodes.
ODBC/JDBC HA & Load Balancing
Dremio recommends that ODBC and JDBC drivers connect to a ZooKeeper quorum rather than a specific node in your deployment. Dremio then plans queries and routes them to an available coordinator node.
To distribute query planning for ODBC and JDBC connections, configure secondary coordinator nodes for your deployment.
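As an illustration, a ZooKeeper-based JDBC connection string can be assembled from the quorum members. The sketch below assumes the `jdbc:dremio:zk=` URL form and the default ZooKeeper client port 2181; the host names and the helper function are hypothetical, so check the driver documentation for the exact syntax your driver version accepts.

```python
def zk_jdbc_url(zk_hosts, zk_port=2181):
    """Build a JDBC URL that points at a ZooKeeper quorum instead of one node.

    Assumes the 'jdbc:dremio:zk=<host:port,...>' form; verify it against
    your driver's documentation.
    """
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    quorum = ",".join(f"{host}:{zk_port}" for host in zk_hosts)
    return f"jdbc:dremio:zk={quorum}"


# Hypothetical three-member quorum.
print(zk_jdbc_url(["zk1", "zk2", "zk3"]))
# prints: jdbc:dremio:zk=zk1:2181,zk2:2181,zk3:2181
```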
Distributed File System (NAS) Requirements and Recommendations
Dremio requires that deployments configured for high availability use network-attached storage with locking support, high speed, and low latency for the metadata store. Dremio recommends a minimum cumulative read/write throughput for NAS of 30 MB/s. However, the requirements of your query workload and metadata refresh policies may require greater throughput. See I/O Performance for more information about the required baseline throughput for the metadata store for specific workloads.
Dremio recommends SSDs rather than hard disk drives, because they offer lower latency for random database reads.
Use the following mount command options when mounting your NAS:
$ mount -t nfs -o rw,hard,sync,nointr,vers=4,proto=tcp <server>:<share> <mount path>
- Dremio supports both
- Dremio does not support NFS 2 or MapR NFS.
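A small check can catch mounts that omit one of the options shown in the mount command above. This sketch compares an option string (for example, the one passed to `mount -o` or kept in `/etc/fstab`) against that list. The function name is hypothetical, and the check does not inspect the options the kernel actually reports for a live mount.

```python
# Options taken from the mount command shown above.
RECOMMENDED_OPTIONS = ("rw", "hard", "sync", "nointr", "vers=4", "proto=tcp")


def missing_mount_options(option_string):
    """Return the recommended options absent from a comma-separated option string."""
    present = {opt.strip() for opt in option_string.split(",") if opt.strip()}
    return [opt for opt in RECOMMENDED_OPTIONS if opt not in present]


print(missing_mount_options("rw,hard,sync,nointr,vers=4,proto=tcp"))  # prints: []
print(missing_mount_options("rw,soft,vers=4,proto=tcp"))
# prints: ['hard', 'sync', 'nointr']
```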