This topic describes how high availability works in Dremio clusters.
Dremio clusters can be made highly available by configuring one active coordinator node and one or more standby coordinator nodes, all configured with the master-coordinator role.
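As a minimal sketch (not an exhaustive configuration), each coordinator participating in failover enables the master-coordinator role in its dremio.conf and points at the same external ZooKeeper quorum; the host names below are placeholders:

# dremio.conf excerpt on each HA coordinator node (illustrative)
services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,   # master-coordinator role
  executor.enabled: false
}
# All coordinators reference the same external ZooKeeper quorum (placeholder hosts)
zookeeper: "zk1:2181,zk2:2181,zk3:2181"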
Kubernetes Deployments
The HA implementation for Kubernetes deployments is different: high availability depends on the underlying Kubernetes infrastructure. See Azure AKS and Amazon EKS for AKS and EKS deployment information.
When the active coordinator node fails, one of the standby coordinator nodes takes over as the new active coordinator. For example, if two coordinator nodes (NodeA and NodeB) are configured with the master-coordinator role and started, the node that comes online first (say, NodeA) becomes active while NodeB waits on standby; if NodeA fails, NodeB takes over as the active coordinator.
Note: When a failure occurs, the affected Dremio process is responsible for shutting itself down.
After HA failover is complete, you can verify whether a coordinator node is active by using the GET /server_status REST API endpoint or, alternatively, by pinging the node.
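For instance, a quick check with curl (assuming the default web port 9047; the host name is a placeholder):

$ curl http://coordinator-1:9047/server_status
# A healthy active coordinator returns HTTP 200; this endpoint is also
# suitable as a load-balancer health check.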
Dremio’s web application can be made highly available by leveraging more than one coordinator node and a reverse proxy/load balancer.
All web clients connect to a single endpoint rather than directly connecting to an individual coordinator node. These connections are then distributed across available coordinator nodes.
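As a minimal sketch, assuming nginx as the reverse proxy and the default Dremio web port 9047 (host names are placeholders), the single endpoint can be an upstream of the coordinator nodes:

# nginx.conf excerpt (illustrative)
upstream dremio_coordinators {
    server coordinator-1:9047;
    server coordinator-2:9047;
}

server {
    listen 80;
    location / {
        proxy_pass http://dremio_coordinators;
    }
}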
Dremio recommends that ODBC and JDBC drivers connect to a ZooKeeper quorum rather than a specific node in your deployment. Dremio then plans queries and routes them to an available coordinator node.
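For example, a JDBC URL that targets a ZooKeeper quorum rather than a single coordinator takes the following form (quorum hosts are placeholders):

jdbc:dremio:zk=zk1:2181,zk2:2181,zk3:2181
# as opposed to pinning the connection to one node:
# jdbc:dremio:direct=coordinator-1:31010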
Tip: To distribute query planning for ODBC and JDBC connections, configure secondary coordinator nodes for your deployment.
Dremio requires that deployments configured for high availability use network-attached storage with locking support, high speed, and low latency for the metadata store. Dremio recommends a minimum cumulative read/write throughput for NAS of 30 MB/s. However, the requirements of your query workload and metadata refresh policies may require greater throughput. See I/O Performance for more information about the required baseline throughput for the metadata store for specific workloads.
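One rough way to sanity-check that the NAS meets the 30 MB/s baseline is a simple write test against the mounted metadata path (a coarse estimate only; the path is a placeholder, and oflag=direct bypasses the page cache where supported):

$ dd if=/dev/zero of=/mnt/dremio-metadata/throughput-test bs=1M count=512 oflag=direct
$ rm /mnt/dremio-metadata/throughput-test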
Tip: Dremio recommends SSDs rather than spinning disks for lower latency on random database reads.
Use the following mount command options when mounting your NAS:

$ mount -t nfs -o rw,hard,sync,nointr,vers=4,proto=tcp <server>:<share> <mount path>
Both 4 and 3 are supported values for the vers parameter.
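To make the mount persistent across reboots, an equivalent /etc/fstab entry (illustrative, using the same placeholders as the mount command above) could be:

<server>:<share>  <mount path>  nfs  rw,hard,sync,nointr,vers=4,proto=tcp  0  0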