Configuring High Availability
This topic describes how to configure Dremio for high availability (HA).
To configure Dremio for HA, ensure the prerequisites are met and then perform the following:
- Configure Dremio, including Dremio services, external metadata storage, and an external ZooKeeper quorum.
- Start up the Dremio coordinator nodes.
- Start up the Dremio executor nodes.
Prerequisites
- Network drive (NFS) with locking support for Dremio's metadata store.
- Ensure that the store is high-speed and low-latency (needed for spilling operations).
- Ensure that all Dremio coordinator nodes have read/write access to the shared network drive.
- Ensure that the guidelines of the shared network drive are followed for consistent synchronous writes.
- External ZooKeeper quorum
- (Optional) nginx for Linux Web Application HA & Load Balancing
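As a quick sanity check of the locking prerequisite, the following minimal sketch tries to take an exclusive, non-blocking POSIX lock on a temporary file in a given directory. The function name supports_locking is hypothetical and not part of Dremio. A passing result only shows that locking works from the host where it runs; it does not prove that locks are honored across all coordinator nodes.

```python
import fcntl
import os
import tempfile


def supports_locking(directory: str) -> bool:
    """Return True if an exclusive POSIX lock can be taken on a file in `directory`.

    Hypothetical helper for illustration; run it on each coordinator node
    against the shared metadata location.
    """
    # Create a throwaway probe file inside the directory under test.
    fd, path = tempfile.mkstemp(dir=directory, prefix=".lock_probe_")
    try:
        try:
            # Non-blocking so a broken lock service fails fast where possible.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
        os.unlink(path)
```

For example, supports_locking("/data/<custom_location>") could be run on every coordinator node before starting Dremio, with the placeholder replaced by the actual shared path.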
Step 1: Configure Dremio
To configure Dremio, modify the dremio.conf file on all of the coordinator and executor nodes in the Dremio cluster.
The following must be configured:
- Dremio services, with
- Two (2) or more coordinator nodes with the master-coordinator role.
- One (1) or more nodes with the executor role (preferably 3 or more).
- External metadata store - A network drive (NFS) with locking support, specified with the paths.local property. This property must be set on all Dremio coordinator nodes.
- External ZooKeeper - An external ZooKeeper quorum of one or more servers, specified with the zookeeper property. This property must be set on all Dremio nodes.
Example Coordinator Configuration
To enable and configure the coordinator node, specify the following properties:
- services.coordinator.enabled is set to true.
- services.coordinator.master.enabled is set to true.
- services.coordinator.master.embedded-zookeeper.enabled is set to false.
- services.executor.enabled is set to false.
- zookeeper points to the external ZooKeeper quorum.
- paths.local points to an external metadata storage location.
paths: {
  local: "/data/<custom_location>"
}
services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  coordinator.master.embedded-zookeeper.enabled: false,
  executor.enabled: false
}
zookeeper: "<host1>:2181,<host2>:2181"
Example Executor Configuration
To enable and configure the executor node, specify the following properties:
- services.coordinator.enabled is set to false.
- services.coordinator.master.enabled is set to false.
- services.coordinator.master.embedded-zookeeper.enabled is set to false.
- services.executor.enabled is set to true.
- zookeeper points to the external ZooKeeper quorum.
services: {
  coordinator.enabled: false,
  coordinator.master.enabled: false,
  coordinator.master.embedded-zookeeper.enabled: false,
  executor.enabled: true
}
zookeeper: "<host1>:2181,<host2>:2181"
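The property rules for the two roles can be expressed as a small consistency check. The sketch below is illustrative only: check_ha_settings is a hypothetical helper, and it assumes the dremio.conf values have already been loaded into a flat dictionary keyed by full property name (parsing the file itself is out of scope here).

```python
# Expected service flags per role, taken from the two example configurations.
COORDINATOR_EXPECTED = {
    "services.coordinator.enabled": True,
    "services.coordinator.master.enabled": True,
    "services.coordinator.master.embedded-zookeeper.enabled": False,
    "services.executor.enabled": False,
}
EXECUTOR_EXPECTED = {
    "services.coordinator.enabled": False,
    "services.coordinator.master.enabled": False,
    "services.coordinator.master.embedded-zookeeper.enabled": False,
    "services.executor.enabled": True,
}


def check_ha_settings(settings: dict, role: str) -> list:
    """Return a list of problems found in `settings` for the given role."""
    if role == "coordinator":
        expected = COORDINATOR_EXPECTED
    elif role == "executor":
        expected = EXECUTOR_EXPECTED
    else:
        raise ValueError(f"unknown role: {role!r}")

    problems = []
    for key, want in expected.items():
        if settings.get(key) != want:
            problems.append(f"{key} should be {str(want).lower()}")
    # Every node must point at the external ZooKeeper quorum.
    if not settings.get("zookeeper"):
        problems.append("zookeeper must point to the external ZooKeeper quorum")
    # Only coordinators need the shared metadata location.
    if role == "coordinator" and not settings.get("paths.local"):
        problems.append("paths.local must point to the shared metadata location")
    return problems
```

An empty list means the settings match the example for that role. Any returned strings name the properties to review.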
Step 2: Start Up the Coordinator Nodes
To start the coordinator nodes, start Dremio on the first coordinator node, then on the second coordinator node, and so on. See Start, Stop, and Status for more information.
For example, if two (2) coordinator nodes (NodeA and NodeB) are configured where
NodeA is started first and NodeB is started second, then the following occurs:
- NodeA starts and is the active node.
- NodeB starts but remains waiting on standby until the active coordinator node disappears.
- NodeB is passive and not available until NodeA goes down.
- When NodeA goes down, NodeB completes the startup process and the other cluster nodes switch their coordinator node interaction from NodeA to NodeB.
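The NodeA/NodeB sequence above can be modeled with a toy state machine. This is a simplified sketch for reasoning about the behavior, not Dremio's implementation. The class name CoordinatorGroup is hypothetical, and the rule that the longest-waiting standby takes over (relevant with three or more coordinators) is an assumption, as is the rule that a restarted node rejoins as a standby.

```python
from collections import deque


class CoordinatorGroup:
    """Toy model of one active coordinator plus standbys (illustrative only)."""

    def __init__(self):
        self.active = None
        self.standby = deque()

    def start(self, node: str) -> None:
        # The first node to start becomes active; later nodes wait on standby.
        if self.active is None:
            self.active = node
        else:
            self.standby.append(node)

    def stop(self, node: str) -> None:
        if node == self.active:
            # Assumption: the standby that has waited longest takes over.
            self.active = self.standby.popleft() if self.standby else None
        elif node in self.standby:
            self.standby.remove(node)


group = CoordinatorGroup()
group.start("NodeA")   # NodeA is active
group.start("NodeB")   # NodeB waits on standby
group.stop("NodeA")    # NodeB completes startup and becomes active
print(group.active)
```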
Step 3: Start Up the Executor Nodes
To start the executor nodes, start Dremio on each executor node in any order. See Start, Stop, and Status for more information.
Web Application HA & Load Balancing
Dremio's web application can be made highly available by leveraging more than one coordinator node and a reverse proxy/load balancer.
All web clients connect to the load balancer rather than directly to an individual coordinator node. The load balancer forwards these connections to the currently active master node.
Example: nginx.conf file
In the following sample configuration, all Dremio coordinator nodes are added as upstream servers and are proxied through the nginx server.
user nginx;
worker_processes 1;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
    worker_connections 1024;
}
http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;
    sendfile on;
    #tcp_nopush on;
    keepalive_timeout 65;
    #gzip on;
    include /etc/nginx/conf.d/*.conf;
    upstream myapp1 {
        server dremio_coordinator_1:9047;
        server dremio_coordinator_2:9047;
        server dremio_coordinator_3:9047;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://myapp1;
        }
    }
}
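To see which upstream server currently accepts web connections, a plain TCP probe of port 9047 can help. The sketch below is a rough diagnostic, not a Dremio tool: reachable is a hypothetical helper, and it rests on the assumption that a standby coordinator does not accept connections on the web port until it becomes active.

```python
import socket


def reachable(host: str, port: int = 9047, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, calling reachable("dremio_coordinator_1") for each host in the upstream block gives a quick view of which coordinators respond. The host names are the placeholders from the sample configuration.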
Troubleshooting
If HA fails when the network is brought down on the running master node, there may be an issue with the mount.
For data consistency, your NFS should be mounted as a hard mount. For example:
mount -t nfs -o rw,hard,sync,nointr,vers=4,proto=tcp <server>:<share> <mount path>
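One way to confirm the mount type on Linux is to inspect the options listed for the mount point in /proc/mounts. The following sketch parses text in that format. The function name nfs_mount_is_hard is hypothetical, and the check assumes the kernel reports hard or soft among the NFS mount options.

```python
def nfs_mount_is_hard(mounts_text: str, mount_point: str) -> bool:
    """Return True if `mount_point` is an NFS mount listed with the `hard` option.

    `mounts_text` is expected in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        if not fields[2].startswith("nfs"):
            return False  # mounted, but not over NFS
        options = fields[3].split(",")
        return "hard" in options and "soft" not in options
    return False  # mount point not found
```

On a coordinator node, the input can be read with open("/proc/mounts").read() and the mount point set to the directory used for paths.local.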