Configuration Overview

Dremio recommends at least one coordinator node along with multiple executor nodes. One of the coordinator nodes also acts as the master node and performs metadata operations.

See the following configuration files for a consolidated reference of available configuration options:

Configuring dremio.conf for cluster deployments

The dremio.conf file is the main configuration file for Dremio nodes. Users can specify various options related to node roles, metadata storage, distributed cache storage and more.

By default, this file is located under the /etc/dremio/ directory.

Default dremio.conf for RPM-based installations

paths: {
  # the local path for dremio to store data.
  local: ${DREMIO_HOME}"/data"

  # the distributed path for Dremio data, including job results, downloads, uploads, etc.
  #dist: "pdfs://"${paths.local}"/pdfs"
}

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: true
}

Zookeeper

The zookeeper property, which specifies the ZooKeeper hostname and port, must be added to the dremio.conf file on every node in the Dremio cluster. This is particularly important when ZooKeeper runs on an external node. The default port is 2181.

zookeeper: "<host1>:2181,<host2>:2181"

[info]

The zookeeper host is the hostname (or IP address) of the machine where ZooKeeper runs. See ZooKeeper for more information.

  • If ZooKeeper is embedded on the master node, use the hostname of the master node.
    zookeeper: "<master-coordinator-host1>:2181,<master-coordinator-host2>:2181"
  • If ZooKeeper runs on one or more external nodes, use the hostnames of those nodes.
    zookeeper: "<zookeeper-host1>:2181,<zookeeper-host2>:2181"
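Because the zookeeper line must be identical on every node, it can help to generate it from a single host list. The bash sketch below uses a hypothetical helper, zk_line, which is not part of Dremio. It appends the default port 2181 to any host given without an explicit port and assumes plain hostnames or IPv4 addresses.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Dremio): build the dremio.conf
# zookeeper line from a list of hosts. Hosts given without ":port" get the
# default ZooKeeper port 2181. Assumes hostnames or IPv4 addresses only.
zk_line() {
  local entries=() host
  for host in "$@"; do
    if [[ "$host" == *:* ]]; then
      entries+=("$host")          # port already specified
    else
      entries+=("${host}:2181")   # fall back to the default port
    fi
  done
  # "${entries[*]}" joins array elements with the first character of IFS.
  local IFS=','
  echo "zookeeper: \"${entries[*]}\""
}

# Example: zk_line zk-host1 zk-host2:2182
```

Running `zk_line zk-host1 zk-host2:2182` prints `zookeeper: "zk-host1:2181,zk-host2:2182"`, which can then be pasted into dremio.conf on each node.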

Master node

Sample node role configuration in dremio.conf for a master node.

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: false
}
zookeeper: "<host1>:2181,<host2>:2181"

[info] High Availability

Coordinator nodes

Sample node role configuration in dremio.conf for non-master coordinator nodes.

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: false,
  executor.enabled: false
}

zookeeper: "<host1>:2181,<host2>:2181"

Executor nodes

Sample node role configuration in dremio.conf for executor nodes.

services: {
  coordinator.enabled: false,
  coordinator.master.enabled: false,
  executor.enabled: true
}
zookeeper: "<host1>:2181,<host2>:2181"
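The three role configurations above differ only in the three services flags, so a small generator can reduce copy-paste mistakes across nodes. The bash function below is a hypothetical sketch (render_role_conf is not a Dremio tool). It prints only the services block and the zookeeper line from the samples, so paths and any other settings still have to be merged in by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the services block and zookeeper line for one
# node role, using the flag values from the sample configurations above.
# Usage: render_role_conf <master|coordinator|executor> "<zookeeper string>"
render_role_conf() {
  local role="$1" zk="$2"
  local coord master executor
  case "$role" in
    master)      coord=true;  master=true;  executor=false ;;
    coordinator) coord=true;  master=false; executor=false ;;
    executor)    coord=false; master=false; executor=true  ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  cat <<EOF
services: {
  coordinator.enabled: ${coord},
  coordinator.master.enabled: ${master},
  executor.enabled: ${executor}
}
zookeeper: "${zk}"
EOF
}

# Example: render_role_conf executor "<host1>:2181,<host2>:2181"
```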

Metadata storage

The paths.local property in the dremio.conf file specifies the directory where Dremio holds metadata about users, spaces, and datasets. The default location is the ${DREMIO_HOME}/data directory.

[info] Metadata location

If you install with RPM, the default metadata storage location is created for you. If you install from a tarball, you must create the directory manually and add its location to the dremio.conf file. It is recommended that you use the default: /data/dremio

To set up a custom metadata storage location:

  1. Create your custom directory if it doesn't exist, for example: /data/customDremio
     sudo mkdir /data/customDremio && sudo chown dremio:dremio /data/customDremio
    
  2. Add the new location to the local field under paths in the dremio.conf file. Make this change in dremio.conf on the master node(s) only.
     paths: {
       local: "/data/customDremio"
     }
    
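Before restarting Dremio with the new paths.local value, you may want to confirm that the directory exists and belongs to the account Dremio runs as. This bash sketch assumes a Linux host with GNU coreutils (for `stat -c`) and a service account named dremio, as in the chown command in step 1. check_dir_owner is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check: verify that a directory exists and is owned by the
# expected user. Assumes GNU stat (Linux); BSD/macOS stat uses other flags.
check_dir_owner() {
  local dir="$1" want="$2" owner
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  owner=$(stat -c '%U' "$dir")   # %U prints the owning user's name
  if [ "$owner" != "$want" ]; then
    echo "$dir is owned by $owner, expected $want" >&2
    return 1
  fi
  echo "ok: $dir is owned by $owner"
}

# Example: check_dir_owner /data/customDremio dremio
```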

Distributed storage

The paths.dist property in the dremio.conf file specifies the cache location where Dremio holds accelerator data, CREATE TABLE AS (CTAS) tables, job results, and download and upload data. This option must be updated in dremio.conf on all nodes. See Distributed Storage for all available options.

By default, Dremio uses local disk space on the Dremio nodes. You can specify a different store, such as HDFS, NAS, S3, or ADLS:

paths: {
  ...
  dist: "/path"
}
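Because paths.dist must match on every node, a quick comparison of the configuration files can catch a node that was missed. The bash sketch below is hypothetical. It assumes copies of each node's dremio.conf have been gathered locally (for example with scp) and that dist is written on its own line as in the sample above. It does not parse HOCON in general, so dotted keys such as paths.dist or values pulled in through includes are not detected.

```shell
#!/usr/bin/env bash
# Hypothetical check: report whether every given dremio.conf sets the same
# dist value. Only matches an uncommented line of the form `dist: ...`;
# a file with no such line is reported as "<default>".
dist_value() {
  [ -r "$1" ] || { echo "cannot read: $1" >&2; return 1; }
  local v
  v=$(grep -E '^[[:space:]]*dist:' "$1" | head -n 1 \
        | sed -E 's/^[[:space:]]*dist:[[:space:]]*//; s/[[:space:]]+$//')
  echo "${v:-<default>}"
}

check_dist_consistent() {
  local first f v status=0
  first=$(dist_value "$1") || return 1
  for f in "$@"; do
    v=$(dist_value "$f") || { status=1; continue; }
    echo "$f: $v"
    [ "$v" = "$first" ] || status=1
  done
  [ "$status" -eq 0 ] || echo "dist differs between files" >&2
  return "$status"
}

# Example: check_dist_consistent node1.conf node2.conf node3.conf
```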

Environment Setup

Optionally, you can edit the /etc/dremio/dremio-env configuration file, which sets Java options and log directories. Make the change on every node where it should apply.

Maximum memory

Setting the total memory lets Dremio automatically determine the best allocation between heap and direct memory based on the node type.

Modify the following line to change the default maximum total memory.

DREMIO_MAX_MEMORY_SIZE_MB=16384

Optional memory setup

Alternatively, heap and direct memory can be set separately. If both are specified, the maximum memory option is ignored. If only one is set, Dremio automatically determines the other from the memory left over.

Heap memory

Heap memory is used to run the Dremio server.

Modify the following line to change the default maximum heap memory.

DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192

Direct memory

Direct memory is used for query execution.

Modify the following line to change the default maximum direct memory.

DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=16384
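The precedence rules for the three memory settings fit in a few lines. This bash sketch is a simplified model, not Dremio's actual startup logic. It assumes that "memory left over" means the maximum memory setting minus the value that was set, and it does not reproduce the node-type-dependent split Dremio applies when only DREMIO_MAX_MEMORY_SIZE_MB is set. resolve_memory is a hypothetical name. Pass an empty string for any setting that is not configured.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical model of the heap/direct precedence rules.
# Arguments: total heap direct (in MB); use "" for an unset value.
resolve_memory() {
  local total="$1" heap="$2" direct="$3"
  if [ -n "$heap" ] && [ -n "$direct" ]; then
    : # Both set explicitly: the maximum memory setting is ignored.
  elif [ -n "$heap" ] && [ -n "$total" ]; then
    direct=$(( total - heap ))     # direct gets what is left over
  elif [ -n "$direct" ] && [ -n "$total" ]; then
    heap=$(( total - direct ))     # heap gets what is left over
  else
    echo "not modeled: need heap and direct, or one of them plus total" >&2
    return 1
  fi
  if [ "$heap" -le 0 ] || [ "$direct" -le 0 ]; then
    echo "invalid split: heap=$heap direct=$direct" >&2
    return 1
  fi
  echo "heap=${heap} direct=${direct}"
}

# Example: resolve_memory 16384 4096 ""   # prints heap=4096 direct=12288
```

Under this reading, the sample values above set both heap (8192) and direct (16384). The 16384 MB maximum would therefore be ignored, and the node could use up to 24576 MB in total.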

If queries fail on an instance that is running out of memory, increasing the amount of memory Dremio can use may solve the problem.

Other environment setup

Logging and PID directories

These directories must be created first:

DREMIO_LOG_DIR=/var/log/dremio
DREMIO_PID_DIR=/var/run/dremio
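One way to create the log and PID directories is sketched below in bash. It assumes Dremio runs as the dremio user and group, as in the chown command used for the custom metadata directory. ensure_dirs is a hypothetical helper. On many systemd-based distributions, /var/run is a symlink to /run, which is a tmpfs cleared at boot. The PID directory may therefore need to be recreated on each boot (for example with systemd-tmpfiles).

```shell
#!/usr/bin/env bash
# Hypothetical helper: create each directory (with parents) and, when run
# as root, hand it to the given owner (e.g. dremio:dremio).
ensure_dirs() {
  local owner="$1" d
  shift
  for d in "$@"; do
    mkdir -p "$d" || return 1
    if [ "$(id -u)" -eq 0 ]; then
      chown "$owner" "$d" || return 1
    else
      echo "warning: not root, skipped chown of $d" >&2
    fi
  done
}

# Example (as root): ensure_dirs dremio:dremio /var/log/dremio /var/run/dremio
```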

Starting Dremio

Start the Dremio daemon with the following command, which assumes Dremio is configured as a service:

$ sudo service dremio start
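Dremio can take a while to start accepting connections. The bash sketch below polls a TCP port until it opens, so it can wait for the web UI port 9047 before you open a browser. It relies on bash's /dev/tcp redirection and the coreutils timeout command. wait_for_port is a hypothetical name. An open port only shows that the process is listening, not that the whole cluster is healthy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll host:port once per second until it accepts a
# TCP connection or the attempt limit is reached.
# Usage: wait_for_port <host> <port> [attempts]
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-60}" i
  for (( i = 1; i <= attempts; i++ )); do
    # Opening /dev/tcp/host/port makes bash attempt a TCP connection.
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      echo "${host}:${port} is accepting connections"
      return 0
    fi
    sleep 1
  done
  echo "${host}:${port} still closed after ${attempts} attempts" >&2
  return 1
}

# Example: wait_for_port localhost 9047 && echo "open http://localhost:9047"
```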

Completing Setup

Open a browser and navigate to http://<COORDINATOR_NODE>:9047.

  1. Create your first Admin user (the Dremio UI walks you through this process).
  2. Click the Admin button (at the top right of the page) to confirm that each Dremio node you set up during installation is functioning properly.

    [info] Each node's hostname or IP address should be listed, along with a green status light.

