Configuration Overview

Dremio recommends a minimum of one coordinator node along with multiple executor nodes. One of the coordinator nodes will also act as the master node to perform metadata operations.

See the following configuration files for a consolidated reference of the available configuration options:

Configuring via dremio.conf

The dremio.conf file is the main configuration file for cluster deployment. Users can specify various options related to node roles, metadata storage, distributed cache storage, and more. By default, dremio.conf is located in the /etc/dremio/ directory. If the file is absent from the configuration directory, Dremio starts with a default configuration.

[info] The dremio.conf file uses the HOCON syntax.

Default dremio.conf for RPM-based installations

paths: {
  # The local path where Dremio stores data.
  local: ${DREMIO_HOME}"/data"

  # The distributed path for Dremio data, including job results, downloads, uploads, etc.
  #dist: "pdfs://"${paths.local}"/pdfs"
}

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: true
}

Zookeeper

The Zookeeper property with the hostname and port must be added to the dremio.conf file on all nodes in the Dremio cluster. This is particularly important when Zookeeper is on an external node. Default port: 2181

[info] Be sure that there are no spaces around the commas when adding multiple Zookeeper nodes.

zookeeper: "<host1>:2181,<host2>:2181"

[info]

The zookeeper host is the hostname (or IP address) where Zookeeper is located. See ZooKeeper for more information.

  • If Zookeeper is an embedded Zookeeper on the master node, then the Zookeeper hostname is the hostname of the master node.
    zookeeper: "<master-coordinator-host1>:2181,<master-coordinator-host2>:2181"
  • If Zookeeper is on an external node(s), then it is the hostname of the node(s) where it is located.
    zookeeper: "<zookeeper-host1>:2181,<zookeeper-host2>:2181"
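Stray spaces in the comma-separated list are easy to introduce by hand, so the value can be generated instead. The following is a minimal shell sketch, not part of Dremio: the function name zk_quorum and the host names in the usage note are hypothetical, and only the host:port format and the default port 2181 come from the text above.

```shell
# Hypothetical helper: build the value for the zookeeper property
# from a list of host names, joined with commas and no spaces.
# Usage: zk_quorum [-p PORT] HOST...   (port defaults to 2181)
zk_quorum() {
  port=2181
  if [ "$1" = "-p" ]; then
    port=$2
    shift 2
  fi
  out=""
  for host in "$@"; do
    # Append ",host:port", omitting the comma before the first entry.
    out="${out:+$out,}${host}:${port}"
  done
  printf 'zookeeper: "%s"\n' "$out"
}
```

Calling zk_quorum zk1 zk2 prints zookeeper: "zk1:2181,zk2:2181", which can be pasted into dremio.conf on every node.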

Dremio Services

The Dremio services property determines whether the node is enabled as a master-coordinator, coordinator, or executor.

Example master-coordinator node configuration

The following is a sample node role configuration in dremio.conf for a master node.

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: false
}
zookeeper: "<host1>:2181,<host2>:2181"

[info] High Availability

Example coordinator node configuration

The following is a sample node role configuration in dremio.conf for non-master coordinator nodes.

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: false,
  executor.enabled: false
}

zookeeper: "<host1>:2181,<host2>:2181"

Example executor node configuration

The following is a sample node role configuration in dremio.conf for executor nodes.

services: {
  coordinator.enabled: false,
  coordinator.master.enabled: false,
  executor.enabled: true
}
zookeeper: "<host1>:2181,<host2>:2181"
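The three role examples differ only in their boolean values, so the block can be generated per role when preparing many nodes. This is a minimal sketch under that assumption: the function name services_block and the role keywords master, coordinator, and executor are hypothetical, and the property names and values are copied from the examples above.

```shell
# Hypothetical helper: print the services block for a node role.
# Roles: master, coordinator, executor (values follow the examples above).
services_block() {
  case "$1" in
    master)      coord=true;  master=true;  exec_=false ;;
    coordinator) coord=true;  master=false; exec_=false ;;
    executor)    coord=false; master=false; exec_=true  ;;
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
  printf 'services: {\n'
  printf '  coordinator.enabled: %s,\n' "$coord"
  printf '  coordinator.master.enabled: %s,\n' "$master"
  printf '  executor.enabled: %s\n' "$exec_"
  printf '}\n'
}
```

For example, services_block executor prints the same block shown for executor nodes. The zookeeper property still has to be added separately.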

Metadata Storage

The paths.local property in the dremio.conf file specifies the directory where Dremio holds metadata about users, spaces, and datasets. The default location is the ${DREMIO_HOME}/data directory.

[info] Metadata location

  • The metadata location must be on local, high-speed, low-latency storage to support spilling operations.
  • For an RPM installation, the default metadata storage location is created for you at /var/lib/dremio. You can change this location by setting up a custom location.
  • For a Tarball installation, manually create the directory and add the location to the dremio.conf file. It is recommended that you use the default: /data/dremio

To set up a custom metadata storage location:

  1. Create your custom directory if it doesn't exist, for example: /data/customDremio
     sudo mkdir /data/customDremio && sudo chown dremio:dremio /data/customDremio
    
  2. Add the new location to the dremio.conf file in the local field under paths. This is done in the dremio.conf file on the master node(s) only.
     paths: {
       local: "/data/customDremio"
     }
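Before restarting Dremio, it can help to confirm that the custom directory exists and is owned by the service account. The following is a minimal sketch, not a Dremio tool: the function name check_local_path is hypothetical, and the dremio account name in the usage note is taken from the chown command in step 1.

```shell
# Hypothetical pre-flight check for a paths.local directory.
# Usage: check_local_path DIR USER
# Succeeds only if DIR exists and is owned by USER.
check_local_path() {
  dir=$1
  owner=$2
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  # find prints DIR only when its owner matches; -prune stops it descending.
  if [ -z "$(find "$dir" -prune -user "$owner" 2>/dev/null)" ]; then
    echo "$dir is not owned by $owner" >&2
    return 1
  fi
  echo "ok: $dir"
}
```

On the master node, check_local_path /data/customDremio dremio would report whether the directory from the example is ready.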
    

Distributed Storage

The paths.dist property in the dremio.conf file specifies the cache location where Dremio holds accelerator, CREATE TABLE AS tables, job result, download, and upload data. If this property is updated, then it must be updated in the dremio.conf file on all nodes.

See Distributed Storage for all available options.

By default, Dremio uses the disk space on local Dremio nodes. You can specify a different store, such as HDFS, NAS, S3, or ADLS:

paths: {
  ...
  dist: "/path"
}

Configuring via dremio-env

The /etc/dremio/dremio-env file is the configuration file for setting Java options and log directories. When this configuration file is changed from the default, it must be updated on all nodes of interest.

Setting the maximum memory size (recommended) allows Dremio to automatically determine the best allocation between HEAP and DIRECT memory depending on the node type. Alternatively, you can set HEAP and DIRECT memory instead of maximum memory.

Modify this line to change the default maximum memory use:

DREMIO_MAX_MEMORY_SIZE_MB=16384
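When the same value has to be applied on several nodes, the edit can be scripted. The following is a minimal sketch that assumes the file consists of KEY=value lines, some of which may be commented out with a leading #. The function name set_env_var is hypothetical and is not part of Dremio.

```shell
# Hypothetical helper: set KEY=VALUE in an env-style file.
# An existing line for KEY (commented out or not) is replaced;
# otherwise the setting is appended.
# Note: VALUE must not contain "|" or "&" (they are special to sed here).
# Usage: set_env_var FILE KEY VALUE
set_env_var() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp) || return 1
  if grep -q "^#\{0,1\}${key}=" "$file"; then
    sed "s|^#\{0,1\}${key}=.*|${key}=${value}|" "$file" > "$tmp"
  else
    cat "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
  fi
  # Write back in place so the original file keeps its permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, set_env_var /etc/dremio/dremio-env DREMIO_MAX_MEMORY_SIZE_MB 16384 would apply the setting shown above, provided the caller has write access to the file.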

Heap and Direct Memory

The maximum HEAP and DIRECT memory can be set separately.

  • If both are specified, then the maximum memory option will be ignored.
  • If only one is configured, Dremio automatically determines the other based on the remaining memory.

Heap memory

Heap memory is used for running the Dremio server.

Modify the following line to change the maximum heap memory.

DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192

Direct memory

Direct memory is used for query execution.

Modify the following line to change the maximum direct memory.

DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=16384

If you see queries failing on an instance that's running out of memory, increasing the amount of memory that Dremio is able to consume may solve your problem.

[warning] Warning: For the DREMIO_MAX_DIRECT_MEMORY_SIZE_MB allocation, be sure to leave at least 1-2 GB of memory for the OS.
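The headroom rule can be checked with simple arithmetic before editing dremio-env. The following is a minimal sketch of that arithmetic only, not Dremio's own allocation logic. The function name max_direct_mb is hypothetical, and the default reserve of 2048 MB is an assumption taken from the upper end of the 1-2 GB guidance.

```shell
# Hypothetical sizing helper: largest direct-memory value (in MB) that
# still leaves room for the heap and an OS reserve.
# Usage: max_direct_mb TOTAL_MB HEAP_MB [OS_RESERVE_MB]
max_direct_mb() {
  total=$1
  heap=$2
  reserve=${3:-2048}   # assumed default reserve: 2 GB for the OS
  direct=$((total - heap - reserve))
  if [ "$direct" -le 0 ]; then
    echo "not enough memory: total=$total heap=$heap reserve=$reserve" >&2
    return 1
  fi
  echo "$direct"
}
```

For a node with 32768 MB of RAM and an 8192 MB heap, max_direct_mb 32768 8192 prints 22528, which would be an upper bound for DREMIO_MAX_DIRECT_MEMORY_SIZE_MB under these assumptions.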

Logs and PID

If you customize where the logs and PID information are written, the log and PID directories must first be created. For example:

DREMIO_LOG_DIR=/var/log/dremio
DREMIO_PID_DIR=/var/run/dremio
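Creating the directories can be scripted so that it is repeatable across nodes. The following is a minimal sketch; the function name ensure_dirs is hypothetical and is not part of Dremio.

```shell
# Hypothetical helper: create each directory given as an argument
# (including missing parents) and stop at the first failure.
# Usage: ensure_dirs DIR...
ensure_dirs() {
  for dir in "$@"; do
    mkdir -p "$dir" || return 1
    echo "ready: $dir"
  done
}
```

On a node, this would typically be run as root, for example ensure_dirs /var/log/dremio /var/run/dremio. The directories then need to be writable by the account that runs Dremio (dremio:dremio in the metadata storage example).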

Starting Dremio

You can start the Dremio daemon with the command below. This assumes Dremio is configured as a service.

$ sudo service dremio start

Completing Setup

Open a browser and navigate to http://<COORDINATOR_NODE>:9047.

  1. Create your first Admin user (the Dremio UI walks you through this process).
  2. Click the Admin button (at the top-right of the page) to confirm that each of the Dremio nodes that you set up during the install is functioning properly.

    [info] Each node's hostname or IP address should be listed, along with a green status light.

