Configuring Standalone Clusters

This topic covers basic information for configuring standalone clusters.

To configure a standalone cluster, first determine your environment requirements, including the following:

  • Configurable via dremio.conf:
    • Number and type of Dremio services
    • Metadata storage location
    • Distributed store location
    • Zookeeper location
  • Configurable via dremio-env:
    • Memory size
    • Log and PID locations

Dremio recommends a minimum of one coordinator node (configured with the master-coordinator role) along with multiple executor nodes.

See the following configuration files for a consolidated reference of available configuration options:

dremio.conf

The dremio.conf file is the main configuration file for cluster deployment. It specifies options such as node roles, metadata storage, and distributed cache storage. By default, dremio.conf is located in the /etc/dremio/ directory. If the file is absent from the configuration directory, Dremio starts up with a default configuration.

[info] The dremio.conf file uses the HOCON syntax.

Dremio Services

The Dremio services properties determine whether the node is enabled with the master-coordinator or executor role.

[info] All Dremio clusters must have the following configured:

  • One or more coordinator nodes. See High Availability for a multiple coordinator node environment.
  • One or more executor nodes.

In a cluster environment (not a single-node install), a node can have only one role: either coordinator (with the master-coordinator role enabled) or executor. A coordinator-only role is not supported.

Coordinator Node Configuration

The Dremio service configuration for a coordinator node is the following:

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: false
}

[info] High Availability

Executor Node Configuration

The Dremio service configuration for an executor node is the following:

services: {
  coordinator.enabled: false,
  coordinator.master.enabled: false,
  executor.enabled: true
}
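
The role rules above can be summarized as a small check. The following shell sketch is illustrative only: classify_role is a hypothetical helper, not a Dremio tool, and it assumes you pass the three services values by hand rather than parsing dremio.conf.

```shell
# classify_role COORDINATOR MASTER EXECUTOR
# Each argument is "true" or "false", mirroring the three services properties.
# Hypothetical helper for illustration; it does not read dremio.conf.
classify_role() {
  case "$1,$2,$3" in
    true,true,false)  echo "master-coordinator" ;;
    false,false,true) echo "executor" ;;
    true,true,true)   echo "single-node" ;;   # all roles on one node (not a cluster)
    *)                echo "unsupported"; return 1 ;;
  esac
}

classify_role true true false
classify_role false false true
```

Any other combination, such as a coordinator without the master role, is reported as unsupported and returns a non-zero status.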

Metadata Storage

To customize where metadata (information about users, spaces, and datasets) is stored, modify the paths.local property. The default location is the ${DREMIO_HOME}/data directory.

[info] Metadata location

  • The metadata location must be on local, high-speed, low-latency storage to support spilling operations.
  • For an RPM installation, the default metadata storage location is created for you at /var/lib/dremio. You can change this by setting up a custom location.
  • For a tarball installation, manually create the directory and add the location to the dremio.conf file. The recommended location is the default: /data/dremio

To set up a custom metadata storage location:

  1. Create your custom directory if it doesn't exist, for example: /data/customDremio
     sudo mkdir /data/customDremio && sudo chown dremio:dremio /data/customDremio
    
  2. Add the new location to the dremio.conf file in the local field under paths. Make this change in the dremio.conf file on the coordinator node(s) only.
     paths: {
       local: "/data/customDremio"
     }
    

Distributed Store

To configure distributed storage, modify the paths.dist property in the dremio.conf file on all of the nodes in your Dremio cluster. This paths.dist property must be the same across all nodes. This means that if local storage or NAS is used, the configured path must exist on, or be accessible from, all nodes. If the value of this property is changed, then it must be updated in the dremio.conf file on all nodes.

By default, Dremio uses the disk space on local Dremio nodes. You can specify a different store, such as HDFS, NAS, S3, or ADLS:

paths: {
  ...
  dist: "/path"
}

See Configuring Distributed Storage for all available options.
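
Because paths.dist must match on every node, it can help to compare the setting across copies of dremio.conf gathered from each node. The sketch below is one possible way to do that. It assumes the property sits on its own line in the form dist: "...", and it uses a simple text match rather than a real HOCON parser. The helper names dist_value and same_dist are hypothetical.

```shell
# Print the value of the first uncommented "dist:" line in a dremio.conf file.
# Assumption: the property is written on its own line; this is not a HOCON parser.
dist_value() {
  sed -n 's/^[[:space:]]*dist:[[:space:]]*//p' "$1" | head -n 1
}

# Succeed only if every file given yields the same, non-empty dist value.
same_dist() {
  first=$(dist_value "$1")
  [ -n "$first" ] || return 1
  for f in "$@"; do
    [ "$(dist_value "$f")" = "$first" ] || return 1
  done
}

# Example usage, with copies of dremio.conf collected from two nodes:
#   same_dist node1-dremio.conf node2-dremio.conf && echo "paths.dist matches"
```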

Zookeeper

To configure Zookeeper, specify the zookeeper property with the hostname and port in the dremio.conf file on all of the nodes in your Dremio cluster. Adding the Zookeeper configuration to the dremio.conf file on each node is particularly important when Zookeeper is on an external node. The default port is 2181.

The following property shows the syntax for specifying Zookeeper, where each host is the hostname (or IP address) of a node where Zookeeper is running.

[info] When adding multiple Zookeeper nodes, be sure that there are no spaces before or after the commas.

zookeeper: "<host1>:2181,<host2>:2181"
  • If Zookeeper is an embedded Zookeeper on the coordinator node, then the Zookeeper hostname is the hostname of the coordinator node.
    zookeeper: "<master-coordinator-host1>:2181,<master-coordinator-host2>:2181"
  • If Zookeeper is on an external node(s), then it is the hostname of the node(s) where it is located.
    zookeeper: "<zookeeper-host1>:2181,<zookeeper-host2>:2181"
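
To avoid stray spaces when listing several Zookeeper nodes, the property line can be generated instead of typed. The following is a minimal sketch, assuming every node uses the default port 2181. zk_property is a hypothetical helper name, and zk1 and zk2 are placeholder hostnames.

```shell
# Build the zookeeper property line from a list of hostnames.
# Assumption: all Zookeeper nodes listen on the default port 2181.
zk_property() {
  port=2181
  out=""
  for host in "$@"; do
    # Append "host:port", inserting a comma only between entries (no spaces).
    out="${out:+$out,}${host}:${port}"
  done
  printf 'zookeeper: "%s"\n' "$out"
}

zk_property zk1 zk2
# prints: zookeeper: "zk1:2181,zk2:2181"
```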

Default dremio.conf for RPM-installs

The default RPM configuration assumes a single node installation. This default must be modified for cluster deployments. See Standalone Quickstart for single node installations.

paths: {
  # the local path for dremio to store data.
  local: ${DREMIO_HOME}"/data"

  # the distributed path Dremio data including job results, downloads, uploads, etc
  #dist: "pdfs://"${paths.local}"/pdfs"
}

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: true
}

dremio-env

The /etc/dremio/dremio-env file is the configuration file for specifying memory, Java options, and log directories. When this configuration file is changed from the default, it must be updated on all affected nodes.

Memory

Setting the maximum memory size (recommended) allows Dremio to automatically determine the best allocation between HEAP and DIRECT memory depending on the node type.

[info] Note
If you see queries failing on an instance that's running out of memory, increasing the amount of memory that Dremio is able to consume may solve your problem.

To modify the default maximum memory, change the following property:

DREMIO_MAX_MEMORY_SIZE_MB=16384

To modify the default maximum heap memory, change the following property:

DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192

To modify the default maximum direct memory, change the following property:

DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=16384

[warning] Warning

For the DREMIO_MAX_DIRECT_MEMORY_SIZE_MB allocation, be sure to leave at least 1-2 GB of memory for the OS.
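
As a rough illustration of leaving headroom for the OS, the sketch below derives a direct memory value from the node's total memory. The formula (total minus heap minus an OS reserve) is a simple budgeting assumption, not a Dremio-documented rule. direct_memory_mb is a hypothetical helper, and the 2048 MB default reserve reflects the upper end of the 1-2 GB guidance above.

```shell
# direct_memory_mb TOTAL_MB HEAP_MB [OS_RESERVE_MB]
# Assumption: direct memory = total - heap - OS reserve (default reserve 2048 MB).
direct_memory_mb() {
  total_mb=$1
  heap_mb=$2
  os_reserve_mb=${3:-2048}
  direct=$(( total_mb - heap_mb - os_reserve_mb ))
  if [ "$direct" -le 0 ]; then
    echo "not enough memory for the requested heap and OS reserve" >&2
    return 1
  fi
  echo "DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=$direct"
}

# A node with 32 GB of RAM and an 8 GB heap:
direct_memory_mb 32768 8192
# prints: DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=22528
```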

Logs and PID

To customize where the logs and PID information are written:

  1. Create new Log and PID directories. For example: /var/log/dremio and /var/run/dremio.
  2. Uncomment the Log and PID variables and provide the new location. For example:
    DREMIO_LOG_DIR=/var/log/dremio
    DREMIO_PID_DIR=/var/run/dremio
    

Starting Dremio

Start Dremio on each node in your cluster using the command for your installation type (RPM or tarball). These commands assume that Dremio is configured as a service. See Start, Stop, and Status for more information.

Tarball: $ sudo <DREMIO-HOME>/bin/dremio start

RPM: $ sudo service dremio start

Completing Setup

Open a browser and navigate to http://<COORDINATOR_NODE>:9047.

  1. Create your first Admin user (the Dremio UI walks you through this process).
  2. Click the Admin button (at the top-right of the page) to confirm that each of the Dremio nodes that you set up during the install is functioning properly.

    [info] Each node's hostname or IP address should be listed, along with a green status light.
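
Before opening the browser, you may want to confirm that the coordinator's UI port is answering. The sketch below is one way to do that. It assumes curl is installed and that the UI responds to a plain HTTP GET on port 9047 with a success status. ui_url and wait_for_ui are hypothetical helper names, and the retry count and delay are arbitrary.

```shell
# Build the UI address for a coordinator host (default port 9047).
ui_url() {
  echo "http://$1:${2:-9047}"
}

# wait_for_ui HOST [ATTEMPTS]
# Polls the UI address until it responds or the attempts run out.
wait_for_ui() {
  url=$(ui_url "$1")
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -s -f -o /dev/null --max-time 5 "$url"; then
      echo "Dremio UI reachable at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "Dremio UI not reachable at $url" >&2
  return 1
}

# Example usage, replacing the placeholder with your coordinator's hostname:
#   wait_for_ui <COORDINATOR_NODE>
```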
