Configuring Standalone Clusters
This topic covers basic information for configuring standalone clusters.
To configure a standalone cluster, first determine your environment requirements, including the following:
- Configurable via dremio.conf:
  - Number and type of Dremio services
  - Metadata storage location
  - Distributed store location
  - Zookeeper location
- Configurable via dremio-env:
  - Memory size
  - Log and PID locations
Dremio recommends a minimum of one coordinator node (configured with the master-coordinator role) along with multiple executor nodes.
See the following configuration files for a consolidated reference of available configuration options:
dremio.conf
The dremio.conf file is the main configuration file for cluster deployment. Users can specify various options related to node roles, metadata storage, distributed cache storage, and more. By default, dremio.conf is located under the /etc/dremio/ directory. If the file is absent from the configuration directory, Dremio starts up with a default configuration.
The dremio.conf file uses the HOCON syntax.
Dremio Services
The Dremio services properties determine whether a node is enabled with the master-coordinator role or the executor role. All Dremio clusters must have at least one coordinator node and one executor node.
In a Dremio cluster environment, a node can have only a single role: either coordinator (with the master-coordinator role enabled) or executor. A coordinator-only role is not supported.
Coordinator Node Configuration
Each coordinator node must be configured as a master-coordinator.
- Coordinator-only roles are not supported.
- Multiple coordinator nodes are used only with HA. See High Availability for more information. Running multiple coordinators does not reduce planning time or increase processing speed.
The Dremio service configuration for a coordinator node is the following:
services: {
coordinator.enabled: true,
coordinator.master.enabled: true,
executor.enabled: false
}
High Availability:
- See Configuring High Availability for more information.
- See Configuring Zookeeper HA examples for more information.
Executor Node Configuration
The Dremio service configuration for an executor node is the following:
services: {
coordinator.enabled: false,
coordinator.master.enabled: false,
executor.enabled: true
}
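As a quick sanity check on the two service blocks above, the following is a minimal sketch of a shell helper that reports which role a dremio.conf file implies. The function names dremio_flag and dremio_node_role are hypothetical and are not part of Dremio. The helper uses plain text matching, so it assumes the flat "key: value" layout shown in this topic and does not handle general HOCON syntax.

```shell
# Hypothetical helpers (not shipped with Dremio).
# Assumes the flat dotted-key layout shown above; this is not a HOCON parser.

# Print the last uncommented true/false value for a dotted key.
# $1 = path to dremio.conf, $2 = regular expression matching the key
dremio_flag() (
  grep -v '^[[:space:]]*#' "$1" \
    | sed -nE "s/^[[:space:]]*$2[[:space:]]*[:=][[:space:]]*(true|false).*/\1/p" \
    | tail -n 1
)

# Print the role implied by the three services flags.
# $1 = path to dremio.conf
dremio_node_role() (
  coordinator=$(dremio_flag "$1" 'coordinator\.enabled')
  master=$(dremio_flag "$1" 'coordinator\.master\.enabled')
  executor=$(dremio_flag "$1" 'executor\.enabled')
  case "$coordinator/$master/$executor" in
    true/true/false)  echo "master-coordinator" ;;
    false/false/true) echo "executor" ;;
    true/true/true)   echo "single-node" ;;
    *)                echo "unrecognized" ;;
  esac
)
```

For example, running dremio_node_role /etc/dremio/dremio.conf on a node configured with the executor block above would print executor. Any combination other than the three listed here is reported as unrecognized.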
Metadata Storage
To customize the storage configuration for metadata (users, spaces, and dataset information), modify the paths.local property. The default location is the ${DREMIO_HOME}/data directory.
Metadata location:
- The metadata location must be on local, high-speed, low-latency storage to support spilling operations.
- For an RPM installation, the default metadata storage location is created for you at /var/lib/dremio. You can change this location by setting up a custom location.
- For a Tarball installation, manually create the directory and add the location to the dremio.conf file. It is recommended that you use the default: /data/dremio.
To set up a custom metadata storage location:
Create your custom directory if it doesn't exist, for example: /data/customDremio
sudo mkdir /data/customDremio && sudo chown dremio:dremio /data/customDremio
Add the new location to the dremio.conf file in the local field under paths. This is done in the dremio.conf file on the coordinator node(s) only.
paths: {
local: "/data/customDremio"
}
Distributed Store
To configure distributed storage, modify the paths.dist property in the dremio.conf file on all of the nodes in your Dremio cluster.
The paths.dist property must be the same across all nodes.
This means that if local storage or NAS is used, the configured path must exist on, or be accessible from, all nodes.
If the value of this property is changed, then it must be updated in the dremio.conf file on all nodes.
By default, Dremio uses the disk space on local Dremio nodes. You can indicate a different store such as HDFS, NAS, S3, or ADLS:
paths: {
...
dist: "/path"
}
See Configuring Distributed Storage for all available options.
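Because paths.dist must match on every node, it can help to compare copies of each node's dremio.conf before starting the cluster. The following is a minimal sketch under stated assumptions: the function names dremio_dist_value and dremio_dist_consistent are hypothetical, the files are assumed to have been copied to one machine first, and the text matching only understands a single quoted value such as dist: "/path" (not HOCON concatenations or substitutions).

```shell
# Hypothetical helpers (not shipped with Dremio).
# Only a simple quoted value such as  dist: "/path"  is recognized.

# Print the uncommented paths.dist value in a dremio.conf,
# or "<default>" when the property is absent or commented out.
dremio_dist_value() (
  value=$(grep -v '^[[:space:]]*#' "$1" \
    | sed -nE 's/^[[:space:]]*dist[[:space:]]*[:=][[:space:]]*"([^"]*)".*/\1/p' \
    | tail -n 1)
  echo "${value:-<default>}"
)

# Succeed only if every given dremio.conf has the same paths.dist value.
dremio_dist_consistent() (
  distinct=$(for conf in "$@"; do dremio_dist_value "$conf"; done \
    | sort -u | wc -l | tr -d ' ')
  [ "$distinct" -eq 1 ]
)
```

A call such as dremio_dist_consistent coordinator.conf executor1.conf executor2.conf (hypothetical file names) returns a non-zero status when any file differs, including the case where one node still relies on the default.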
Zookeeper
To configure Zookeeper, specify the zookeeper property with the hostname and port in the dremio.conf file on all of the nodes in your Dremio cluster.
Adding the Zookeeper configuration to the dremio.conf file on each node is particularly important when Zookeeper is on an external node.
Default port: 2181
The following property shows the syntax for specifying Zookeeper, where each host is the hostname (or IP address) of a node where Zookeeper is located.
When adding multiple Zookeeper nodes, be sure that there are no spaces before or after the commas.
zookeeper: "<host1>:2181,<host2>:2181"
- If Zookeeper is embedded on the coordinator node, then the Zookeeper hostname is the hostname of the coordinator node.
zookeeper: "<master-coordinator-host1>:2181,<master-coordinator-host2>:2181"
- If Zookeeper is on an external node(s), then it is the hostname of the node(s) where it is located.
zookeeper: "<zookeeper-host1>:2181,<zookeeper-host2>:2181"
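Since a stray space in the host list is easy to introduce by hand, the value can also be generated. This is a minimal sketch, assuming every Zookeeper node listens on the default port 2181; the function name zk_connect_string and the host names in the comment are hypothetical.

```shell
# Hypothetical helper (not shipped with Dremio).
# Joins host names into "<host1>:2181,<host2>:2181" with no spaces.
# Assumes every Zookeeper node uses the default port 2181.
zk_connect_string() (
  port=2181
  result=""
  for host in "$@"; do
    # Prepend a comma only when the result already holds an entry
    result="${result:+$result,}${host}:${port}"
  done
  printf '%s\n' "$result"
)

# Example: zk_connect_string zk1 zk2   prints   zk1:2181,zk2:2181
```

The printed string can then be pasted as the quoted value of the zookeeper property in dremio.conf on each node.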
Default dremio.conf for RPM-installs
The default RPM configuration assumes a single node installation. This default must be modified for cluster deployments. See Standalone Quickstart for single node installations.
paths: {
# the local path for dremio to store data.
local: ${DREMIO_HOME}"/data"
# the distributed path Dremio data including job results, downloads, uploads, etc
#dist: "pdfs://"${paths.local}"/pdfs"
}
services: {
coordinator.enabled: true,
coordinator.master.enabled: true,
executor.enabled: true
}
dremio-env
The /etc/dremio/dremio-env file is the configuration file for specifying memory, Java options, and log directories. When this configuration file is changed from the default, it must be updated on every node where the change should apply.
Memory
Setting the maximum memory size (recommended) allows Dremio to automatically determine the best allocation between HEAP and DIRECT memory depending on the node type.
If you see queries failing on an instance that's running out of memory, increasing the amount of memory that Dremio is able to consume may solve your problem.
To modify the default maximum memory, change the following property:
DREMIO_MAX_MEMORY_SIZE_MB=16384
To modify the default maximum heap memory, change the following property:
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192
To modify the default maximum direct memory, change the following property:
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=16384
For the DREMIO_MAX_DIRECT_MEMORY_SIZE_MB allocation, be sure to leave at least 1-2 GB of memory for the OS.
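The arithmetic behind that guidance can be sketched as follows. This is an illustration only, and it rests on an assumption not stated above: that heap memory, direct memory, and the OS reserve should together fit within the node's physical RAM. The function name dremio_direct_memory_mb is hypothetical, and the default reserve of 2048 MB is simply the upper end of the 1-2 GB range.

```shell
# Hypothetical helper (not shipped with Dremio).
# Assumption: heap + direct + OS reserve must fit in physical RAM.
# $1 = total RAM in MB, $2 = heap size in MB, $3 = OS reserve in MB (default 2048)
dremio_direct_memory_mb() (
  total_mb=$1
  heap_mb=$2
  os_reserve_mb=${3:-2048}
  direct_mb=$(( total_mb - heap_mb - os_reserve_mb ))
  if [ "$direct_mb" -le 0 ]; then
    echo "not enough memory left for direct allocation" >&2
    exit 1
  fi
  echo "$direct_mb"
)

# Example: a 32768 MB node with an 8192 MB heap
#   dremio_direct_memory_mb 32768 8192   prints   22528
```

The result is a candidate value for DREMIO_MAX_DIRECT_MEMORY_SIZE_MB. Setting only DREMIO_MAX_MEMORY_SIZE_MB remains the recommended approach, because Dremio then chooses the split itself.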
Logs and PID
To customize where the logs and PID information are written:
Create new Log and PID directories. For example: /var/log/dremio and /var/run/dremio.
Uncomment the Log and PID variables and provide the new location. For example:
DREMIO_LOG_DIR=/var/log/dremio
DREMIO_PID_DIR=/var/run/dremio
Starting Dremio
Start Dremio on each node in your cluster using the command for your installation type (RPM or Tarball). This assumes Dremio is configured as a service. See Start, Stop, and Status for more information.
Tarball:
$ sudo <DREMIO-HOME>/bin/dremio start
RPM:
$ sudo service dremio start
Completing Setup
Open a browser and navigate to http://<COORDINATOR_NODE>:9047.
- Create your first Admin user (the Dremio UI walks you through this process).
- Click the Admin button (at the top-right of the page) to confirm that each of the Dremio nodes that you set up during the install is functioning properly.
Each node's hostname or IP address should be listed, along with a green status light.
For More Information
- Configuring via dremio.conf
- Configuring via dremio-env
- Configuring Dremio Services
- Configuring Metadata Storage
- Configuring Distributed Storage
- Configuring ZooKeeper
- Configuring High Availability
- Configuring Wire Encryption
- Configuring Cloud Cache
- Configuring Single Sign On
- Configuring Memory
- Configuring Log and PID Locations