Skip to main content
Version: current [25.0.x]

Multiple AWS Clusters

note

Enterprise Edition only: Provisioning multiple execution clusters requires Workload Management available in the Enterprise Edition. Please contact Dremio if you would like to use this feature.

In AWS deployments, Dremio supports the ability to provision multiple separate execution clusters from a single Dremio coordinator node, dynamically schedule execution clusters to run idependently at different times and automatically start and stop based on workload requirements at runtime. This provides several benefits, including:

  • Workloads are isolated within their own set of CPU, Memory & C3 resources and not impacted by other workloads
  • Time-sensitive yet resource intensive workloads (nightly jobs, reflections refreshes, etc.) can be provisioned with the appropiate amount of resources to complete on time, but remain cost effective by only running when required
  • Track cost by team by running workloads on their own resources
  • Right size execution resources for each distinct workload, instead of implementing a one sized fits model
  • Easily experiment with different execution resource sizes, at any scale
  • Run execution resources in different regions for localization as required
drawing

Configuration

To provision multiple clusters in AWS the following steps are required to configure the AWS environment and Dremio.

  1. Create a S3 bucket for Distributed Storage
  2. Create an IAM Policy for Dremio
  3. Create an IAM Role for Dremio
  4. Launch an EC2 Instance for the Coordinator Node and Install Dremio
  5. Edit the dremio.conf file
  6. Add the core-site.xml file
  7. Start Dremio

Step 1: Create a S3 bucket for Distributed Storage

In AWS, create an S3 bucket that will be used by Dremio for Distributed Storage. Dremio will use this S3 bucket to store Reflections, File Uploads, and Job Result Downloads. Note: The S3 bucket should be created in the same AWS region the Dremio Coordinator node will run in.

Step 2: Create an IAM Policy for Dremio

Create a new IAM Policy to assign to the Dremio IAM Role and add the following permissions:

ServiceActionsResource
S3PutObject, GetObject, ListBucket, DeleteObject, GetBucketLocationThe ARN for the S3 bucket created in Step 1, e.g. "arn:aws:s3:::&ltS3 Bucket Name&gt/*"
S3ListAllMyBucketsAll Resources
EC2DeletePlacementGroup
DescribeInstances
TerminateInstances
CreatePlacementGroup
RunInstances
DescribePlacementGroups
All Resources
IAMPassRoleThe ARN for the IAM role to be created in Step 3, e.g. "arn:aws:iam::&ltAccountID&gt:role/dremioiamorole"

The IAM policy json file that can be used to define to policy is:

{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "distS3Access",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:ListBucket",
"s3:DeleteObject",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::<S3 Bucket Name>",
"arn:aws:s3:::<S3 Bucket Name>/*"
]
},
{
"Sid": "getBucketsForS3Source",
"Effect": "Allow",
"Action": [
"s3:ListAllMyBuckets"
],
"Resource": "*"
},
{
"Sid": "ec2ManagementOps",
"Effect": "Allow",
"Action": [
"ec2:DescribeInstances",
"ec2:RunInstances",
"ec2:TerminateInstances",
"ec2:DeletePlacementGroup",
"ec2:CreatePlacementGroup",
"ec2:DescribePlacementGroups",
"ec2:CreateTags",
"ec2:DescribeImages"
],
"Resource": "*"
},
{
"Sid": "ec2AssignRowAllowed",
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::<AWS Accoount Number>:role/<IAM Role>"
}
]
}

Step 3: Create an IAM Role for Dremio

Create a new IAM Role for Dremio, select EC2 as the service to use the role and assign the policy created in Step 2 to the role. The role should be the same name specified in the IAM:PassRole action in Step 2 above. For example, the IAM Role name for the above policy would be dremioiamrole.

Step 4: Launch an EC2 Instance for the Coordinator Node and Install Dremio

Create and launch an EC2 Instance to use as the Dremio coordinator node, and assign the IAM Role created in Step 3 to the instance.

After starting the instance, follow the instructions to install dremio as a standalone deployment.

note

Make sure to download and install the required Java version.

Step 5: Edit the dremio.conf file

The following configuration settings are required in the dremio.conf file on the coordinator node Dremio was installed on in Step 4.

  1. Configure the embedded Zookeeper service on the coordinator node by adding the zookeeper: "<Coordinator IP Address>:2181" property to the dremio.conf file using the Private IP address of the coordinator's EC2 instance.
  2. Disable the executor service on the coordinator node by setting executor.enabled: false in the dremio.conf file. This instructs Dremio to only start coordinator node services and requires execution resources to be separately configured and launched in order to process queries.
  3. Configure Distributed Storage to use the S3 for accelerator, uploads & downloads. Specify the bucket name to be the S3 bucket created in Step 1.
  4. Ensure the local path is configured as /var/lib/dremio

After making the above changes the dremio.conf should contain the following.

paths: {
# the local path for dremio to store data.
local: "/var/lib/dremio"

# the distributed path Dremio data including job results, downloads, uploads, etc
accelerator: "dremioS3:///<bucket_name>/accelerator"
uploads: "dremioS3:///<bucket_name>/uploads"
downloads: "dremioS3:///<bucket_name>/downloads"
scratch: "dremioS3:///<bucket_name>/scratch"
}

services: {
coordinator.enabled: true,
coordinator.master.enabled: true,
executor.enabled: false
}

zookeeper: "<Coordinator IP Address>:2181"

Step 6: Add the core-site.xml file

In the same directory as dremio.conf is saved, create a new file named core-site.xml with the same permissions and add the following to the file

<?xml version="1.0"?>
<configuration>
<property>
<name>fs.dremioS3.impl</name>
<value>com.dremio.plugins.s3.store.S3FileSystem</value>
</property>
<property>
<name>fs.s3a.aws.credentials.provider</name>
<value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>
<property>
<name>dremio.s3.async</name>
<value>true</value>
</property>
</configuration>

Step 7: Set any additional configuration settings

Additional settings can be configured at this point, for example to configure Dremio to use High Availability or an External Zookeeper server.

Step 8: Start Dremio

Start Dremio with sudo service dremio start

After starting Dremio only the coordinator node will be active with no running execution resources. This can be seen by navigating to the Node Activity page Admin -> Cluster -> Node Activity, which shows only the coordinator node active.

At least one execution cluster needs to be configured and started to execute queries or access services such as UI previews.

Usage

Creating AWS Clusters

New execution clusters on AWS are configured by navigating to the Provisioning page Admin -> Cluster -> Provisioning. From the Provisioning page select Add New and then AWS Cluster.

The "Set Up AWS Cluster" popup page by default automatically selects EC2 configuration options used to launch the coordinator node (Security Group, VPC, EC2 Key Pair, etc). Assuming the execution cluster will use these defaults the main configuration options to specify are:

  • Cluster Name - The name for the cluster
  • Instance Type - The AWS Instance Type to use for each executor in the cluster, different clusters may be configured with different instance types
  • Node Count - The number of executors for this cluster

After specifying the Cluster Name, Instance Type and Node Count, create and start the execution cluster by selecting Save & Launch, which:

  1. Saves the execution cluster's configuration
  2. Starts the execution cluster and provisions EC2 nodes

Typically execution clusters complete AWS EC2 provisioning and startup in less than one minute, however timeframes can vary based on instance availablity in EC2.

Note: The IAM Role for S3 Access parameter should automatically match the ARN for the IAM role resource provided for the IAM:PassRole permission specified in Step 2, with the change that "role" changes to "instance-provifile". For example, if the ARN provided in Step 2 was arn:aws:iam::<AccountID>:role/dremioiamrole the ARN shown in the IAM Role for S3 Access field should be arn:aws:iam::<AccountID>:instance-profile/dremioiamrole

After launching an execution cluster, new EC2 instances are provisioned as executor nodes. The state for each node can be monitored on the Provisioning page, nodes start in the Pending state while waiting for AWS EC2 to identify resources, transition to the Provisioning state during startup, and finally transition to Active once the EC2 node and Dremio Software are fully available on the node.

The execution cluster is available for queries once at least 70% of the nodes are available. If that percentage is not available after 5 minutes Dremio cancels the operation and terminates the cluster, including any launched nodes.

Managing EC2 Clusters

Monitoring Clusters

EC2 execution clusters can be viewed and monitored from two administration pages.

The Provisioning admin page shows each execution cluster, the summary status for each node in the cluster and controls to start/stop the cluster, change the cluster's configuration or delete the cluster.

The Node Activity admin page shows each coordinator and executor node along with which cluster a node is associated with

Stopping Clusters

Running clusters are stopped by selecting Stop on the Provisioning page. Stopping a cluster immediately cancels all active queries on the cluster and deletes all resources provisioned for the cluster, this includes EC2 instances, EBS volumes, etc.

Starting Clusters

Stopped clusters are started by selecting Start on the Provisioning page. Starting a cluster initiates the same process to launch the cluster as when the cluster was first created and started with Save & Launch described above.

Routing Queries

There are two methods available to control which execution cluster queries run on.

  1. Workload Management based Routing
  2. Direct Routing

Workload Management based Routing

Workload Management can control the execution cluster each query runs on by defining Rules processed during runtime that determine the Queue for each individual query. In deployments with only one cluster all Queues share the same execution resources and route to the same single cluster. However, when provisioning multiple EC2 execution clusters each Queue can be scheduled on a different execution cluster.

To route queries to a specific execution cluster:

  1. Configure WLM Rules to route a set of queries to a specific Queue
  2. Goto Edit Queue and change the Queue's properties and set the Cluster Name to desired execution cluster

The following example demontrates routing high cost reflection rebuild queries to a specific cluster just for large reflection rebuilds.

The default Cluster Name for a queue is Any which allows queries in that WLM Queue to run on any cluster available at the time of execution. Once a specific Cluster Name is selected queries in that Queue will only run on the selected execution cluster and if the execution cluster is not available the query will fail with the error message Error: No executors are available even if other execution clusters are currently active.

note

Configurations that either have more than one execution cluster or use the Auto Start feature should specify a Cluster Name for all WLM Queues and make sure than no WLM Queues route to 'Any'. 'Any' is intended only for configuratations that do not make use of multiple execution clusters and for backward compatibility. Queues that route to 'Any' will not automatically start a cluster and can also route queries to clusters that are stopped.

Direct Routing

Direct Routing is used to specify the exact Queue and Execution Cluster to run queries on for a given ODBC or JDBC session. With Direct Routing WLM Rules are not considered and instead queries are routed directly to the specified Queue. Clients can be configured so that all queries run on a specific execution cluster or queries run on different execution clusters on a per-session basis.

To use Direct Routing add the Connection property ROUTING_QUEUE = <WLM Queue Name> to the ODBC or JDBC session parameters when connecting to Dremio. When set all queries for the session are automatically routed to the specified WLM Queue and the execution cluster selected for that Queue.

To disable Tag Routing set the Dremio support key dremio.wlm.direct_routing to false. By default Direct Routing is enabled.

Workload management provides details on how to configure ODBC and JDBC connections for Direct Routing

Automatic Cluster Management

Dremio supports automatic management of execution clusters and can automatically both start or stop clusters based on workload requirements. Automatic Cluster Management eliminates the need to manually provision and terminate execution clusters and helps users cost optimize Dremio by only running resources when workloads require them.

Auto Start

Auto Start instructs Dremio to automatically provision and start an execution cluster when new queries are issued to that cluster. When Auto Start is enabled in the execution cluster's Properties page, Dremio automatically starts the execution cluster under the following conditions:

  1. The execution cluster is selected as the queue's Cluster Name
  2. The execution cluster is currently stopped

Execution clusters are only automatically started if they are selected as the Cluster Name for a queue. If the Cluster Name for a queue is Any and no execution clusters are running, the query will fail with Error: No executors are available instead of selecting a random execution cluster to start.

Auto Stop

Auto Stop instructs Dremio to automatically stop an execution cluster after a period of inactivity. When you select a time period for Auto Stop in an engine's Properties page, the execution cluster will stop automatically if no queries are issued to that cluster for the selected time period. The default timeout for Auto Stop is two hours. To disable Auto Stop, select the Disable Auto Stop option.

Advanced Configuration

Cluster Properties

Additional cluster properties to configure are:

OptionDescription
NameName of the cluster
RegionAWS Region to deploy the cluster, supported regions are: us-east-1: US East (Northern Virginia), us-west-1: US West (Northern California), us-west-2 : US West (Oregon), eu-west-1 : EU (Ireland), ap-southeast-1 : Asia Pacific (Singapore)
Instance TypeAWS Instance Type used for execution nodes, supported types are: m5d.8xlarge (32c/128gb) - General Purpose (default), m5ad.8xlarge (32c/128gb) - General Purpose, r5d.4xlarge (16c/128gb) - Higher memory per unit of compute, c5d.18xlarge (72c/144gb) - Higher compute, i3.4xlarge (16c/122gb) - Higher local storage for caching
Instance CountNumber of execution nodes
EC2 Key Pair NameAWS Key Pair used to log onto server instances
Security Group IDThe Group ID of the Security Group to use
IAM Role for S3 AccessIAM Role used to access S3 buckets for Distributed Storage (optional)
Use Clustered PlacementWhether or not to use placement groups which locates nodes closer together. It is recommended to enable this option but can take longer for AWS to identify resources with larger clusters
Enable Auto StartAutomatically start the cluster when new queries are submitted to the cluster
Enable Auto StopAutomatically stop the cluster if no queries are submitted for a selected period of time
VPCThe VPC ID of the Virtual Private Network to use
Subnet IDThe Subnet ID of the Subnet to use
AMI IdentifierThe AMI of the execution nodes to use, optional. By default uses the latest version
AWS API Authentication ModeThe authentication mode used to provision EC2 nodes and stop EC2 nodes. Auto uses an IAM Role to management execution nodes. Alternatively an AWS Access and Secret Key can be used
Access KeyThe AWS Access Key if using Key/Secret authentication
SecretThe AWS Secret Key if using Key/Secret authentication
IAM Role for API OperationsThe IAM Role to assume to provision and manage execution nodes, by default the IAM Role of the coordinator node is used.
Extra Dremio Configuration PropertiesAdditional configuration options

Additional

Additional behaviors to be aware of:

  • When a cluster stops the EC2 nodes are terminated and deleted. As a result there are no addition expenses incurred when a cluster is stopped, however logs on the executor nodes are not maintained after the cluster is stopped.
  • Auto Stop only stops a cluster if there is at least one WLM Queue that connects to the cluster and routes queries to the cluster. Clusters not configured within WLM will not automatically stop.
  • Refreshing the node activity page could cause a cluster to start
  • If a cluster is in the process of stopping, a new query will not cause it to auto start and instead the query will be canceled with an error
  • Stopping the coordinator node does not stop execution clusters, execution clusters should be stopped prior to shutting down the coordinator node