Multiple AWS Clusters
Enterprise Edition only: Provisioning multiple execution clusters requires Workload Management, which is available in the Enterprise Edition. Please contact Dremio if you would like to use this feature.
In AWS deployments, Dremio can provision multiple separate execution clusters from a single Dremio coordinator node, dynamically schedule those clusters to run independently at different times, and automatically start and stop them based on workload requirements at runtime. This provides several benefits, including:
- Workloads are isolated within their own set of CPU, memory, and C3 resources and are not impacted by other workloads
- Time-sensitive yet resource-intensive workloads (nightly jobs, reflection refreshes, etc.) can be provisioned with the appropriate amount of resources to complete on time, but remain cost effective by only running when required
- Track cost by team by running workloads on their own resources
- Right-size execution resources for each distinct workload, instead of implementing a one-size-fits-all model
- Easily experiment with different execution resource sizes, at any scale
- Run execution resources in different regions for localization as required
Configuration
To provision multiple clusters in AWS, complete the following steps to configure the AWS environment and Dremio:
- Create an S3 bucket for Distributed Storage
- Create an IAM Policy for Dremio
- Create an IAM Role for Dremio
- Launch an EC2 Instance for the Coordinator Node and Install Dremio
- Edit the dremio.conf file
- Add the core-site.xml file
- Start Dremio
Step 1: Create an S3 bucket for Distributed Storage
In AWS, create an S3 bucket that will be used by Dremio for Distributed Storage. Dremio will use this S3 bucket to store Reflections, File Uploads, and Job Result Downloads. Note: The S3 bucket should be created in the same AWS region the Dremio Coordinator node will run in.
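As a rough illustration of this step, the sketch below prepares an S3 `CreateBucket` request in Python. Using boto3 is an assumption made for the example, not something Dremio requires, and the helper names, bucket name, and region are hypothetical; the bucket can equally be created in the AWS console. The one detail it highlights is that `us-east-1` is selected by omitting the location constraint, while every other region must be named explicitly.

```python
def create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Build keyword arguments for the boto3 S3 client's create_bucket call."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default region and is selected by leaving the
    # location constraint out; all other regions are passed explicitly.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_dist_bucket(bucket_name: str, region: str) -> None:
    """Create the bucket. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
```

Calling `create_dist_bucket("my-dremio-dist", "us-west-2")` with the region of the coordinator node keeps the bucket and the coordinator co-located, as the note above recommends.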
Step 2: Create an IAM Policy for Dremio
Create a new IAM Policy to assign to the Dremio IAM Role and add the following permissions:
Service | Actions | Resource |
---|---|---|
S3 | PutObject, GetObject, ListBucket, DeleteObject, GetBucketLocation | The ARNs for the S3 bucket created in Step 1 and for its objects, e.g. "arn:aws:s3:::<S3 Bucket Name>" and "arn:aws:s3:::<S3 Bucket Name>/*" |
S3 | ListAllMyBuckets | All Resources |
EC2 | DescribeInstances, RunInstances, TerminateInstances, DeletePlacementGroup, CreatePlacementGroup, DescribePlacementGroups, CreateTags, DescribeImages | All Resources |
IAM | PassRole | The ARN for the IAM role to be created in Step 3, e.g. "arn:aws:iam::<AccountID>:role/dremioiamrole" |
The following IAM policy JSON can be used to define the policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "distS3Access",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<S3 Bucket Name>",
        "arn:aws:s3:::<S3 Bucket Name>/*"
      ]
    },
    {
      "Sid": "getBucketsForS3Source",
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets"
      ],
      "Resource": "*"
    },
    {
      "Sid": "ec2ManagementOps",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "ec2:DeletePlacementGroup",
        "ec2:CreatePlacementGroup",
        "ec2:DescribePlacementGroups",
        "ec2:CreateTags",
        "ec2:DescribeImages"
      ],
      "Resource": "*"
    },
    {
      "Sid": "ec2AssignRowAllowed",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::<AWS Account Number>:role/<IAM Role>"
    }
  ]
}
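To avoid editing the placeholders by hand, the policy can be generated from its three inputs. The following is a minimal sketch that reproduces the statements of the policy above; the function name `build_dremio_policy` and the sample values in the usage line are hypothetical, and the output would still need to be pasted into the IAM console or supplied to whatever tooling you use to create policies.

```python
import json


def build_dremio_policy(bucket: str, account_id: str, role_name: str) -> dict:
    """Return the Step 2 policy document with its placeholders filled in."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "distS3Access",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:DeleteObject",
                    "s3:GetBucketLocation",
                ],
                # Bucket-level actions use the bucket ARN; object-level
                # actions use the "/*" form, so both are listed.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            },
            {
                "Sid": "getBucketsForS3Source",
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
            {
                "Sid": "ec2ManagementOps",
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeInstances",
                    "ec2:RunInstances",
                    "ec2:TerminateInstances",
                    "ec2:DeletePlacementGroup",
                    "ec2:CreatePlacementGroup",
                    "ec2:DescribePlacementGroups",
                    "ec2:CreateTags",
                    "ec2:DescribeImages",
                ],
                "Resource": "*",
            },
            {
                "Sid": "ec2AssignRowAllowed",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": f"arn:aws:iam::{account_id}:role/{role_name}",
            },
        ],
    }


if __name__ == "__main__":
    # Sample values only; substitute your own bucket, account, and role.
    policy = build_dremio_policy("my-dremio-dist", "123456789012", "dremioiamrole")
    print(json.dumps(policy, indent=2))
```

Generating the document from one set of inputs keeps the role name in the `iam:PassRole` resource identical to the name used when the role is created in Step 3.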
Step 3: Create an IAM Role for Dremio
Create a new IAM Role for Dremio, select EC2 as the service that will use the role, and assign the policy created in Step 2 to the role. The role name must match the name in the Resource ARN of the iam:PassRole statement in Step 2. For example, if that resource is "arn:aws:iam::<AccountID>:role/dremioiamrole", the IAM Role must be named dremioiamrole.
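Selecting EC2 as the service gives the role a trust policy that allows `ec2.amazonaws.com` to assume it. The sketch below shows that trust document and a small check that a policy's `iam:PassRole` statement points at the intended role name. The helper names are hypothetical and the check is only a sanity test on the JSON, not a substitute for validating the policy in IAM.

```python
def ec2_trust_policy() -> dict:
    """Trust policy that lets the EC2 service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def _as_list(value) -> list:
    """IAM allows a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def pass_role_targets(policy: dict) -> list:
    """Return the role names referenced by the policy's iam:PassRole statements."""
    names = []
    for stmt in policy["Statement"]:
        if "iam:PassRole" not in _as_list(stmt["Action"]):
            continue
        for arn in _as_list(stmt["Resource"]):
            # Role ARNs look like arn:aws:iam::<AccountID>:role/<name>;
            # anything else (such as "*") is skipped.
            if ":role/" in arn:
                names.append(arn.split(":role/", 1)[1])
    return names
```

For the policy in Step 2 with `<IAM Role>` set to `dremioiamrole`, `pass_role_targets` would return a list containing only that name, which is the name to give the role created here.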