Prerequisites for Configuring Cloud Resources
Before running queries with Dremio Sonar, you must first provision the required resources in your cloud provider. You can then configure these cloud resources in the next step when adding your Sonar project.
See the prerequisites for your cloud provider: AWS or Azure.
AWS Prerequisites
For the configuration, you will need to address the following prerequisites:
- Use an AWS account
- Choose a supported region that you want to use
- Create or use an existing Amazon Virtual Private Cloud (VPC) and subnets
- Establish outbound connectivity from your VPC and subnets to allow query engines to communicate with Dremio Cloud
- Grant permissions to Dremio Cloud for storage, compute, and network resources
Connecting Your AWS Account
If you don't have access to an AWS account with the required permissions, you can sign up for an AWS Free Tier account at https://aws.amazon.com/free/.
Selecting a VPC and Subnets
You can use an existing VPC and subnets, although if you don't have a VPC that meets the networking requirements, then you will need to create one. For steps, see Create a VPC and Subnets.
See the following guidelines for selecting subnets:
- Specify only private subnets or only public subnets. Mixing private and public subnets is not supported.
- Ensure that each subnet that you specify belongs to a separate availability zone. For example, if you specify subnet A and subnet B, they cannot both be in availability zone C; each must be in a different availability zone.
- Ensure that subnet IDs are unique across all of the availability zones within a VPC.
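The guidelines above can be checked before you submit a subnet selection. The following is a minimal sketch, not part of Dremio's tooling: the `Subnet` record and the `validate_subnets` function are hypothetical names, and the subnet attributes are assumed to come from your own inventory (for example, from the AWS console).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    """Hypothetical record describing one subnet you plan to specify."""
    subnet_id: str
    availability_zone: str
    is_public: bool


def validate_subnets(subnets: list[Subnet]) -> list[str]:
    """Return guideline violations; an empty list means none were found."""
    problems = []
    # Guideline 1: only private subnets or only public subnets.
    if len({s.is_public for s in subnets}) > 1:
        problems.append("mixing private and public subnets is not supported")
    # Guideline 2: each subnet must belong to a separate availability zone.
    zones = [s.availability_zone for s in subnets]
    if len(zones) != len(set(zones)):
        problems.append("two or more subnets share an availability zone")
    # Guideline 3: subnet IDs must be unique.
    ids = [s.subnet_id for s in subnets]
    if len(ids) != len(set(ids)):
        problems.append("duplicate subnet IDs")
    return problems
```

For example, two private subnets in `us-east-1a` and `us-east-1b` return an empty list, while a private and a public subnet in the same zone return two violations.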
Establishing Outbound Connectivity
Outbound connectivity is required to allow query engines to communicate with Dremio Cloud. Engines establish a connection with the Dremio Cloud control plane using port 443 (HTTPS). No other open ports are required in your VPC.
To establish this connection, you can use an internet gateway with a public IP address, NAT gateway, or AWS PrivateLink. If your VPC has internet connectivity, you can securely connect to the Dremio Cloud control plane via the internet gateway or NAT gateway. However, we recommend using PrivateLink as it provides secure connectivity to the Dremio Cloud control plane and also improves the overall security posture as it does not require the VPC to have internet connectivity. In addition, we provide a CloudFormation template to simplify the provisioning of the PrivateLink.
Verify Connectivity
Before getting started with Dremio Sonar, verify outbound connectivity from your subnets by running the following command from an EC2 instance in each subnet:
Command to verify connectivity
curl -v https://gw.dremio.cloud
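If curl is not available on the instance, a similar check can be sketched in Python. This is an illustration under stated assumptions rather than an official Dremio script: the `can_reach` helper is a hypothetical name, and it tests only whether a TCP connection to port 443 opens, not whether the TLS handshake succeeds.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0,
              connect=socket.create_connection) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout.

    The `connect` parameter exists only so the function can be exercised
    without network access; by default it is socket.create_connection.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with connect((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_reach("gw.dremio.cloud")` from an EC2 instance in each subnet should return `True` when outbound connectivity on port 443 is in place.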
Granting Permissions
You must also grant Dremio some permissions on your VPC. You can choose either to:
- Grant them automatically by launching the CloudFormation template (CFT) from your AWS user account. The CFT is the recommended method, because the CFT will create the required resources for you. For a breakdown of the resources that will be created and the permissions that will be granted, see the annotated CloudFormation template.
- Grant them manually by following the steps listed on Configuring Cloud Resources Manually, if you prefer to create the required resources and Sonar project manually. For a breakdown of the resources that you will need to create, see Required Resources.
To use either method, you must have permissions to create the following required resources and grant the required access:
- Storage: An S3 bucket with read and write permissions to use as the project store. The project store contains all project-specific data, such as metadata and reflections.
- Compute Access: An AWS Identity and Access Management (IAM) role or user that will create and manage compute resources (Dremio engines). The IAM role or IAM user is granted access to the project store and enables Dremio Cloud to manage engines.
- Network: A security group with an outbound rule that allows connectivity from Dremio engines to the Dremio control plane via TLS.
Required Resources
If you choose to configure the cloud resources manually, you will need to create the following required resources. Otherwise, the CFT will create the required resources for you.
S3 Bucket
An encrypted S3 bucket is used for the project store that stores various types of project data, including:
- The data for reflections that are created in the project
- The default path for new tables that are used for data and manifests for datasets
- All of the tables that store records of events and other historical data
Security Group
A security group acts as a virtual firewall to control the traffic that is allowed to and from your resources, ensuring that only traffic from Dremio Cloud reaches the resources that you have allocated for your Dremio Cloud organization.
Outbound Rule
An outbound rule allows EC2 instances to connect to Dremio’s control plane by using TLS. For example, if the VPC for your organization is running in AWS, Dremio’s control plane deploys compute engines as AWS EC2 instances within your VPC.
IAM Role or IAM User
An IAM role is an IAM identity that you can create in your account that has specific permissions. In this case, the IAM roles are granted permissions on the resources that you specify for your Dremio Cloud organization, and these roles are assigned to Dremio Cloud.
An IAM user is an entity that you create in AWS to represent the person or application that uses it to interact with AWS. A user in AWS consists of a name and credentials. In this case, Dremio Cloud is given the access key ID and secret access key as credentials for connecting to your VPC to access the resources that you give it permission to use.
Policy Template to Grant Access to the Project Store
The following policy template is the minimum policy requirement to allow read and write access to the project store. It grants Dremio Cloud permissions, through IAM roles or IAM users, for storing metadata and views for the project in an S3 bucket in your Amazon VPC. The permissions are described in comments in the template. Replace BUCKET-NAME with the name of the S3 bucket you want to use as the Dremio Cloud project store:
Template for the Policy JSON
{
"Version": "2012-10-17",
"Statement": [
# Allow Dremio to enumerate S3 buckets within the account.
{
"Effect": "Allow",
"Action": [
"s3:ListAllMyBuckets"
],
"Resource": "arn:aws:s3:::*"
},
# Allow Dremio R/W access to the Project Store bucket used to store housekeeping information such as metadata and reflections.
{
"Effect": "Allow",
"Action": [
"s3:DeleteObject",
"s3:GetObject",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::BUCKET-NAME/*"
]
},
# Allow Dremio to determine the region, list content and add tags on the Project Store bucket.
{
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:PutBucketTagging"
],
"Resource": [
"arn:aws:s3:::BUCKET-NAME"
]
},
# Allow Dremio read access to sample datasets used to get users started easily on the platform without connecting their own data.
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::ap-southeast-1.examples.dremio.com",
"arn:aws:s3:::eu-west-1.examples.dremio.com",
"arn:aws:s3:::us-east-1.examples.dremio.com",
"arn:aws:s3:::us-west-1.examples.dremio.com",
"arn:aws:s3:::us-west-2.examples.dremio.com"
]
}
]
}
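The template above contains `#` comment lines, which strict JSON does not allow, so the comments need to be removed before the policy is used. The following sketch shows one way to strip them and fill in the bucket name. The `render_policy` function and the bucket name `my-project-store` are hypothetical, and the embedded template is a shortened excerpt used only for illustration.

```python
import json

# Shortened excerpt of the template above, for illustration only.
EXAMPLE_TEMPLATE = """{
"Version": "2012-10-17",
"Statement": [
# Allow Dremio R/W access to the Project Store bucket.
{
"Effect": "Allow",
"Action": ["s3:DeleteObject", "s3:GetObject", "s3:PutObject"],
"Resource": ["arn:aws:s3:::BUCKET-NAME/*"]
}
]
}"""


def render_policy(template: str, bucket_name: str) -> str:
    """Drop '#' comment lines, substitute the bucket name, and return JSON text."""
    kept = [line for line in template.splitlines()
            if not line.lstrip().startswith("#")]
    text = "\n".join(kept).replace("BUCKET-NAME", bucket_name)
    # json.loads raises ValueError if the result is still not valid JSON.
    return json.dumps(json.loads(text), indent=2)


if __name__ == "__main__":
    print(render_policy(EXAMPLE_TEMPLATE, "my-project-store"))
```

Parsing the result with `json.loads` before printing acts as a quick sanity check that no comment or placeholder broke the syntax.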
Policy Template for Enabling Dremio Cloud to Manage Engines
The following policy enables Dremio Cloud to create and manage engines in your VPC. The permissions are described in comments in the template:
Template for the Policy JSON
{
"Version": "2012-10-17",
"Statement": [
# Allow Dremio to terminate instances with the "dremio_managed" tag.
{
"Effect": "Allow",
"Action": "ec2:TerminateInstances",
"Resource": "arn:aws:ec2:*:*:instance/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/dremio_managed": "true"
}
}
},
# Require the "dremio_managed" tag for instances/volumes when creating instances.
{
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": [
"arn:aws:ec2:*:*:volume/*",
"arn:aws:ec2:*:*:instance/*"
],
"Condition": {
"StringEquals": {
"aws:RequestTag/dremio_managed": "true"
}
}
},
# Allow creating instances without the "dremio_managed" tag on resources other than instances/volumes.
{
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": [
"arn:aws:ec2:*:*:launch-template/*",
"arn:aws:ec2:*:*:fleet/*",
"arn:aws:ec2:*::image/*",
"arn:aws:ec2:*:*:network-interface/*",
"arn:aws:ec2:*:*:security-group/*",
"arn:aws:ec2:*:*:subnet/*",
"arn:aws:ec2:*:*:placement-group/*"
]
},
# Allow Dremio to create tags on instances/volumes only upon the initial creation of an instance.
{
"Effect": "Allow",
"Action": "ec2:CreateTags",
"Resource": [
"arn:aws:ec2:*:*:instance/*",
"arn:aws:ec2:*:*:volume/*"
],
"Condition": {
"StringEquals": {
"ec2:CreateAction": "RunInstances"
}
}
},
# Allow Dremio to create tags on placement groups (PG) upon the initial creation of a PG.
{
"Effect": "Allow",
"Action": "ec2:CreateTags",
"Resource": "arn:aws:ec2:*:*:placement-group/*",
"Condition": {
"StringEquals": {
"ec2:CreateAction": "CreatePlacementGroup"
}
}
},
# Allow Dremio to create tags on a launch template (LT) upon the initial creation of a LT.
{
"Effect": "Allow",
"Action": "ec2:CreateTags",
"Resource": "arn:aws:ec2:*:*:launch-template/*",
"Condition": {
"StringEquals": {
"ec2:CreateAction": "CreateLaunchTemplate"
}
}
},
# Allow Dremio to create tags on a fleet upon the initial creation of the fleet.
{
"Effect": "Allow",
"Action": "ec2:CreateTags",
"Resource": "arn:aws:ec2:*:*:fleet/*",
"Condition": {
"StringEquals": {
"ec2:CreateAction": "CreateFleet"
}
}
},
# Allow Dremio to create fleet only when including the "dremio_managed" tag.
{
"Effect": "Allow",
"Action": "ec2:CreateFleet",
"Resource": "arn:aws:ec2:*:*:fleet/*",
"Condition": {
"StringEquals": {
"aws:RequestTag/dremio_managed": "true"
}
}
},
# Allow Dremio to create fleet with other resources without the "dremio_managed" tag.
{
"Effect": "Allow",
"Action": "ec2:CreateFleet",
"Resource": [
"arn:aws:ec2:*:*:instance/*",
"arn:aws:ec2:*:*:image/*",
"arn:aws:ec2:*:*:launch-template/*",
"arn:aws:ec2:*:*:network-interface/*",
"arn:aws:ec2:*:*:placement-group/*",
"arn:aws:ec2:*:*:security-group/*",
"arn:aws:ec2:*:*:subnet/*"
]
},
# Only allow Dremio to delete fleets with the "dremio_managed" tag.
{
"Effect": "Allow",
"Action": "ec2:DeleteFleets",
"Resource": "arn:aws:ec2:*:*:fleet/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/dremio_managed": "true"
}
}
},
# Allow Dremio to create a launch template.
{
"Effect": "Allow",
"Action": "ec2:CreateLaunchTemplate",
"Resource": "arn:aws:ec2:*:*:launch-template/*"
},
# Only allow Dremio to delete launch templates with the "dremio_managed" tag.
{
"Effect": "Allow",
"Action": "ec2:DeleteLaunchTemplate",
"Resource": "arn:aws:ec2:*:*:launch-template/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/dremio_managed": "true"
}
}
},
# Allow Dremio to describe fleets with the "dremio_managed" tag.
{
"Effect": "Allow",
"Action": "ec2:DescribeFleets",
"Resource": "arn:aws:ec2:*:*:fleet/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/dremio_managed": "true"
}
}
},
# Only allow Dremio to delete placement groups with the "dremio_managed" tag.
{
"Effect": "Allow",
"Action": "ec2:DeletePlacementGroup",
"Resource": "arn:aws:ec2:*:*:placement-group/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/dremio_managed": "true"
}
}
},
# Allow Dremio to create a placement group.
{
"Effect": "Allow",
"Action": "ec2:CreatePlacementGroup",
"Resource": "arn:aws:ec2:*:*:placement-group/*"
},
# Allow Dremio to enumerate resources in the account.
{
"Effect": "Allow",
"Action": [
"ec2:DescribeImages",
"ec2:DescribeLaunchTemplateVersions",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeVpcs",
"ec2:DescribeSubnets",
"ec2:DescribeTags",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeInstances",
"ec2:DescribeInstanceStatus",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeNetworkInterfaceAttribute",
"ec2:DescribePlacementGroups",
"ec2:DescribeSecurityGroups",
"ec2:DescribeVpcEndpoints",
"ec2:DescribeVolumes"
],
"Resource": "*"
},
# This section appears only if you chose to create a cross-account IAM role in the previous step.
{
"Effect": "Allow",
"Action": [
"iam:PassRole",
"sts:AssumeRole"
],
"Resource": [
"<Role ARN from Step 1: Configure Storage Settings>"
]
}
]
}
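The policy above restricts destructive actions (terminating instances and deleting fleets, launch templates, and placement groups) to resources that carry the `dremio_managed` tag. If you adapt the policy, a small check like the following can confirm that this property still holds. It is a sketch only: the `untagged_destructive_actions` function is a hypothetical helper, it expects the policy as an already-parsed dictionary with comments removed, and it treats any action starting with `ec2:Terminate` or `ec2:Delete` as destructive, which is an assumption made for this example.

```python
TAG_KEY = "ec2:ResourceTag/dremio_managed"
DESTRUCTIVE_PREFIXES = ("ec2:Terminate", "ec2:Delete")

# Two statements taken from the policy above, as a parsed dictionary.
EXCERPT = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:TerminateInstances",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {"StringEquals": {TAG_KEY: "true"}},
        },
        {
            "Effect": "Allow",
            "Action": "ec2:CreatePlacementGroup",
            "Resource": "arn:aws:ec2:*:*:placement-group/*",
        },
    ],
}


def untagged_destructive_actions(policy: dict) -> list[str]:
    """List destructive actions allowed without the dremio_managed tag condition."""
    found = []
    for statement in policy["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):  # "Action" may be a string or a list
            actions = [actions]
        condition = statement.get("Condition", {}).get("StringEquals", {})
        for action in actions:
            if action.startswith(DESTRUCTIVE_PREFIXES) and condition.get(TAG_KEY) != "true":
                found.append(action)
    return found
```

An empty result for your edited policy suggests that every terminate or delete action is still limited to Dremio-managed resources.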
Azure Prerequisites
Before connecting your Azure account to Dremio Cloud, you must provision appropriate resources within your subscription. The prerequisites differ depending on whether you use the manual or template-based onboarding.
- Copy the following IDs and save them in a location where they can be retrieved (required for both the template and manual connection methods):
  - Tenant ID from Microsoft Entra ID
  - Subscription ID from the Subscriptions page
Prerequisites for Manual Azure Onboarding
Your Azure subscription that will be used to deploy the ARM template must have the following resource providers registered:
- Microsoft.Compute
- Microsoft.Network
- Microsoft.Storage
Refer to Azure documentation on how to register a resource provider.
For manual onboarding, you can complete the prerequisites by provisioning with the Azure CLI if desired.
Enabling Disk Encryption
To protect your data and meet organizational security and compliance needs, Azure supports disk encryption for the virtual machines (VMs) launched in your environment. Specifically, Azure's encryption at host feature provides end-to-end encryption of VM data.
To enable disk encryption for your Azure subscription, use the following command:
Azure CLI: Command to enable disk encryption for your Azure subscription
az feature register --name EncryptionAtHost --namespace Microsoft.Compute
For more information, follow these steps.
Checking Compute Quota
Sufficient quota needs to be assigned based on workloads and usage estimates as well as Dremio engine requirements.
Make sure your Azure subscription has enough quota allocated to launch the required D16d_v5 or D32d_v5 VMs in the supported region that you plan to use; these are the two VM types that Dremio supports. Use the Ddv5 SKU to set your quota in Azure. If you need to increase your quota, see Increase VM-family vCPU Quotas.
Also ensure that the type and number of Azure VMs align with the engine size you plan to use. An engine represents a Dremio Cloud entity that manages compute resources. For a query that is submitted to execute on an engine, the control plane assigns an engine replica to that query. An engine replica is a group of Azure VMs defined by the engine capacity.
Refer to the table below for the engine sizes that are mapped to Azure VMs and a fixed number of cores. The engine sizes are shown for one replica.
Engine Size | Number of Azure VMs | Number of Cores |
---|---|---|
XX_SMALL_V1 | 1 Standard_D16d_v5 | 16 |
X_SMALL_V1 | 1 Standard_D32d_v5 | 32 |
SMALL_V1 | 2 Standard_D32d_v5 | 64 |
MEDIUM_V1 | 4 Standard_D32d_v5 | 128 |
LARGE_V1 | 8 Standard_D32d_v5 | 256 |
X_LARGE_V1 | 16 Standard_D32d_v5 | 512 |
XX_LARGE_V1 | 32 Standard_D32d_v5 | 1024 |
XXX_LARGE_V1 | 64 Standard_D32d_v5 | 2048 |
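To estimate the Ddv5 quota for an engine, you can multiply the per-replica core count from the table by the number of replicas you expect to run. The following sketch encodes the table above. The `required_cores` function is a hypothetical helper, and it assumes that quota scales linearly with the number of replicas.

```python
# Engine size -> (VM type, VMs per replica, cores per VM), copied from the table above.
ENGINE_SIZES = {
    "XX_SMALL_V1": ("Standard_D16d_v5", 1, 16),
    "X_SMALL_V1": ("Standard_D32d_v5", 1, 32),
    "SMALL_V1": ("Standard_D32d_v5", 2, 32),
    "MEDIUM_V1": ("Standard_D32d_v5", 4, 32),
    "LARGE_V1": ("Standard_D32d_v5", 8, 32),
    "X_LARGE_V1": ("Standard_D32d_v5", 16, 32),
    "XX_LARGE_V1": ("Standard_D32d_v5", 32, 32),
    "XXX_LARGE_V1": ("Standard_D32d_v5", 64, 32),
}


def required_cores(engine_size: str, max_replicas: int = 1) -> int:
    """Cores needed for an engine of the given size running max_replicas replicas."""
    _vm_type, vm_count, cores_per_vm = ENGINE_SIZES[engine_size]
    return vm_count * cores_per_vm * max_replicas
```

For example, `required_cores("MEDIUM_V1", 2)` gives 256 cores, which is twice the 128 cores listed for one MEDIUM_V1 replica.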
Dremio uses the unutilized quota on the Ddv5 SKUs.
Azure images for Dremio Cloud executors are built on the Ubuntu 22.04 Linux distribution.
Creating or Using a Resource Group
You can use any previously created resource groups, but Dremio recommends creating a separate resource group, which will simplify the visibility into the resources created by Dremio. For creating or managing Azure resource groups, see the Azure Resource Manager.
After you configure an Azure resource group for Dremio Cloud, you cannot alter or delete that resource group from your Azure account, or else all projects associated with the resource group will become unusable.
When adding a Dremio cloud for Azure, you can specify the following resource groups:
Compute Resource Group | (Optional) Network Resource Group |
---|---|
This is the default resource group where Dremio engines will be scaled. The required network resources (as described in these prerequisites) must exist in this resource group unless you specify a separate Network Resource Group. | This allows you to specify a separate resource group (distinct from the Compute Resource Group) for networking resources. |
This allows cloud configurations such as the following:
Cloud Configuration | Compute Resource Group | Network Resource Group |
---|---|---|
Using Compute Resource Group Only | Dremio engines, VNet, subnets, and NSG | N/A |
Using Compute and Network Resource Groups | Dremio engines | VNet, subnets, and NSG |
So, depending on the configuration you want for your Dremio cloud, you will have to specify one of the following in Azure:
- A Compute Resource Group
- A Compute Resource Group and a Network Resource Group
You will also have to grant permissions to the App-Registrations you plan to use to onboard onto Dremio in your Azure account at these resource group scopes. For more information, see the section Registering an Application on this page.
Creating or Using a VNet and Subnets
Outbound connectivity from your Azure VNet and subnets is required to allow query engines to communicate with Dremio Cloud. Engines establish a connection with the Dremio Cloud control plane using port 443 (HTTPS) outbound to the internet. No open ports are required in your Azure subnet for incoming connections, and neither the subnets nor the engines require public IP addresses.
- Copy the names (not the resource IDs) of the VNet and subnets, and save them in a location where they can be retrieved.
- Ensure that there are at least five IP addresses available in the subnets. If you configure an engine to use more than one replica, the engine will autoscale based on the load, and the number of available IP addresses in the subnets must support the scale.
- We recommend a subnet size of /16, for example 10.0.0.0/16 (65,536 addresses).
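To see whether a subnet has room for the engines you plan to run, you can compute its address count from the CIDR block. The sketch below uses Python's standard `ipaddress` module. The helper names are hypothetical, and the calculation assumes one IP address per VM and accounts for the five addresses that Azure reserves in every subnet.

```python
import ipaddress

# Azure reserves five addresses in every subnet.
AZURE_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Number of addresses in the subnet that remain after Azure's reservation."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AZURE_RESERVED_PER_SUBNET, 0)


def has_room(cidr: str, vms_needed: int, already_used: int = 0) -> bool:
    """True if the subnet can hold vms_needed more VMs (assumes one IP per VM)."""
    return usable_addresses(cidr) - already_used >= vms_needed
```

For the recommended 10.0.0.0/16 block, `ipaddress.ip_network("10.0.0.0/16").num_addresses` is 65536, so the subnet comfortably supports even the largest engine sizes with several replicas.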
The Virtual Network (VNet) and Subnet must be in the correct resource group, depending on the resource configuration of your Dremio cloud (see the section in this topic about Creating or Using a Resource Group):
- Using Compute Resource Group Only: The VNet and Subnet must be in the Compute resource group.
- Using Compute and Network Resource Groups: The VNet and Subnet must be in the Network resource group.
Creating or Using a Network Security Group
A network security group (NSG) is required for connecting your Azure account, and it must allow internet access so that engines can communicate with the Dremio control plane and access the Azure storage account for storing metadata. You can decide either to:
- Associate the VNet with an NSG before connecting your Azure account (as a prerequisite), so that you do not need to provide the NSG as part of the process when connecting your account. To do so, copy the name (not the resource ID) of the NSG and save it in a location where it can be retrieved.
- Provide the NSG as part of the process when connecting your Azure account, and Dremio will associate the NSG with the network interface card (NIC) created during VM creation. If an NSG is provided, it is applied to the NIC associated with the VMSS. This means that there will be two different NSGs: one at the VNet level and another at the NIC level.
The Network Security Group must be in the correct resource group, depending on the resource configuration of your Dremio cloud (see the section in this topic about Creating or Using a Resource Group):
- Using Compute Resource Group Only: The Network Security Group must be in the Compute resource group.
- Using Compute and Network Resource Groups: The Network Security Group must be in the Network resource group.
Network Security Group Rules
In the subnet, Dremio executors communicate with one another, and in order to return results to the Dremio control plane, executors require outbound connectivity back to the Dremio control plane. The following table lists the recommended inbound and outbound security rules:
Inbound & Outbound Network Security Rules (inclusive of Azure default rules)
Inbound/Outbound | Priority | Port[s] | Protocol | Source | Destination | Allow/Deny |
---|---|---|---|---|---|---|
Inbound | 65000 | Any | Any | VirtualNetwork | VirtualNetwork | Allow |
Inbound | 65001 | Any | Any | AzureLoadBalancer | Any | Allow |
Inbound | 65500 | Any | Any | Any | Any | Deny |
Outbound | 100 | 443 | TCP | Any | Any | Allow |
Outbound | 4096 | Any | Any | Any | Internet | Deny |
Outbound | 65000 | Any | Any | VirtualNetwork | VirtualNetwork | Allow |
Outbound | 65001 | Any | Any | Any | Internet | Allow |
Outbound | 65500 | Any | Any | Any | Any | Deny |
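Azure evaluates NSG rules in priority order, lowest number first, and stops at the first rule that matches. The following sketch applies that logic to the outbound rules listed above, to show why HTTPS traffic to the control plane is allowed while other internet-bound traffic is denied. It is a simplified model: the `Rule` class and `evaluate` function are hypothetical, and service tags such as `Internet` and `VirtualNetwork` are treated as plain labels rather than real address ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int
    port: str         # "Any" or a single port number as text
    protocol: str     # "Any" or a protocol name such as "TCP"
    source: str       # "Any" or a service tag
    destination: str  # "Any" or a service tag
    action: str       # "Allow" or "Deny"


# Outbound rules from the table above.
OUTBOUND_RULES = [
    Rule(100, "443", "TCP", "Any", "Any", "Allow"),
    Rule(4096, "Any", "Any", "Any", "Internet", "Deny"),
    Rule(65000, "Any", "Any", "VirtualNetwork", "VirtualNetwork", "Allow"),
    Rule(65001, "Any", "Any", "Any", "Internet", "Allow"),
    Rule(65500, "Any", "Any", "Any", "Any", "Deny"),
]


def _matches(rule_value: str, actual: str) -> bool:
    return rule_value == "Any" or rule_value == actual


def evaluate(rules, port: int, protocol: str, source: str, destination: str) -> str:
    """Return the action of the first matching rule, lowest priority number first."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if (_matches(rule.port, str(port))
                and _matches(rule.protocol, protocol)
                and _matches(rule.source, source)
                and _matches(rule.destination, destination)):
            return rule.action
    return "Deny"
```

With these rules, TCP port 443 to the internet matches rule 100 and is allowed, port 80 to the internet falls through to rule 4096 and is denied, and traffic between executors inside the virtual network is allowed by rule 65000.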
Registering an Application
You will need to register a new application or use an existing application with a service principal and then grant it the necessary permissions (as detailed below).
To create an app registration, follow these steps to register an application within your Azure tenant:
- For Name, enter a name for the application. Then copy the name and save it in a location where it can be retrieved.
- For Supported Account Types, select the Single Tenant option.
- Do not specify a redirect URL.
- Click Register.
- Copy the Application (client) ID and save it in a location where it can be retrieved.
- Add a client secret and save it in a location where it can be retrieved.
Granting Permissions
Dremio needs permissions to create Virtual Machine Scale Sets (VMSS) and manage Virtual Machines (VMs). To grant these permissions, you can either use the Azure built-in roles or create custom roles. The approach depends on the configuration of your resource groups (see the section in this topic about Creating or Using a Resource Group).
If you're using only a Compute Resource Group (and no Network Resource Group), the built-in roles are:
- Virtual Machine Contributor
- Avere Contributor
Below is a comparison table for you to decide which built-in roles to use for the application(s) based on the permissions assigned to them:
Role | Compute | Storage | Compute Resource Group | Comments | Best Practice |
---|---|---|---|---|---|
Virtual Machine Contributor + Avere Contributor | Yes | Yes | Virtual Machine Contributor + Avere Contributor | Single Azure app-registration with both roles assigned can be used for compute and storage | Use a single Azure app-registration with both roles assigned at resource group scope OR Use two Azure app-registrations: |
Virtual Machine Contributor | No (Virtual Machine Contributor role does not have proximity placement group permissions) | No | Virtual Machine Contributor | Single Azure app-registration with only the Virtual Machine Contributor role will not work for compute | |
Avere Contributor | No (Avere Contributor role does not have VMSS permissions) | Yes | Avere Contributor | Single Azure app-registration with only the Avere Contributor role will not work for compute but will work for storage |
If you're using a Compute Resource Group and a Network Resource Group, the built-in roles are:
- Virtual Machine Contributor
- Avere Contributor
- Network Contributor
Below is a comparison table for you to decide which built-in roles to use for the application(s) based on the permissions assigned to them:
Role | Compute | Storage | Network | Compute Resource Group | Network Resource Group | Comments | Best Practice |
---|---|---|---|---|---|---|---|
Virtual Machine Contributor + Avere Contributor + Network Contributor | Yes | Yes | Yes | Virtual Machine Contributor + Avere Contributor + Network Contributor | Network Contributor | Single Azure app-registration with 3 roles assigned can be used for compute, network, and storage | Use a single Azure app-registration with 3 roles assigned at Compute Resource Group scope and the Network Contributor role assigned at Network Resource Group OR Use two Azure app-registrations: |
Virtual Machine Contributor | No (Virtual Machine Contributor role does not have proximity placement group permissions) | No | Yes | Virtual Machine Contributor | N/A | Single Azure app-registration with only the Virtual Machine Contributor role will not work for compute but will only work for network | |
Avere Contributor | No (Avere Contributor role does not have VMSS permissions) | Yes | No | Avere Contributor | N/A | Single Azure app-registration with only the Avere Contributor role will not work for compute but will work for storage | |
Network Contributor | No | No | Yes | Network Contributor | Network Contributor | Single Azure app-registration with only the Network Contributor role will not work for compute but will only work for network |
How to Assign Roles to App-registrations
To grant permissions to Dremio for creating and managing compute resources, follow these steps to assign roles using the Azure portal with the service principal in your resource group that you created specifically for Dremio:
The following steps apply to both resource group configurations (Compute Resource Group only, or Compute and Network Resource Groups). The Network Contributor role is needed only when you use a separate Network Resource Group.
- For Step 3: Select the appropriate role, assign the Virtual Machine Contributor role.
However, if you want to use a tailored set of permissions, create a custom role by using the code below, and select that role instead of the Virtual Machine Contributor role.
(Optional) Create a custom role for creating and managing compute resources
JSON code with the Compute policy for creating a custom role
{
"properties": {
"roleName": "<role_name>",
"description": "Custom role for Dremio Cloud to manage compute resources",
"assignableScopes": [
"/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>"
],
"permissions": [
{
"actions": [
# Allow Dremio to deallocate virtual machine scale sets
"Microsoft.Compute/virtualMachineScaleSets/deallocate/action",
# Allow Dremio to read, write, and delete the properties of virtual machine scale sets
"Microsoft.Compute/virtualMachineScaleSets/delete",
"Microsoft.Compute/virtualMachineScaleSets/write",
"Microsoft.Compute/virtualMachineScaleSets/read",
# Allow Dremio to get the properties of a virtual machine scale set SKU
"Microsoft.Compute/virtualMachineScaleSets/skus/read",
# Allow Dremio to read, deallocate, and delete virtual machines
"Microsoft.Compute/virtualMachineScaleSets/virtualMachines/read",
"Microsoft.Compute/virtualMachineScaleSets/virtualMachines/deallocate/action",
"Microsoft.Compute/virtualMachineScaleSets/virtualMachines/delete",
# Allow Dremio to read network interfaces
"Microsoft.Compute/virtualMachineScaleSets/virtualMachines/networkInterfaces/read",
"Microsoft.Compute/virtualMachineScaleSets/virtualMachines/networkInterfaces/ipConfigurations/read",
# Allow Dremio to read, write, and delete disks
"Microsoft.Compute/disks/write",
"Microsoft.Compute/disks/read",
"Microsoft.Compute/disks/delete",
# Allow Dremio to read, write, and delete proximity placement groups
"Microsoft.Compute/proximityPlacementGroups/write",
"Microsoft.Compute/proximityPlacementGroups/read",
"Microsoft.Compute/proximityPlacementGroups/delete",
# Allow Dremio to read galleries and images
"Microsoft.Compute/galleries/read",
"Microsoft.Compute/galleries/images/read",
"Microsoft.Compute/galleries/images/versions/read",
# Join an application gateway backend address pool
"Microsoft.Network/applicationGateways/backendAddressPools/join/action",
# Allow Dremio to create and manage network interfaces
"Microsoft.Network/networkInterfaces/join/action",
# Join a network security group
"Microsoft.Network/networkSecurityGroups/join/action",
# Get a network security group definition
"Microsoft.Network/networkSecurityGroups/read",
# Get a virtual network definition
"Microsoft.Network/virtualNetworks/read",
# Join a virtual network
"Microsoft.Network/virtualNetworks/subnets/join/action",
# Get the resources for the resource group
"Microsoft.Resources/subscriptions/resourceGroups/read",
# Connect to a serial port
"Microsoft.SerialConsole/serialPorts/connect/action"
],
"notActions": [],
"dataActions": [],
"notDataActions": []
}
]
}
}
- In Step 4: Select who needs access, for Assign access to, select User, group, or service principal. For Select Members, select the exact name of the application/service principal that you registered before.
- Click Review + assign.
- In the same way as for the Virtual Machine Contributor role, also assign the Avere Contributor role to the same application.
-
For Step 3: Select the appropriate role, assign the Virtual Machine Contributor role.
But, if you want to use a tailored set of permissions, create a custom role (expand and use the code below), and select that role instead of the Virtual Machine Contributor role.(Optional) Create a custom role for creating and managing compute resources
JSON code with the Compute policy for creating a custom role{
"properties": {
"roleName": "<role_name>",
"description": "Custom role for Dremio Cloud to manage compute resources",
"assignableScopes": [
"/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>"
],
"permissions": [
{
"actions": [
# Allow Dremio to deallocate virtual machine scale sets
"Microsoft.Compute/virtualMachineScaleSets/deallocate/action",
# Allow Dremio to read, write, and delete the properties of virtual machine scale sets
"Microsoft.Compute/virtualMachineScaleSets/delete",
"Microsoft.Compute/virtualMachineScaleSets/write",
"Microsoft.Compute/virtualMachineScaleSets/read",
# Allow Dremio to get the properties of a virtual machine scale set SKU
"Microsoft.Compute/virtualMachineScaleSets/skus/read",
# Allow Dremio to read, dellocate, and delete virtual machines
"Microsoft.Compute/virtualMachineScaleSets/virtualMachines/read",
"Microsoft.Compute/virtualMachineScaleSets/virtualMachines/deallocate/action",
"Microsoft.Compute/virtualMachineScaleSets/virtualMachines/delete",
# Allow Dremio to read network interfaces
"Microsoft.Compute/virtualMachineScaleSets/virtualMachines/networkInterfaces/read",
"Microsoft.Compute/virtualMachineScaleSets/virtualMachines/networkInterfaces/ipConfigurations/read",
# Allow Dremio to read, write, and delete disks
"Microsoft.Compute/disks/write",
"Microsoft.Compute/disks/read",
"Microsoft.Compute/disks/delete",
# Allow Dremio to read, write, and delete proximity placement groups
"Microsoft.Compute/proximityPlacementGroups/write",
"Microsoft.Compute/proximityPlacementGroups/read",
"Microsoft.Compute/proximityPlacementGroups/delete",
# Allow Dremio to read galleries and images
"Microsoft.Compute/galleries/read",
"Microsoft.Compute/galleries/images/read",
"Microsoft.Compute/galleries/images/versions/read",
# Join an application gateway backend address pool
"Microsoft.Network/applicationGateways/backendAddressPools/join/action",
# Allow Dremio to create and manage network interfaces
"Microsoft.Network/networkInterfaces/join/action",
# Join a network security group
"Microsoft.Network/networkSecurityGroups/join/action",
# Get a network security group definition
"Microsoft.Network/networkSecurityGroups/read",
# Get a virtual network definition
"Microsoft.Network/virtualNetworks/read",
# Join a virtual network
"Microsoft.Network/virtualNetworks/subnets/join/action",
# Get the resources for the resource group
"Microsoft.Resources/subscriptions/resourceGroups/read",
# Connect to a serial port
"Microsoft.SerialConsole/serialPorts/connect/action"
],
"notActions": [],
"dataActions": [],
"notDataActions": []
}
]
}
} -
In Step 4: Select who needs access, for Assign access to, select user, group or service principal. For Select Members, select the (exact) name of the application/service principal that you registered before.
-
Click Review + assign.
-
Similar to the Virtual Machine Contributor role, also assign the Avere Contributor role to the same application.
-
Also similar to the Virtual Machine Contributor role, also assign the Network Contributor role to the same application. But, if you want to use a tailored set of permissions, create a custom role (expand and use the code below) with a scope at both Compute and Network resource groups, and select that role instead of the Network Contributor role.
(Optional) Create a custom role for creating and managing network resources
JSON code with the Network policy for creating a custom role
{
"properties": {
"roleName": "<role-name>",
"description": "Custom role for Dremio Cloud to manage network resources in another resource group from the compute.",
"assignableScopes": [
"/subscriptions/<subscription-id>/resourceGroups/<compute-resource-group-name>",
"/subscriptions/<subscription-id>/resourceGroups/<network-resource-group-name>"
],
"permissions": [
{
"actions": [
# Allow Dremio to create and manage network interfaces
"Microsoft.Network/networkInterfaces/join/action",
# Join a network security group
"Microsoft.Network/networkSecurityGroups/join/action",
# Get a network security group definition
"Microsoft.Network/networkSecurityGroups/read",
# Get a virtual network definition
"Microsoft.Network/virtualNetworks/read",
# Join a virtual network
"Microsoft.Network/virtualNetworks/subnets/join/action",
# Get private endpoints definition. If Private endpoints aren't used, it could be removed
"Microsoft.Network/privateEndpoints/read"
],
"notActions": [],
"dataActions": [],
"notDataActions": []
}
]
}
}
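The role definitions in this section carry explanatory lines that start with `#`, and standard JSON does not allow comments. If you save a listing to a file before using it, those lines need to be removed first. The following is a minimal sketch of one way to do that; the function name and file names are hypothetical and not part of any Dremio or Azure tooling.

```shell
# Hypothetical helper: remove whole-line "#" annotations from a saved role
# definition so that what remains is plain JSON.
strip_json_annotations() {
  # Delete lines whose first non-blank character is "#"; keep all other lines.
  # With no file arguments, sed reads from standard input.
  sed -e '/^[[:space:]]*#/d' "$@"
}

# Usage (file names are placeholders):
#   strip_json_annotations network-role.annotated.json > network-role.json
```

Only whole-line annotations are removed, so a `#` that appears inside a quoted value is left untouched.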
Creating a Project Store
Dremio Cloud uses a project store for storing metadata and Dremio reflections. You must create a container in Azure Storage and grant Dremio Cloud permissions to manage data within that container.
To grant permissions to Azure Storage for storing metadata, complete the following steps:
-
Create a storage account, preferably within the resource group that you created specifically for Dremio Cloud.
a. (Optional) On the Advanced tab, you can Enable hierarchical namespace. More info can be found here.
b. On the Data Protection tab, clear the Enable soft delete for blobs option.
c. On the Encryption tab, for Encryption Type, choose Microsoft-managed key (MMK).
d. Copy the project store name and save it in a location where it can be retrieved.
-
Create a container within that storage account and copy the name of the project store (container) in a location where it can be retrieved.
Note: The project store must be Azure Data Lake Storage (ADLS) Gen2 storage.
-
(Optional) Register a new application similar to the one created for compute resources or use the same application created for compute resources.
a. If registering a new application, add a client secret. Then copy the Application (client) ID and client secret and save them in a location where they can be retrieved.
-
Assign roles in the resource group that you created specifically for Dremio Cloud.
a. In Step 3: Select the appropriate role, assign the Avere Contributor role.
However, if you want to use a tailored set of permissions, create a custom role (expand and use the code below), and select that role instead of the Avere Contributor role.
Code example for creating a custom role for storing metadata
{
"properties": {
"roleName": "<role_name>",
"description": "<role_description>",
"assignableScopes": [
"/subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
],
"permissions": [
{
"actions": [
# Return blob service properties or statistics
"Microsoft.Storage/storageAccounts/blobServices/read",
# Delete a container
"Microsoft.Storage/storageAccounts/blobServices/containers/delete",
# Return a container or a list of containers
"Microsoft.Storage/storageAccounts/blobServices/containers/read",
# Modify the metadata or properties of a container
"Microsoft.Storage/storageAccounts/blobServices/containers/write",
# Return a user delegation key for the blob service
"Microsoft.Storage/storageAccounts/blobServices/generateUserDelegationKey/action"
],
"notActions": [],
"dataActions": [
# Delete a blob
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
# Return a blob or a list of blobs
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
# Write to a blob
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"
],
"notDataActions": []
}
]
}
}
b. In Step 4: Select who needs access, for Assign access to, select user, group or service principal. For Select Members, select the (exact) name of the application/service principal that you registered before.
Provisioning with the Azure CLI
The prerequisites can also be created through the Azure CLI. The following instructions provision all the required resources according to best practices.
- Using Compute Resource Group Only
- Using Compute and Network Resource Groups
Using Compute Resource Group Only
# Create local variables for your subscription, region, resource group name, and storage account name. Note: The East US and West Europe regions are supported.
SUBSCRIPTION_ID=<SUBSCRIPTION_ID>
REGION=<Region_ID>
RESOURCE_GROUP=<RESOURCE_GROUP>
STORAGE_ACCOUNT=<STORAGE_ACCOUNT>
# Log into your Azure account and set the subscription
az login
az account set --subscription $SUBSCRIPTION_ID
# Enable Disk Encryption
az feature register --name EncryptionAtHost --namespace Microsoft.Compute
# Create your Dremio resource group
az group create -l $REGION -n $RESOURCE_GROUP
# Create a virtual network, subnet and network security group
az network vnet create -g $RESOURCE_GROUP -n dremio-cloud-vnet-$REGION --address-prefix 10.0.0.0/16 --subnet-name dremio-cloud-sn-$REGION --subnet-prefix 10.0.0.0/16
az network nsg create -g $RESOURCE_GROUP -n dremio-cloud-nsg-$REGION
# Create a storage account with hierarchical namespace enabled, disable soft deletes, and create a container.
az storage account create -n $STORAGE_ACCOUNT -g $RESOURCE_GROUP -l $REGION --sku Standard_GRS --enable-hierarchical-namespace true --min-tls-version TLS1_2 --allow-blob-public-access false --https-only true
az storage blob service-properties delete-policy update --account-name $STORAGE_ACCOUNT --auth-mode login --enable false
az storage container create --name dremio-cloud --account-name $STORAGE_ACCOUNT --auth-mode login
# Create an application, grant it the appropriate access, then create and retrieve a client secret
az ad app create --display-name dremio-cloud-$REGION
APP_ID=$(az ad app list --display-name dremio-cloud-$REGION --query "[].[appId]" --output tsv)
az ad sp create --id $APP_ID
az role assignment create --role "Virtual Machine Contributor" --assignee $APP_ID --scope /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP
az role assignment create --role "Avere Contributor" --assignee $APP_ID --scope /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP
APP_PWD_O=$(az ad app credential reset --id $APP_ID --years 1 --display-name dremio-cloud-secret --append --output tsv)
APP_PWD=$(echo $APP_PWD_O | awk '{print $2}')
# Retrieve your tenant ID
TENANT_ID=$(echo $APP_PWD_O | awk '{print $3}')
Using Compute and Network Resource Groups
# Create local variables for your subscription, region, resource group names, and storage account name. Note: The East US and West Europe regions are supported.
SUBSCRIPTION_ID=<SUBSCRIPTION_ID>
REGION=<Region_ID>
RESOURCE_GROUP=<RESOURCE_GROUP>
NETWORK_RESOURCE_GROUP=<NETWORK_RESOURCE_GROUP>
STORAGE_ACCOUNT=<STORAGE_ACCOUNT>
# Log into your Azure account and set the subscription
az login
az account set --subscription $SUBSCRIPTION_ID
# Enable Disk Encryption
az feature register --name EncryptionAtHost --namespace Microsoft.Compute
# Create your Dremio resource group
az group create -l $REGION -n $RESOURCE_GROUP
az group create -l $REGION -n $NETWORK_RESOURCE_GROUP
# Create a virtual network, subnet and network security group
az network vnet create -g $NETWORK_RESOURCE_GROUP -n dremio-cloud-vnet-$REGION --address-prefix 10.0.0.0/16 --subnet-name dremio-cloud-sn-$REGION --subnet-prefix 10.0.0.0/16
az network nsg create -g $NETWORK_RESOURCE_GROUP -n dremio-cloud-nsg-$REGION
# Create a storage account with hierarchical namespace enabled, disable soft deletes, and create a container.
az storage account create -n $STORAGE_ACCOUNT -g $RESOURCE_GROUP -l $REGION --sku Standard_GRS --enable-hierarchical-namespace true --min-tls-version TLS1_2 --allow-blob-public-access false --https-only true
az storage blob service-properties delete-policy update --account-name $STORAGE_ACCOUNT --auth-mode login --enable false
az storage container create --name dremio-cloud --account-name $STORAGE_ACCOUNT --auth-mode login
# Create an application, grant it the appropriate access, then create and retrieve a client secret
az ad app create --display-name dremio-cloud-$REGION
APP_ID=$(az ad app list --display-name dremio-cloud-$REGION --query "[].[appId]" --output tsv)
az ad sp create --id $APP_ID
az role assignment create --role "Virtual Machine Contributor" --assignee $APP_ID -g $RESOURCE_GROUP
az role assignment create --role "Avere Contributor" --assignee $APP_ID -g $RESOURCE_GROUP
az role assignment create --role "Network Contributor" --assignee $APP_ID -g $RESOURCE_GROUP
az role assignment create --role "Network Contributor" --assignee $APP_ID -g $NETWORK_RESOURCE_GROUP
APP_PWD_O=$(az ad app credential reset --id $APP_ID --years 1 --display-name dremio-cloud-secret --append --output tsv)
APP_PWD=$(echo $APP_PWD_O | awk '{print $2}')
# Retrieve your tenant ID
TENANT_ID=$(echo $APP_PWD_O | awk '{print $3}')
All relevant information required in the Dremio Cloud console can be obtained using the following two commands:
For Cloud
Cloud Registration Details
echo -e "CLOUD REGISTRATION DETAILS\n{\n\t\"REGION\": \"$REGION\"\n\t\"SUBSCRIPTION_ID\": \"$SUBSCRIPTION_ID\"\n\t\"APPLICATION_ID\": \"$APP_ID\"\n\t\"CLIENT_SECRET\": \"$APP_PWD\"\n\t\"TENANT_ID\": \"$TENANT_ID\"\n\t\"RESOURCE_GROUP\": \"$RESOURCE_GROUP\"\n}\n\nNETWORK ACCESS DETAILS\n{\n\t\"SUBNET\": \"dremio-cloud-sn-$REGION\"\n\t\"NETWORK_SECURITY_GROUP\": \"dremio-cloud-nsg-$REGION\"\n\t\"VIRTUAL_NETWORK\": \"dremio-cloud-vnet-$REGION\"\n}"
For Project
Project Registration Details
echo -e "PROJECT REGISTRATION DETAILS\n{\n\t\"PROJECT_STORE\": \"dremio-cloud\"\n\t\"ACCOUNT_NAME\":\"$STORAGE_ACCOUNT\"\n\t\"APPLICATION_ID\":\"$APP_ID\"\n\t\"TENANT_ID\":\"$TENANT_ID\"\n\t\"CLIENT_SECRET\":\"$APP_PWD\"\n}"
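The provisioning commands fill `APP_PWD` and `TENANT_ID` by picking positional fields out of the CLI output with `awk`, so a failed or differently shaped response can leave a variable empty without any visible error. Before copying the values into the Dremio Cloud console, it may be worth confirming that each one is populated. The following is a minimal sketch; the helper name is hypothetical, and the variable names are the ones set by the commands in this section.

```shell
# Hypothetical helper: report any of the named variables that are unset or empty.
# Returns 0 when every variable has a value, 1 otherwise.
check_registration_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible via eval).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_registration_vars REGION SUBSCRIPTION_ID APP_ID APP_PWD TENANT_ID \
#     RESOURCE_GROUP STORAGE_ACCOUNT
```

The check only detects empty values. It does not verify that a secret or ID is valid.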
Prerequisites for ARM Template-Based Azure Onboarding
Your Azure subscription that will be used to deploy the ARM template must have the following resource providers registered:
-
Microsoft.Compute
-
Microsoft.Network
-
Microsoft.Storage
Refer to Azure documentation on how to register a resource provider.
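As a sketch of how the three providers could be registered from the Azure CLI rather than the portal, the function below calls `az provider register` for each namespace and then prints the reported registration state. The function name is hypothetical, and it assumes you have already run `az login` and selected the subscription.

```shell
# Sketch: register each resource provider listed above, then print its state.
register_dremio_providers() {
  for ns in Microsoft.Compute Microsoft.Network Microsoft.Storage; do
    az provider register --namespace "$ns"
    state=$(az provider show --namespace "$ns" --query registrationState --output tsv)
    echo "$ns: $state"
  done
}

# Usage:
#   register_dremio_providers
```

Registration is asynchronous, so a provider may be reported as still registering immediately after the call. Running the function again later shows the updated state.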
Enabling Disk Encryption
To protect your data and meet organizational security and compliance needs, Dremio Cloud supports disk encryption for the virtual machines (VMs) it launches in your environment. Specifically, Dremio Cloud uses Azure's encryption at host functionality to provide end-to-end encryption of VM data.
To enable disk encryption for your Azure subscription, use the following command:
Azure CLI: Command to enable disk encryption for your Azure subscription
az feature register --name EncryptionAtHost --namespace Microsoft.Compute
For more information, follow these steps.
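Feature registration does not complete instantly, so it can help to check the state before continuing. The sketch below wraps `az feature show` in a small function (the function name is hypothetical) and, in the usage comment, re-registers the `Microsoft.Compute` provider once the feature reports as registered, which is the step Azure documents for propagating a newly registered feature.

```shell
# Sketch: print the current registration state of the EncryptionAtHost feature.
encryption_at_host_state() {
  az feature show --name EncryptionAtHost --namespace Microsoft.Compute \
    --query properties.state --output tsv
}

# Usage:
#   if [ "$(encryption_at_host_state)" = "Registered" ]; then
#     az provider register --namespace Microsoft.Compute
#   else
#     echo "EncryptionAtHost is not registered yet; check again later."
#   fi
```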
Checking Compute Quota
Sufficient quota needs to be assigned based on workloads and usage estimates as well as Dremio engine requirements.
Make sure your Azure subscription has enough quota to launch the required D16d_v5 or D32d_v5 VMs in the supported region that you plan to use; these are the two VM sizes that Dremio supports. Use the Ddv5 SKU to set your quota in Azure. If you need to increase your quota, see Increase VM-family vCPU Quotas.
Also ensure that the type and number of Azure VMs align with the engine size you plan to use. An engine represents a Dremio Cloud entity that manages compute resources. For a query that is submitted to execute on an engine, the control plane assigns an engine replica to that query. An engine replica is a group of Azure VMs defined by the engine capacity.
Refer to the table below for the engine sizes that are mapped to Azure VMs and a fixed number of cores. The engine sizes are shown for one replica.
Engine Size | Number of Azure VMs | Number of Cores |
---|---|---|
XX_SMALL_V1 | 1 Standard_D16d_v5 | 16 |
X_SMALL_V1 | 1 Standard_D32d_v5 | 32 |
SMALL_V1 | 2 Standard_D32d_v5 | 64 |
MEDIUM_V1 | 4 Standard_D32d_v5 | 128 |
LARGE_V1 | 8 Standard_D32d_v5 | 256 |
X_LARGE_V1 | 16 Standard_D32d_v5 | 512 |
XX_LARGE_V1 | 32 Standard_D32d_v5 | 1024 |
XXX_LARGE_V1 | 64 Standard_D32d_v5 | 2048 |
Dremio uses the unutilized quota on the Ddv5 SKUs.
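To turn the table into a quota estimate, multiply the cores for one replica by the number of replicas you expect to run at the same time. The sketch below encodes the table values. The helper names are hypothetical, and it assumes that each core in the table counts as one vCPU against the Ddv5 family quota and that replicas of an engine are all the same size.

```shell
# Cores for one replica of each engine size, taken from the table above.
cores_per_replica() {
  case "$1" in
    XX_SMALL_V1)  echo 16 ;;
    X_SMALL_V1)   echo 32 ;;
    SMALL_V1)     echo 64 ;;
    MEDIUM_V1)    echo 128 ;;
    LARGE_V1)     echo 256 ;;
    X_LARGE_V1)   echo 512 ;;
    XX_LARGE_V1)  echo 1024 ;;
    XXX_LARGE_V1) echo 2048 ;;
    *) echo "unknown engine size: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: vCPUs needed for an engine size ($1) and replica count ($2).
required_vcpus() {
  per_replica=$(cores_per_replica "$1") || return 1
  echo $(( per_replica * $2 ))
}

# Usage:
#   required_vcpus MEDIUM_V1 3   # 128 cores per replica x 3 replicas
```

If you run several engines in the same region, add the results together before comparing the total against your quota.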
Creating a Resource Group
You must create a new resource group for ARM template-based onboarding. Using a new resource group improves visibility into the resources created by Dremio. For information about creating an Azure resource group, see the Azure Resource Manager documentation.
After you configure an Azure resource group for Dremio Cloud, you cannot alter or delete that resource group from your Azure account, or else all projects associated with the resource group will become unusable.
Registering an Application
Create an Azure app registration for each ARM template deployment. After you create the app registration, make a note of the following information. You will need to provide this information when you add a Sonar project:
-
Tenant ID
-
Application (client) ID
-
Client Secret
-
Object ID
Note: Make sure to use the Object ID provided on the application's overview page, not the Object ID listed on the app registrations page.
The ARM template includes definitions to assign the Virtual Machine Contributor and Avere Contributor roles to the registered app.
Granting Permissions
Make sure that the following privileges are assigned to the Azure user ID that you will use to log in to your Azure tenant during deployment using an ARM template. These privileges must be assigned at the Resource Group scope.
-
Owner/Contributor (required for resource creation).
-
RBAC Administrator (required for role assignment on the Service Principal for the app registration).
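One way to confirm these privileges before deploying is to list the deploying user's role assignments on the resource group and look for the expected role names. The following is a sketch only: the function names are hypothetical, and the exact display name of each built-in role should be confirmed in the Azure portal before you rely on a match.

```shell
# Sketch: list the role names assigned to a user at resource group scope.
# $1 = user sign-in name or object ID, $2 = resource group name
list_rg_roles() {
  az role assignment list --assignee "$1" --resource-group "$2" \
    --query "[].roleDefinitionName" --output tsv
}

# Hypothetical helper: succeed if the role name in $1 appears, as an exact
# whole-line match, among the role names read from standard input.
has_role() {
  grep -Fxq "$1"
}

# Usage (the user name is a placeholder):
#   list_rg_roles user@example.com "$RESOURCE_GROUP" | has_role "Owner" \
#     && echo "Owner is assigned at the resource group scope"
```

By default the listing covers only assignments made at the resource group itself. Adding `--include-inherited` to the `az role assignment list` call also returns roles granted at a parent scope such as the subscription.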
Wrap-up and Next Steps
- To configure your cloud resources, see Getting Started with Dremio Sonar.
- For additional information, see AWS Resources or Azure Resources.