Prerequisites for Configuring Cloud Resources

Before running queries with Dremio Sonar, you must first provision the required resources in your cloud provider. You can then configure these cloud resources in the next step when adding your Sonar project.

See the prerequisites for your cloud provider: AWS or Azure.

AWS Prerequisites

For the configuration, you will need to address the following prerequisites:

Connecting Your AWS Account

If you don't have access to an AWS account with the required permissions, you can sign up for an AWS Free Tier account at https://aws.amazon.com/free/.

Selecting a VPC and Subnets

You can use an existing VPC and subnets. If you don't have a VPC that meets the networking requirements, you will need to create one. For steps, see Create a VPC and Subnets.

See the following guidelines for selecting subnets:

  • Specify only private subnets or only public subnets. Mixing private and public subnets is not supported.
  • Ensure that each subnet that you specify belongs to a separate availability zone. For example, if you specify subnet A and subnet B, they cannot both be in availability zone C; each must be in a different availability zone.
  • Ensure that subnet IDs are unique across all of the availability zones within a VPC.
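The three guidelines above can be checked mechanically before you submit a subnet selection. The following is a minimal sketch, not part of Dremio's tooling: the `Subnet` record, its field names, and the `check_subnets` helper are all hypothetical, and the sketch assumes you already know each subnet's availability zone and whether it is public.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    subnet_id: str          # hypothetical ID such as "subnet-a"
    availability_zone: str  # for example "us-east-1a"
    is_public: bool         # True if the subnet routes through an internet gateway


def check_subnets(subnets: list[Subnet]) -> list[str]:
    """Return the guideline violations found; an empty list means the selection passes."""
    problems = []
    # Guideline 1: only private subnets or only public subnets.
    if len({s.is_public for s in subnets}) > 1:
        problems.append("mix of public and private subnets")
    # Guideline 2: every subnet in its own availability zone.
    zones = [s.availability_zone for s in subnets]
    if len(set(zones)) != len(zones):
        problems.append("two or more subnets share an availability zone")
    # Guideline 3: subnet IDs are unique.
    ids = [s.subnet_id for s in subnets]
    if len(set(ids)) != len(ids):
        problems.append("duplicate subnet IDs")
    return problems
```

An empty result means the selection satisfies all three guidelines; otherwise each string names the guideline that was broken.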

Establishing Outbound Connectivity

Outbound connectivity is required to allow query engines to communicate with Dremio Cloud. Engines establish a connection with the Dremio Cloud control plane using port 443 (HTTPS). No other open ports are required in your VPC.

To establish this connection, you can use an internet gateway with a public IP address, a NAT gateway, or AWS PrivateLink. If your VPC has internet connectivity, you can securely connect to the Dremio Cloud control plane through the internet gateway or NAT gateway. However, we recommend PrivateLink: it provides secure connectivity to the Dremio Cloud control plane and improves your overall security posture because the VPC does not need internet connectivity. We also provide a CloudFormation template to simplify provisioning PrivateLink.

Verify Connectivity

Before getting started with Dremio Sonar, verify outbound connectivity from your subnets by running the following command from an EC2 instance in each subnet:

Command to verify connectivity
curl -v https://gw.dremio.cloud
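If curl is not available on the instance, a similar check can be sketched in Python. This is only an illustration: the `can_reach` helper is a hypothetical name, and it tests whether a TCP connection on port 443 can be opened, not whether the TLS handshake or an HTTP request succeeds, so it is a weaker check than the curl command above.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_reach("gw.dremio.cloud")` from an EC2 instance in each subnet gives a quick yes or no answer for outbound connectivity on port 443.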

Granting Permissions

You must also grant Dremio some permissions on your VPC. You can choose either to:

  • Grant them automatically by launching the CloudFormation template (CFT) from your AWS user account. The CFT is the recommended method, because the CFT will create the required resources for you. For a breakdown of the resources that will be created and the permissions that will be granted, see the annotated CloudFormation template.

  • Grant them manually by following the steps listed on Configuring Cloud Resources Manually if you prefer to create the required resources and Sonar project manually. For a breakdown of the resources that you will need to create, see Required Resources.

To use either method, you must have permissions to create the following required resources and grant the required access:

Required Resources

If you choose to configure the cloud resources manually, you will need to create the following required resources. Otherwise, the CFT will create the required resources for you.

S3 Bucket

An encrypted S3 bucket is used for the project store that stores various types of project data, including:

  • The data for reflections that are created in the project
  • The default path for new tables that are used for data and manifests for datasets
  • All of the tables that store records of events and other historical data

Security Group

A security group acts as a virtual firewall to control the traffic that is allowed to and from your resources, ensuring that only traffic from Dremio Cloud reaches the resources that you have allocated for your Dremio Cloud organization.

Outbound Rule

An outbound rule allows EC2 instances to connect to Dremio’s control plane by using TLS. For example, if the VPC for your organization is running in AWS, Dremio’s control plane deploys compute engines as AWS EC2 instances within your VPC.

IAM Role or IAM User

An IAM role is an IAM identity that you can create in your account that has specific permissions. In this case, the IAM roles are granted permissions on the resources that you specify for your Dremio Cloud organization, and these roles are assigned to Dremio Cloud.

An IAM user is an entity that you create in AWS to represent the person or application that uses it to interact with AWS. A user in AWS consists of a name and credentials. In this case, Dremio Cloud is given the access key ID and secret access key as credentials for connecting to your VPC to access the resources that you give it permission to use.

Policy Template to Grant Access to the Project Store

The following policy template is the minimum policy requirement to allow read and write access to the project store. It grants Dremio Cloud permissions, through IAM roles or IAM users, for storing metadata and views for the project in an S3 bucket in your Amazon VPC. The permissions are described in comments in the template. Replace BUCKET-NAME with the name of the S3 bucket you want to use as the Dremio Cloud project store:

Template for the Policy JSON
{
"Version": "2012-10-17",
"Statement": [
# Allow Dremio to enumerate S3 buckets within the account.
{
"Effect": "Allow",
"Action": [
"s3:ListAllMyBuckets"
],
"Resource": "arn:aws:s3:::*"
},
# Allow Dremio R/W access to the Project Store bucket used to store housekeeping information such as metadata and reflections.
{
"Effect": "Allow",
"Action": [
"s3:DeleteObject",
"s3:GetObject",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::BUCKET-NAME/*"
]
},
# Allow Dremio to determine the region, list content and add tags on the Project Store bucket.
{
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:PutBucketTagging"
],
"Resource": [
"arn:aws:s3:::BUCKET-NAME"
]
},
# Allow Dremio read access to sample datasets used to get users started easily on the platform without connecting their own data.
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::ap-southeast-1.examples.dremio.com",
"arn:aws:s3:::eu-west-1.examples.dremio.com",
"arn:aws:s3:::us-east-1.examples.dremio.com",
"arn:aws:s3:::us-west-1.examples.dremio.com",
"arn:aws:s3:::us-west-2.examples.dremio.com"
]
}
]
}
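Because the template above carries `#` comments, it is not valid JSON until those lines are removed. The following sketch shows one way to strip the comment lines and substitute the bucket name before handing the policy to AWS. The `render_policy` helper is a hypothetical name, and the sketch assumes that every comment sits on its own line, as in the template.

```python
import json


def render_policy(template: str, bucket_name: str) -> dict:
    """Drop '#' comment lines, substitute the bucket name, and parse the result as JSON."""
    kept = [
        line for line in template.splitlines()
        if not line.lstrip().startswith("#")
    ]
    return json.loads("\n".join(kept).replace("BUCKET-NAME", bucket_name))
```

`json.dumps(render_policy(template_text, "my-project-store"), indent=2)` then yields a document you can paste into the IAM policy editor; `my-project-store` is a placeholder bucket name.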

Policy Template for Enabling Dremio Cloud to Manage Engines

The following policy enables Dremio Cloud to create and manage engines in your VPC. The permissions are described in comments in the template:

Template for the Policy JSON
{
"Version": "2012-10-17",
"Statement": [
# Allow Dremio to terminate instances with the "dremio_managed" tag.
{
"Effect": "Allow",
"Action": "ec2:TerminateInstances",
"Resource": "arn:aws:ec2:*:*:instance/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/dremio_managed": "true"
}
}
},
# Require the "dremio_managed" tag for instances/volumes when creating instances.
{
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": [
"arn:aws:ec2:*:*:volume/*",
"arn:aws:ec2:*:*:instance/*"
],
"Condition": {
"StringEquals": {
"aws:RequestTag/dremio_managed": "true"
}
}
},
# Allow creating instances without the "dremio_managed" tag on resources other than instances/volumes.
{
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": [
"arn:aws:ec2:*:*:launch-template/*",
"arn:aws:ec2:*:*:fleet/*",
"arn:aws:ec2:*::image/*",
"arn:aws:ec2:*:*:network-interface/*",
"arn:aws:ec2:*:*:security-group/*",
"arn:aws:ec2:*:*:subnet/*",
"arn:aws:ec2:*:*:placement-group/*"
]
},
# Allow Dremio to create tags on instances/volumes only upon the initial creation of an instance.
{
"Effect": "Allow",
"Action": "ec2:CreateTags",
"Resource": [
"arn:aws:ec2:*:*:instance/*",
"arn:aws:ec2:*:*:volume/*"
],
"Condition": {
"StringEquals": {
"ec2:CreateAction": "RunInstances"
}
}
},
# Allow Dremio to create tags on placement groups (PG) upon the initial creation of a PG.
{
"Effect": "Allow",
"Action": "ec2:CreateTags",
"Resource": "arn:aws:ec2:*:*:placement-group/*",
"Condition": {
"StringEquals": {
"ec2:CreateAction": "CreatePlacementGroup"
}
}
},
# Allow Dremio to create tags on a launch template (LT) upon the initial creation of a LT.
{
"Effect": "Allow",
"Action": "ec2:CreateTags",
"Resource": "arn:aws:ec2:*:*:launch-template/*",
"Condition": {
"StringEquals": {
"ec2:CreateAction": "CreateLaunchTemplate"
}
}
},
# Allow Dremio to create tags on a fleet upon the initial creation of the fleet.
{
"Effect": "Allow",
"Action": "ec2:CreateTags",
"Resource": "arn:aws:ec2:*:*:fleet/*",
"Condition": {
"StringEquals": {
"ec2:CreateAction": "CreateFleet"
}
}
},
# Allow Dremio to create fleet only when including the "dremio_managed" tag.
{
"Effect": "Allow",
"Action": "ec2:CreateFleet",
"Resource": "arn:aws:ec2:*:*:fleet/*",
"Condition": {
"StringEquals": {
"aws:RequestTag/dremio_managed": "true"
}
}
},
# Allow Dremio to create fleet with other resources without the "dremio_managed" tag.
{
"Effect": "Allow",
"Action": "ec2:CreateFleet",
"Resource": [
"arn:aws:ec2:*:*:instance/*",
"arn:aws:ec2:*:*:image/*",
"arn:aws:ec2:*:*:launch-template/*",
"arn:aws:ec2:*:*:network-interface/*",
"arn:aws:ec2:*:*:placement-group/*",
"arn:aws:ec2:*:*:security-group/*",
"arn:aws:ec2:*:*:subnet/*"
]
},
# Only allow Dremio to delete fleets with the "dremio_managed" tag.
{
"Effect": "Allow",
"Action": "ec2:DeleteFleets",
"Resource": "arn:aws:ec2:*:*:fleet/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/dremio_managed": "true"
}
}
},
# Allow Dremio to create a launch template.
{
"Effect": "Allow",
"Action": "ec2:CreateLaunchTemplate",
"Resource": "arn:aws:ec2:*:*:launch-template/*"
},
# Only allow Dremio to delete launch templates with the "dremio_managed" tag.
{
"Effect": "Allow",
"Action": "ec2:DeleteLaunchTemplate",
"Resource": "arn:aws:ec2:*:*:launch-template/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/dremio_managed": "true"
}
}
},
# Allow Dremio to describe fleets with the "dremio_managed" tag.
{
"Effect": "Allow",
"Action": "ec2:DescribeFleets",
"Resource": "arn:aws:ec2:*:*:fleet/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/dremio_managed": "true"
}
}
},
# Only allow Dremio to delete placement groups with the "dremio_managed" tag.
{
"Effect": "Allow",
"Action": "ec2:DeletePlacementGroup",
"Resource": "arn:aws:ec2:*:*:placement-group/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/dremio_managed": "true"
}
}
},
# Allow Dremio to create a placement group.
{
"Effect": "Allow",
"Action": "ec2:CreatePlacementGroup",
"Resource": "arn:aws:ec2:*:*:placement-group/*"
},
# Allow Dremio to enumerate resources in the account.
{
"Effect": "Allow",
"Action": [
"ec2:DescribeImages",
"ec2:DescribeLaunchTemplateVersions",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeVpcs",
"ec2:DescribeSubnets",
"ec2:DescribeTags",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeInstances",
"ec2:DescribeInstanceStatus",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeNetworkInterfaceAttribute",
"ec2:DescribePlacementGroups",
"ec2:DescribeSecurityGroups",
"ec2:DescribeVpcEndpoints",
"ec2:DescribeVolumes"
],
"Resource": "*"
},
# This section appears only if you chose to create a cross-account IAM role in the previous step.
{
"Effect": "Allow",
"Action": [
"iam:PassRole",
"sts:AssumeRole"
],
"Resource": [
"<Role ARN from Step 1: Configure Storage Settings>"
]
}
]
}

Azure Prerequisites

Before connecting your Azure account to Dremio Cloud, you must provision appropriate resources within your subscription. The prerequisites differ depending on whether you use manual or template-based onboarding.

  • Copy the following IDs and save them in a location where they can be retrieved (template and manual connection methods):
    • Tenant ID from Microsoft Entra ID
    • Subscription ID from the Subscriptions page

Prerequisites for Manual Azure Onboarding

note

Your Azure subscription that will be used to deploy the ARM template must have the following resource providers registered:

  • Microsoft.Compute

  • Microsoft.Network

  • Microsoft.Storage

Refer to Azure documentation on how to register a resource provider.

For manual onboarding, you can complete the prerequisites by provisioning with the Azure CLI if desired.

Enabling Disk Encryption

To protect your data and meet organizational security and compliance needs, Azure supports disk encryption for the virtual machines (VMs) it launches in your environment. Specifically, the encryption at host feature provides end-to-end encryption of VM data.

To enable disk encryption for your Azure subscription, use the following command:

Azure CLI: Command to enable disk encryption for your Azure subscription
az feature register --name EncryptionAtHost --namespace Microsoft.Compute

For more information, follow these steps.

Checking Compute Quota

Assign sufficient quota based on your workloads, usage estimates, and Dremio engine requirements.

Make sure your Azure subscription has quota allocated to launch D16d_v5 or D32d_v5 VMs, the two VM types that Dremio supports, in the supported region that you plan to use. Use the Ddv5 SKU to set your quota in Azure. If you need to increase your quota, see Increase VM-family vCPU Quotas.

Also ensure that the type and number of Azure VMs align with the engine size you plan to use. An engine represents a Dremio Cloud entity that manages compute resources. For a query that is submitted to execute on an engine, the control plane assigns an engine replica to that query. An engine replica is a group of Azure VMs defined by the engine capacity.

Refer to the table below for the engine sizes that are mapped to Azure VMs and a fixed number of cores. The engine sizes are shown for one replica.

| Engine Size | Number of Azure VMs | Number of Cores |
| --- | --- | --- |
| XX_SMALL_V1 | 1 Standard_D16d_v5 | 16 |
| X_SMALL_V1 | 1 Standard_D32d_v5 | 32 |
| SMALL_V1 | 2 Standard_D32d_v5 | 64 |
| MEDIUM_V1 | 4 Standard_D32d_v5 | 128 |
| LARGE_V1 | 8 Standard_D32d_v5 | 256 |
| X_LARGE_V1 | 16 Standard_D32d_v5 | 512 |
| XX_LARGE_V1 | 32 Standard_D32d_v5 | 1024 |
| XXX_LARGE_V1 | 64 Standard_D32d_v5 | 2048 |

note

Dremio uses the unutilized quota on the Ddv5 SKUs.

note

Azure images for Dremio Cloud executors are built on the Ubuntu 22.04 Linux distribution.
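The engine-size table above can be turned into a rough quota estimate. The sketch below is an illustration only: `ENGINE_SIZES` copies the per-replica figures from the table, `required_vcpus` is a hypothetical helper, and it assumes that quota is consumed one vCPU per core and that every replica of an engine uses the same number of VMs.

```python
# Engine size -> (number of VMs, VM type, total cores) for ONE replica, copied from the table.
ENGINE_SIZES = {
    "XX_SMALL_V1": (1, "Standard_D16d_v5", 16),
    "X_SMALL_V1": (1, "Standard_D32d_v5", 32),
    "SMALL_V1": (2, "Standard_D32d_v5", 64),
    "MEDIUM_V1": (4, "Standard_D32d_v5", 128),
    "LARGE_V1": (8, "Standard_D32d_v5", 256),
    "X_LARGE_V1": (16, "Standard_D32d_v5", 512),
    "XX_LARGE_V1": (32, "Standard_D32d_v5", 1024),
    "XXX_LARGE_V1": (64, "Standard_D32d_v5", 2048),
}


def required_vcpus(engine_size: str, max_replicas: int = 1) -> int:
    """Estimate the Ddv5 vCPU quota needed to run an engine at its maximum replica count."""
    _vm_count, _vm_type, cores = ENGINE_SIZES[engine_size]
    return cores * max_replicas
```

For example, a SMALL_V1 engine allowed to scale to three replicas would need quota for 64 × 3 = 192 vCPUs under these assumptions.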

Creating or Using a Resource Group

You can use any previously created resource group, but Dremio recommends creating a separate resource group, which makes it easier to see the resources created by Dremio. For creating or managing Azure resource groups, see the Azure Resource Manager.

caution

After you configure an Azure resource group for Dremio Cloud, you cannot alter or delete that resource group from your Azure account, or else all projects associated with the resource group will become unusable.

When adding a Dremio cloud for Azure, you can specify the following resource groups:

  • Compute Resource Group: This is the default resource group where Dremio engines will be scaled. The required network resources (as described in these prerequisites) need to exist in this resource group.
  • (Optional) Network Resource Group: This allows you to specify a separate resource group (distinct from the Compute Resource Group) focused on networking resources.

This allows cloud configurations such as the following:

| Cloud Configuration | Compute Resource Group | Network Resource Group |
| --- | --- | --- |
| Using Compute Resource Group Only | The Virtual Machine Scale Set and Proximity Placement Group are managed by Dremio. The Network Security Group, subnet, Virtual Network, and Private Endpoints are created before onboarding. | N/A |
| Using Compute and Network Resource Groups | The Virtual Machine Scale Set and Proximity Placement Group are managed by Dremio. | The Network Security Group, subnet, Virtual Network, and Private Endpoints are created before onboarding. |

So, depending on the configuration you want for your Dremio cloud, you will have to specify the following in Azure:

  • A Compute Resource Group.
    or
  • A Compute Resource Group and a Network Resource Group.

You must also grant permissions, at these resource group scopes, to the app registrations in your Azure account that you plan to use to onboard to Dremio. For more information, see the section Registering an Application on this page.

Creating or Using a VNet and Subnets

Outbound connectivity from your Azure VNet and subnets is required to allow query engines to communicate with Dremio Cloud. Engines establish a connection with the Dremio Cloud control plane using port 443 (HTTPS) outbound to the internet. No open ports are required in your Azure subnet for incoming connections, and neither the subnets nor the engines require public IP addresses.

  • Copy the names (not the resource IDs) of the VNet and subnets, and save them in a location where they can be retrieved.

  • Ensure that there are at least five IP addresses available in the subnets. If you configure an engine to use more than one replica, the engine will autoscale based on the load, and the number of available IP addresses in the subnets must support the scale.

  • The recommended subnet size is 10.0.0.0/16 (65,536 addresses).
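To judge whether a subnet leaves enough room for autoscaling, you can compute its usable address count. The sketch below uses Python's standard `ipaddress` module and accounts for the five addresses that Azure reserves in every subnet. The helper names are hypothetical, and the sketch assumes each engine VM consumes exactly one IP address.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves five addresses in each subnet


def usable_addresses(cidr: str) -> int:
    """Return the number of addresses in the subnet that VMs can actually use."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AZURE_RESERVED_PER_SUBNET, 0)


def fits(cidr: str, vm_count: int) -> bool:
    """Return True if the subnet can hold vm_count VMs at one address per VM (an assumption)."""
    return usable_addresses(cidr) >= vm_count
```

Under these assumptions the recommended 10.0.0.0/16 range leaves 65,531 usable addresses, while a /28 subnet leaves only 11.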

The Virtual Network (VNet) and Subnet must be in the correct resource group, depending on the resource configuration of your Dremio cloud (see the section in this topic about Creating or Using a Resource Group):

The Virtual Network (VNet) and Subnet must be in the Compute resource group.

Creating or Using a Network Security Group

A network security group (NSG) is required for connecting your Azure account, and it must allow internet access so that engines can communicate with the Dremio control plane and access the Azure storage account that stores metadata. You can decide either to:

  • Associate the VNet with an NSG before connecting your Azure account (as a prerequisite), in which case you will not need to provide the NSG as part of the process when connecting your account. To do so, copy the name (not the resource ID) of the NSG and save it in a location where it can be retrieved.

  • Provide the NSG as part of the process when connecting your Azure account, and Dremio will associate the NSG with the network interface card (NIC) created during VM creation. If an NSG is provided, it will be applied to the NIC associated with the VMSS. This means that there will be two different NSGs: one at the VNet level and another at the NIC level.

The Network Security Group must be in the correct resource group, depending on the resource configuration of your Dremio cloud (see the section in this topic about Creating or Using a Resource Group):

The Network Security Group must be in the Compute resource group.

Network Security Group Rules

Within the subnet, Dremio executors communicate with one another, and they need outbound connectivity to return results to the Dremio control plane. The following inbound and outbound security rules are recommended:

Inbound & Outbound Network Security Rules (inclusive of Azure default rules)

| Inbound/Outbound | Priority | Port[s] | Protocol | Source | Destination | Allow/Deny |
| --- | --- | --- | --- | --- | --- | --- |
| Inbound | 65000 | Any | Any | VirtualNetwork | VirtualNetwork | Allow |
| Inbound | 65001 | Any | Any | AzureLoadBalancer | Any | Allow |
| Inbound | 65500 | Any | Any | Any | Any | Deny |
| Outbound | 100 | 443 | TCP | Any | Any | Allow |
| Outbound | 4096 | Any | Any | Any | Internet | Deny |
| Outbound | 65000 | Any | Any | VirtualNetwork | VirtualNetwork | Allow |
| Outbound | 65001 | Any | Any | Any | Internet | Allow |
| Outbound | 65500 | Any | Any | Any | Any | Deny |
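Network security group rules are evaluated in priority order, lowest number first, and the first matching rule decides the outcome. The sketch below models the outbound rules listed above to show why port 443 traffic gets through while other internet-bound traffic is denied. It is a simplified illustration, not Azure's implementation: the `Rule` record and `evaluate` function are hypothetical, and service tags are compared as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int
    port: str         # "Any" or a single port number as text
    protocol: str     # "Any" or a protocol name such as "TCP"
    source: str       # "Any" or a tag such as "VirtualNetwork"
    destination: str  # "Any" or a tag such as "Internet"
    action: str       # "Allow" or "Deny"


# The outbound rules from the list above.
OUTBOUND_RULES = [
    Rule(100, "443", "TCP", "Any", "Any", "Allow"),
    Rule(4096, "Any", "Any", "Any", "Internet", "Deny"),
    Rule(65000, "Any", "Any", "VirtualNetwork", "VirtualNetwork", "Allow"),
    Rule(65001, "Any", "Any", "Any", "Internet", "Allow"),
    Rule(65500, "Any", "Any", "Any", "Any", "Deny"),
]


def _matches(pattern: str, value: str) -> bool:
    return pattern == "Any" or pattern == value


def evaluate(rules, port: int, protocol: str, source: str, destination: str) -> str:
    """Return the action of the first rule, in ascending priority order, that matches."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if (_matches(rule.port, str(port))
                and _matches(rule.protocol, protocol)
                and _matches(rule.source, source)
                and _matches(rule.destination, destination)):
            return rule.action
    return "Deny"
```

In this model, HTTPS traffic from an executor to the internet matches the priority 100 rule and is allowed, plain HTTP falls through to the priority 4096 rule and is denied, and executor-to-executor traffic is allowed by the priority 65000 rule.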

Registering an Application

You will need to register a new application or use an existing application with a service principal and then grant it the necessary permissions (as detailed below).

To create an app registration, follow these steps to register an application within your Azure tenant:

  1. For Name, enter a name for the application. Then copy the name and save it in a location where it can be retrieved.

  2. For Supported Account Types, select the Single Tenant option.

  3. Do not specify a redirect URL.

  4. Click Register.

  5. Copy the Application (client) ID and save it in a location where it can be retrieved.

  6. Add a client secret and save it in a location where it can be retrieved.

Granting Permissions

Dremio needs permissions to create Virtual Machine Scale Sets (VMSS) and manage Virtual Machines (VMs). To grant these permissions, you can either use the Azure built-in roles or create custom roles. The approach depends on the configuration you have for the resource groups (see the section in this topic about Creating or Using a Resource Group).

If you're only using a Compute Resource Group (and no Network Resource Group), the built-in roles are:

  • Virtual Machine Contributor
  • Avere Contributor

Below is a comparison table for you to decide which built-in roles to use for the application(s) based on the permissions assigned to them:

| Role | Compute | Storage | Compute Resource Group | Comments | Best Practice |
| --- | --- | --- | --- | --- | --- |
| Virtual Machine Contributor + Avere Contributor | Yes | Yes | Virtual Machine Contributor + Avere Contributor | A single Azure app-registration with both roles assigned can be used for compute and storage | Use a single Azure app-registration with both roles assigned at resource group scope, OR use two Azure app-registrations: App1 for compute (both roles assigned at resource group scope) and App2 for storage (Avere Contributor role assigned at storage account scope) |
| Virtual Machine Contributor | No (the Virtual Machine Contributor role does not have proximity placement group permissions) | No | Virtual Machine Contributor | A single Azure app-registration with only the Virtual Machine Contributor role will not work for compute | |
| Avere Contributor | No (the Avere Contributor role does not have VMSS permissions) | Yes | Avere Contributor | A single Azure app-registration with only the Avere Contributor role will not work for compute but will work for storage | |
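The comparison above boils down to a small coverage check. The sketch below encodes only what the comparison states: compute needs both VMSS and proximity placement group permissions, Virtual Machine Contributor lacks the latter, and Avere Contributor lacks the former. The capability labels and the `coverage` helper are hypothetical simplifications and do not reflect the full Azure role definitions.

```python
# Simplified capability sets inferred from the comparison above (not the full Azure role definitions).
ROLE_CAPABILITIES = {
    "Virtual Machine Contributor": {"vmss"},
    "Avere Contributor": {"proximity_placement_groups", "storage"},
}

# What each Dremio use case needs, using the same simplified labels.
REQUIRED = {
    "compute": {"vmss", "proximity_placement_groups"},
    "storage": {"storage"},
}


def coverage(roles: set[str]) -> dict[str, bool]:
    """Report whether the given set of built-in roles is enough for compute and for storage."""
    granted = set()
    for role in roles:
        granted |= ROLE_CAPABILITIES[role]
    return {use: needed <= granted for use, needed in REQUIRED.items()}
```

Passing both roles reports that compute and storage are covered, which matches the recommendation to assign both roles to the app-registration used for compute.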

How to Assign Roles to App-registrations

To grant permissions to Dremio for creating and managing compute resources, follow these steps to assign roles using the Azure portal with the service principal in your resource group that you created specifically for Dremio:

  1. For Step 3: Select the appropriate role, assign the Virtual Machine Contributor role.
    If you prefer a tailored set of permissions, create a custom role (expand and use the code below), and select that role instead of the Virtual Machine Contributor role.

    (Optional) Create a custom role for creating and managing compute resources
    JSON code with the Compute policy for creating a custom role
    {
    "properties": {
    "roleName": "<role_name>",
    "description": "Custom role for Dremio Cloud to manage compute resources",
    "assignableScopes": [
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>"
    ],
    "permissions": [
    {
    "actions": [

    # Allow Dremio to deallocate virtual machine scale sets
    "Microsoft.Compute/virtualMachineScaleSets/deallocate/action",

    # Allow Dremio to read, write, and delete the properties of virtual machine scale sets
    "Microsoft.Compute/virtualMachineScaleSets/delete",
    "Microsoft.Compute/virtualMachineScaleSets/write",
    "Microsoft.Compute/virtualMachineScaleSets/read",

    # Allow Dremio to get the properties of a virtual machine scale set SKU
    "Microsoft.Compute/virtualMachineScaleSets/skus/read",

    # Allow Dremio to read, deallocate, and delete virtual machines
    "Microsoft.Compute/virtualMachineScaleSets/virtualMachines/read",
    "Microsoft.Compute/virtualMachineScaleSets/virtualMachines/deallocate/action",
    "Microsoft.Compute/virtualMachineScaleSets/virtualMachines/delete",

    # Allow Dremio to read network interfaces
    "Microsoft.Compute/virtualMachineScaleSets/virtualMachines/networkInterfaces/read",
    "Microsoft.Compute/virtualMachineScaleSets/virtualMachines/networkInterfaces/ipConfigurations/read",

    # Allow Dremio to read, write, and delete disks
    "Microsoft.Compute/disks/write",
    "Microsoft.Compute/disks/read",
    "Microsoft.Compute/disks/delete",

    # Allow Dremio to read, write, and delete proximity placement groups
    "Microsoft.Compute/proximityPlacementGroups/write",
    "Microsoft.Compute/proximityPlacementGroups/read",
    "Microsoft.Compute/proximityPlacementGroups/delete",

    # Allow Dremio to read galleries and images
    "Microsoft.Compute/galleries/read",
    "Microsoft.Compute/galleries/images/read",
    "Microsoft.Compute/galleries/images/versions/read",

    # Join an application gateway backend address pool
    "Microsoft.Network/applicationGateways/backendAddressPools/join/action",

    # Allow Dremio to create and manage network interfaces
    "Microsoft.Network/networkInterfaces/join/action",

    # Join a network security group
    "Microsoft.Network/networkSecurityGroups/join/action",

    # Get a network security group definition
    "Microsoft.Network/networkSecurityGroups/read",

    # Get a virtual network definition
    "Microsoft.Network/virtualNetworks/read",

    # Join a virtual network
    "Microsoft.Network/virtualNetworks/subnets/join/action",

    # Get the resources for the resource group
    "Microsoft.Resources/subscriptions/resourceGroups/read",

    # Connect to a serial port
    "Microsoft.SerialConsole/serialPorts/connect/action"
    ],
    "notActions": [],
    "dataActions": [],
    "notDataActions": []
    }
    ]
    }
    }
  2. In Step 4: Select who needs access, for Assign access to, select user, group or service principal. For Select Members, select the (exact) name of the application/service principal that you registered before.

  3. Click Review + assign.

  4. Repeat these steps to assign the Avere Contributor role to the same application.

Creating a Project Store

Dremio uses a project store for storing metadata and Dremio reflections. You must create a container in Azure Storage and grant Dremio permissions to manage data within that container.

To grant permissions to Azure Storage for storing metadata, complete the following steps:

  1. Create a storage account, preferably within the same resource group that you created specifically for Dremio.

    a. (Optional) On the Advanced tab, you can Enable hierarchical namespace. More info can be found here.

    b. On the Data Protection tab, disable the Enable soft delete for blobs option.

    c. On the Encryption tab, for Encryption Type, choose Microsoft-managed key (MMK).

    d. Copy the project store name and save it in a location where it can be retrieved.

  2. Create a container within that storage account, then copy the name of the project store (container) and save it in a location where it can be retrieved.

    note

    The project store must be Azure Data Lake Storage (ADLS) Gen2 storage.

  3. (Optional) Register a new application similar to the one created for compute resources or use the same application created for compute resources.

    a. If registering a new application, add a client secret. Then copy the Application (client) ID and client secret and save them in a location where they can be retrieved.

  4. Assign roles in the resource group that you created specifically for Dremio.

    a. In Step 3: Select the appropriate role, assign the Avere Contributor role.
    If you prefer a tailored set of permissions, create a custom role (expand and use the code below), and select that role instead of the Avere Contributor role.

    Code example for creating a custom role for storing metadata
    {
    "properties": {
    "roleName": "<role_name>",
    "description": "<role_description>",
    "assignableScopes": [
    "/subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
    ],
    "permissions": [
    {
    "actions": [
    # Return blob service properties or statistics
    "Microsoft.Storage/storageAccounts/blobServices/read",

    # Delete a container
    "Microsoft.Storage/storageAccounts/blobServices/containers/delete",

    # Return a container or a list of containers
    "Microsoft.Storage/storageAccounts/blobServices/containers/read",

    # Modify the metadata or properties of a container
    "Microsoft.Storage/storageAccounts/blobServices/containers/write",

    # Return a user delegation key for the blob service
    "Microsoft.Storage/storageAccounts/blobServices/generateUserDelegationKey/action"
    ],
    "notActions": [],
    "dataActions": [
    # Delete a blob
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",

    # Return a blob or a list of blobs
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",

    # Write to a blob
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"
    ],
    "notDataActions": []
    }
    ]
    }
    }

    b. In Step 4: Select who needs access, for Assign access to, select user, group or service principal. For Select Members, select the (exact) name of the application/service principal that you registered before.

Provisioning with the Azure CLI

The prerequisites can also be created through the Azure CLI. The following instructions provision all the required resources according to best practices.

Create Resources Using Azure CLI
# Create local variables for your subscription, region, resource group name, and storage account name. Note: The East US and West Europe regions are supported.
SUBSCRIPTION_ID="<SUBSCRIPTION_ID>"
REGION="<Region_ID>"
RESOURCE_GROUP="<RESOURCE_GROUP>"
STORAGE_ACCOUNT="<STORAGE_ACCOUNT>"

# Log into your Azure account and set the subscription
az login
az account set --subscription $SUBSCRIPTION_ID

# Enable Disk Encryption
az feature register --name EncryptionAtHost --namespace Microsoft.Compute

# Create your Dremio resource group
az group create -l $REGION -n $RESOURCE_GROUP

# Create a virtual network, subnet and network security group
az network vnet create -g $RESOURCE_GROUP -n dremio-cloud-vnet-$REGION --address-prefix 10.0.0.0/16 --subnet-name dremio-cloud-sn-$REGION --subnet-prefix 10.0.0.0/16
az network nsg create -g $RESOURCE_GROUP -n dremio-cloud-nsg-$REGION

# Create a storage account with hierarchical namespace enabled, disable soft deletes, and create the project store container.
az storage account create -n $STORAGE_ACCOUNT -g $RESOURCE_GROUP -l $REGION --sku Standard_GRS --enable-hierarchical-namespace true --min-tls-version TLS1_2 --allow-blob-public-access false --https-only true
az storage blob service-properties delete-policy update --account-name $STORAGE_ACCOUNT --auth-mode login --enable false
az storage container create --name dremio-cloud --account-name $STORAGE_ACCOUNT --auth-mode login

# Create an application, grant the appropriate access then create and retrieve a key secret
az ad app create --display-name dremio-cloud-$REGION
APP_ID=$(az ad app list --display-name dremio-cloud-$REGION --query "[].[appId]" --output tsv)
az ad sp create --id $APP_ID
az role assignment create --role "Virtual Machine Contributor" --assignee $APP_ID --scope /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP
az role assignment create --role "Avere Contributor" --assignee $APP_ID --scope /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP
APP_PWD_O=$(az ad app credential reset --id $APP_ID --years 1 --display-name dremio-cloud-secret --append --output tsv)
APP_PWD=$(echo $APP_PWD_O | awk '{print $2}')

# Retrieve your tenant ID
TENANT_ID=$(echo $APP_PWD_O | awk '{print $3}')

All relevant information required in the Dremio Cloud console can be obtained using the following two commands:

For Cloud

Cloud Registration Details
echo -e "CLOUD REGISTRATION DETAILS\n{\n\t\"REGION\": \"$REGION\"\n\t\"SUBSCRIPTION_ID\": \"$SUBSCRIPTION_ID\"\n\t\"APPLICATION_ID\": \"$APP_ID\"\n\t\"CLIENT_SECRET\": \"$APP_PWD\"\n\t\"TENANT_ID\": \"$TENANT_ID\"\n\t\"RESOURCE_GROUP\": \"$RESOURCE_GROUP\"\n}\n\nNETWORK ACCESS DETAILS\n{\n\t\"SUBNET\": \"dremio-cloud-sn-$REGION\"\n\t\"NETWORK_SECURITY_GROUP\": \"dremio-cloud-nsg-$REGION\"\n\t\"VIRTUAL_NETWORK\": \"dremio-cloud-vnet-$REGION\"\n}"

For Project

Project Registration Details
echo -e "PROJECT REGISTRATION DETAILS\n{\n\t\"PROJECT_STORE\": \"dremio-cloud\"\n\t\"ACCOUNT_NAME\":\"$STORAGE_ACCOUNT\"\n\t\"APPLICATION_ID\":\"$APP_ID\"\n\t\"TENANT_ID\":\"$TENANT_ID\"\n\t\"CLIENT_SECRET\":\"$APP_PWD\"\n}"
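
The two echo commands print whatever the shell variables currently hold, so an unset variable silently produces an empty field. A minimal sketch of a guard that could run before them follows; the helper name require_vars is hypothetical and is not part of the Dremio script.

```shell
# Hypothetical helper (not part of the original script): report every
# variable in the argument list that is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup written so that it works in POSIX sh as well as bash.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call, using the variable names from the script above:
# require_vars REGION SUBSCRIPTION_ID RESOURCE_GROUP STORAGE_ACCOUNT APP_ID APP_PWD TENANT_ID
```

The function returns a non-zero status when any name is empty, so it can be chained with && in front of the echo commands.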

Prerequisites for ARM Template-Based Azure Onboarding

note

Your Azure subscription that will be used to deploy the ARM template must have the following resource providers registered:

  • Microsoft.Compute

  • Microsoft.Network

  • Microsoft.Storage

Refer to Azure documentation on how to register a resource provider.
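
As a rough sketch, the registration state of the three providers can be checked and corrected from the Azure CLI. This assumes the CLI is installed and already logged in to the target subscription; the function name register_required_providers is illustrative only.

```shell
# Sketch only: assumes `az login` has been run against the subscription
# that will host the ARM template deployment.
REQUIRED_PROVIDERS="Microsoft.Compute Microsoft.Network Microsoft.Storage"

register_required_providers() {
  for ns in $REQUIRED_PROVIDERS; do
    # registrationState is "Registered" once the provider is available.
    state=$(az provider show --namespace "$ns" --query registrationState --output tsv)
    if [ "$state" != "Registered" ]; then
      echo "registering $ns (current state: $state)"
      az provider register --namespace "$ns"
    else
      echo "$ns already registered"
    fi
  done
}

# Example call:
# register_required_providers
```

Registration is asynchronous, so a provider may report a state of Registering for a few minutes before it becomes Registered.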

Enabling Disk Encryption

To protect your data and meet organizational security and compliance requirements, Azure allows disk encryption for the virtual machines (VMs) it launches in your environment. Specifically, Azure's encryption at host feature provides end-to-end encryption of VM data.

To enable disk encryption for your Azure subscription, use the following command:

Azure CLI: Command to enable disk encryption for your Azure subscription
az feature register --name EncryptionAtHost --namespace Microsoft.Compute

For more information, follow these steps.
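
Feature registration does not complete immediately. The following sketch shows one way to check the state and then refresh the Microsoft.Compute provider once the feature reports Registered; it assumes a logged-in Azure CLI, and both function names are hypothetical.

```shell
# Sketch only: assumes the Azure CLI is installed and logged in.
# Print the current registration state of the EncryptionAtHost feature.
encryption_at_host_state() {
  az feature show --name EncryptionAtHost --namespace Microsoft.Compute \
    --query properties.state --output tsv
}

# Once the state reads "Registered", re-register the provider so that the
# change propagates to the subscription.
finish_encryption_at_host() {
  if [ "$(encryption_at_host_state)" = "Registered" ]; then
    az provider register --namespace Microsoft.Compute
  else
    echo "EncryptionAtHost is not registered yet; try again later" >&2
    return 1
  fi
}

# Example call:
# finish_encryption_at_host
```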

Checking Compute Quota

Sufficient quota needs to be assigned based on workloads and usage estimates as well as Dremio engine requirements.

Dremio supports two VM sizes: D16d_v5 and D32d_v5. Make sure your Azure subscription has enough quota to launch the required VMs in the supported region that you plan to use. Use the Ddv5 SKU to set your quota in Azure. If you need to increase your quota, see Increase VM-family vCPU Quotas.

Also ensure that the type and number of Azure VMs align with the engine size you plan to use. An engine represents a Dremio Cloud entity that manages compute resources. For a query that is submitted to execute on an engine, the control plane assigns an engine replica to that query. An engine replica is a group of Azure VMs defined by the engine capacity.

Refer to the table below for the engine sizes that are mapped to Azure VMs and a fixed number of cores. The engine sizes are shown for one replica.

Engine Size      Number of Azure VMs      Number of Cores
XX_SMALL_V1      1 Standard_D16d_v5       16
X_SMALL_V1       1 Standard_D32d_v5       32
SMALL_V1         2 Standard_D32d_v5       64
MEDIUM_V1        4 Standard_D32d_v5       128
LARGE_V1         8 Standard_D32d_v5       256
X_LARGE_V1       16 Standard_D32d_v5      512
XX_LARGE_V1      32 Standard_D32d_v5      1024
XXX_LARGE_V1     64 Standard_D32d_v5      2048

note

Dremio uses the unutilized quota on the Ddv5 SKUs.
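
To estimate how much Ddv5 quota an engine may consume, the per-replica core counts from the table above can be multiplied by the number of replicas. The sketch below assumes that the table's core counts correspond to the vCPUs counted against the quota and that the replica count is a value you choose; the function names are hypothetical.

```shell
# Cores per replica, copied from the engine size table above.
cores_per_replica() {
  case "$1" in
    XX_SMALL_V1)  echo 16 ;;
    X_SMALL_V1)   echo 32 ;;
    SMALL_V1)     echo 64 ;;
    MEDIUM_V1)    echo 128 ;;
    LARGE_V1)     echo 256 ;;
    X_LARGE_V1)   echo 512 ;;
    XX_LARGE_V1)  echo 1024 ;;
    XXX_LARGE_V1) echo 2048 ;;
    *) echo "unknown engine size: $1" >&2; return 1 ;;
  esac
}

# Total cores for one engine: cores per replica times the number of replicas.
required_vcpus() {
  per_replica=$(cores_per_replica "$1") || return 1
  echo $(( per_replica * $2 ))
}

# Example: a MEDIUM_V1 engine allowed to scale to 3 replicas.
# required_vcpus MEDIUM_V1 3    # prints 384
# Current usage and limits for the region can then be listed with:
# az vm list-usage --location "$REGION" --output table
```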

Creating a Resource Group

You must create a new resource group for ARM template-based onboarding. Using a new resource group improves visibility into the resources created by Dremio. For information about creating an Azure resource group, see the Azure Resource Manager documentation.
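
As one possible approach from the Azure CLI, the sketch below creates the resource group only when the name is not already in use, so that an existing group is not reused by accident. It assumes a logged-in Azure CLI; the function name create_new_resource_group is hypothetical.

```shell
# Sketch only: assumes the Azure CLI is installed and logged in.
create_new_resource_group() {
  rg="$1"
  region="$2"
  # `az group exists` prints "true" or "false".
  if [ "$(az group exists --name "$rg")" = "true" ]; then
    echo "resource group $rg already exists; choose a new name" >&2
    return 1
  fi
  az group create --name "$rg" --location "$region"
}

# Example call:
# create_new_resource_group "$RESOURCE_GROUP" "$REGION"
```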

caution

After you configure an Azure resource group for Dremio Cloud, do not alter or delete that resource group in your Azure account. Otherwise, all projects associated with the resource group will become unusable.

Registering an Application

Create an Azure app registration for each ARM template deployment. After you create the app registration, make a note of the following information. You will need to provide this information when you add a Sonar project:

  • Tenant ID

  • Application (client) ID

  • Client Secret

  • Object ID

    note

    Make sure to use the Object ID provided on the application's overview page, not the Object ID listed on the app registrations page.


The ARM template includes definitions to assign the Virtual Machine Contributor and Avere Contributor roles to the registered app.

Granting Permissions

Make sure that the following privileges are assigned to the Azure user ID that you will use to log in to your Azure tenant during deployment using an ARM template. These privileges must be assigned at the Resource Group scope.

  • Owner or Contributor (required for resource creation).

  • RBAC Administrator (required for role assignment on the Service Principal for the app registration).
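
The role assignments of the deploying user can be checked ahead of time. The sketch below assumes a logged-in Azure CLI and assumes that the built-in role named "Role Based Access Control Administrator" is the one this page calls RBAC Administrator; both function names are hypothetical.

```shell
# Sketch only: assumes the Azure CLI is installed and logged in.
# List the role names assigned to a user at the resource group scope.
roles_at_resource_group() {
  az role assignment list --assignee "$1" --resource-group "$2" \
    --query "[].roleDefinitionName" --output tsv
}

# Read role names (one per line) on stdin and check both requirements.
# The RBAC role name below is an assumption; adjust it if your tenant
# shows a different name.
has_required_roles() {
  roles=$(cat)
  echo "$roles" | grep -qxE "Owner|Contributor" || return 1
  echo "$roles" | grep -qxF "Role Based Access Control Administrator" || return 1
}

# Example call:
# roles_at_resource_group "user@example.com" "$RESOURCE_GROUP" | has_required_roles && echo "ok"
```

Matching whole lines keeps a role such as Virtual Machine Contributor from being mistaken for Contributor.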

Wrap-up and Next Steps