
Azure Storage

The Dremio source connector for Azure Storage includes support for the following Azure Storage services:

  • Azure Blob Storage is Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data, such as text or binary data.

  • Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on top of Azure Blob Storage, that converges the capabilities of Azure Blob Storage and Azure Data Lake Storage Gen1. Features from Gen1, such as file system semantics and directory- and file-level security and scale, are combined with the low-cost tiered storage and the high-availability and disaster-recovery capabilities of Azure Blob Storage.

caution

Dremio does not support Azure Storage accounts that have soft delete for blobs enabled. Disable soft delete to establish a successful connection.
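Before adding the source, you can verify the account's soft-delete setting programmatically. The sketch below assumes the azure-storage-blob v12 Python SDK, whose `get_service_properties()` response includes a `delete_retention_policy` entry; the helper name is hypothetical.

```python
def soft_delete_blocks_connection(service_properties: dict) -> bool:
    """Return True if blob soft delete is enabled, i.e. the account
    would fail to connect as a Dremio Azure Storage source."""
    policy = service_properties.get("delete_retention_policy")
    # The SDK may surface either a RetentionPolicy object or a plain dict.
    enabled = getattr(policy, "enabled", None)
    if enabled is None and isinstance(policy, dict):
        enabled = policy.get("enabled", False)
    return bool(enabled)

# Usage with the real SDK (assumes azure-storage-blob v12 is installed):
# from azure.storage.blob import BlobServiceClient
# client = BlobServiceClient(account_url="https://<account>.blob.core.windows.net",
#                            credential="<shared-access-key>")
# if soft_delete_blocks_connection(client.get_service_properties()):
#     print("Disable blob soft delete before adding this source to Dremio.")
```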

Granting Permissions

To use Azure Storage as a data source, the OAuth 2.0 application that you created in Azure must have the appropriate permissions on the specified Azure Storage account.

To grant these permissions, you can use the built-in Storage Blob Data Contributor role by assigning roles for your storage account:

  1. In Step 3: Select the appropriate role, assign the Storage Blob Data Contributor role.

  2. In Step 4: Select who needs access, for Assign access to, select User, group or service principal. For Select Members, select the name of the application/service principal that you previously registered.
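The same role assignment can also be scripted with the Azure CLI instead of the portal. The sketch below only builds the `az role assignment create` command; the function name and placeholder IDs are hypothetical, and running the command assumes the Azure CLI is installed and logged in.

```python
def build_role_assignment_command(assignee_id: str, subscription_id: str,
                                  resource_group: str, account_name: str) -> list:
    """Build an `az role assignment create` command that grants the
    Storage Blob Data Contributor role on one storage account."""
    scope = (f"/subscriptions/{subscription_id}"
             f"/resourceGroups/{resource_group}"
             f"/providers/Microsoft.Storage/storageAccounts/{account_name}")
    return ["az", "role", "assignment", "create",
            "--assignee", assignee_id,  # the registered application/service principal
            "--role", "Storage Blob Data Contributor",
            "--scope", scope]

# Run the resulting command with, for example, subprocess.run(cmd, check=True).
```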

Configuring Network Access

If you are using Dremio Standard Edition, your Azure Storage account must have public network access enabled from all networks. To allow traffic from all networks:

  1. Log in to the Azure portal and go to your Azure Storage account.

  2. On your Azure Storage account page, select Networking from the left sidebar.

  3. For Public network access, select Enabled from all networks.

  4. Click Save.

If you are using Dremio Enterprise Edition, please contact Dremio Support for advanced networking settings.

Adding an Azure Storage Source

To add an Azure Storage source to your project:

  1. From the Datasets page, click Object Storage at the bottom of the Sources pane.

  2. From the top-right of the page, click the Add object storage button.

  3. In the Add Data Source dialog, under Object Storage, click Azure Storage.

    The New Azure Storage Source dialog box appears, which contains the following sections: General, Advanced Options, Reflection Refresh, Metadata, and Privileges.

Refer to the following for guidance on how to complete each section.

General

  • Name: Provide a name to use for this Azure Storage source.

  • Connection:
    • Account Name: The name of the Azure Storage account, as shown in the Azure portal.
    • Encrypt connection: Enabled by default, this option encrypts network traffic with TLS. Dremio does not allow this option to be disabled (unchecked).
    • Account Version: Select the Azure Storage version for this source connection. StorageV1 and StorageV2 are supported. Default: StorageV2.

  • Authentication:
    • Shared access key: Select this option to authenticate using the shared access key from the Azure portal.
    • Azure Active Directory: Select this option to use Azure Active Directory credentials for authentication.

Azure Active Directory Authentication

To configure the Azure Storage source to use Azure Active Directory for authentication, provide the following values from the OAuth 2.0 application that you created in the Azure portal for this source:

  • Application ID - The application (client) ID in Azure.

  • OAuth 2.0 Token Endpoint - The OAuth 2.0 token endpoint (v1.0), which includes the tenant ID and is used by the application in order to get an access token or a refresh token.

  • Application Secret - The secret key generated for the application.
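These three values map directly onto an OAuth 2.0 client-credentials token request. The sketch below builds the request for the v1.0 token endpoint; the function name is hypothetical, and `https://storage.azure.com/` is the standard resource (audience) identifier for Azure Storage.

```python
def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> tuple:
    """Return (endpoint_url, form_fields) for an OAuth 2.0
    client-credentials request against the v1.0 token endpoint."""
    endpoint = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    fields = {
        "grant_type": "client_credentials",
        "client_id": client_id,                     # Application ID
        "client_secret": client_secret,             # Application Secret
        "resource": "https://storage.azure.com/",   # Azure Storage audience
    }
    return endpoint, fields

# POST the form fields to the endpoint (e.g. with urllib.request or requests)
# to obtain an access token for Azure Storage.
```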

Advanced Options

Advanced Options include:

  • Enable asynchronous access when possible: Activated by default; clear the checkbox to deactivate. Enables cloud caching for Azure Storage to support simultaneous actions, such as adding and editing a new source.

  • Enable partition column inference: If a source dataset uses Parquet files and the data is partitioned on one or more columns, Dremio by default appends a column named dir<n> for each partition level and uses the subfolder names as the values in those columns. When this option is enabled, Dremio instead detects the name of each partition column, appends a column that uses that name, detects the values in the names of the subfolders, and uses those values in the appended column.

  • Root Path: The root path for the Azure Storage location. The default root path is /.

  • Default CTAS Format: Choose the default format for tables that you create in Dremio: Iceberg or Parquet.

  • Advanced Properties: Provide custom key-value pairs for the connection that are relevant to the source:
    • Click Add Property.
    • For Name, enter a connection property.
    • For Value, enter the corresponding connection property value.

  • Blob Containers & Filesystem Allowlist: Add an approved Azure Storage account in the text field. You can add multiple accounts this way. When you use this option to add specific accounts, only those accounts are visible, not all accounts that may be available in the source.
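One way to picture partition column inference: without it, each partition level becomes a generic dir<n> column; with it, the partition column names themselves are used. The sketch below assumes Hive-style `name=value` subfolder names and a hypothetical helper; it is an illustration, not Dremio's actual implementation.

```python
def partition_columns(relative_path: str, infer: bool) -> dict:
    """Derive partition columns from a file's subfolder names.
    Without inference, levels become dir0, dir1, ...; with inference,
    Hive-style `name=value` folders supply the column names."""
    folders = relative_path.split("/")[:-1]  # drop the file name itself
    cols = {}
    for level, folder in enumerate(folders):
        if infer and "=" in folder:
            name, value = folder.split("=", 1)
            cols[name] = value
        else:
            cols[f"dir{level}"] = folder
    return cols
```

For example, a file stored at `year=2023/month=01/data.parquet` yields columns `year` and `month` with inference enabled, and `dir0`/`dir1` otherwise.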

Under Cache Options, review and edit the following options to meet your needs.

  • Enable local caching when possible: Selected by default, along with asynchronous access for cloud caching. Uncheck the checkbox to disable this option. For more information about local caching, see the note below.

  • Max percent of total available cache space to use when possible: Specifies the disk quota, as a percentage, that a source can use on any single node when local caching is enabled. The default is 100 percent of the total disk space available. You can either enter a percentage in the value field or use the arrows to the far right to adjust it.
note

Columnar Cloud Cache (C3) enables Dremio to achieve NVMe-level I/O performance on S3/ADLS by leveraging the NVMe/SSD built into cloud compute instances. C3 caches only the data required to satisfy your workloads and can even cache individual microblocks within datasets. If your table has 1,000 columns and you only query a subset of those columns and filter for data within a certain timeframe, C3 will cache only that portion of your table. By selectively caching data, C3 eliminates over 90% of S3/ADLS I/O costs, which can make up 10-15% of the costs for each query you run.
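The arithmetic behind those figures: if S3/ADLS I/O accounts for 10-15% of query cost and C3 eliminates about 90% of that I/O, the overall saving is roughly 9-13.5% per query. A quick check (the function name is hypothetical):

```python
def c3_query_cost_savings(io_share: float, io_eliminated: float = 0.90) -> float:
    """Fraction of total query cost saved when C3 removes `io_eliminated`
    of the S3/ADLS I/O, which itself makes up `io_share` of the cost."""
    return io_share * io_eliminated

low = c3_query_cost_savings(0.10)   # 0.09  -> 9% of total query cost
high = c3_query_cost_savings(0.15)  # 0.135 -> 13.5% of total query cost
```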

Reflection Refresh

These settings define how often reflections are refreshed and how long data can be served before expiration. To learn more about reflections, refer to Accelerating Queries with Reflections.

note

All reflection parameters are optional.

You can set the following refresh policies for reflections:

  • Refresh period: Manage the refresh period by either enabling the option to never refresh or setting a refresh frequency in hours, days, or weeks. The default frequency to refresh reflections is every hour.
  • Expiration period: Set the expiration period for the length of time that data can be served by either enabling the option to never expire or setting an expiration time in hours, days, or weeks. The default expiration time is set to three hours.
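The two policies combine into a simple schedule. The sketch below applies the defaults described above (refresh every hour, expire after three hours); the function name is hypothetical.

```python
from datetime import datetime, timedelta

def reflection_schedule(last_refresh: datetime,
                        refresh_every: timedelta = timedelta(hours=1),
                        expire_after: timedelta = timedelta(hours=3)) -> tuple:
    """Return (next_refresh, expiration) for a reflection, using the
    default policies: refresh hourly, expire after three hours."""
    return last_refresh + refresh_every, last_refresh + expire_after
```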

Metadata

Metadata settings include Dataset Handling options and Metadata Refresh options.

Dataset Handling

Review each option in the following list to set up the dataset handling options to meet your needs.

  • Remove dataset definitions if underlying data is unavailable (default): When selected, datasets are automatically removed if their underlying files or folders are removed from Azure Storage, or if the folder or source is not accessible. If this option is not selected, Dremio does not remove dataset definitions when underlying files or folders are removed from Azure Storage, which can be useful if files are temporarily deleted and replaced with a new set of files.

  • Automatically format files into physical datasets when you issue queries: When selected, Dremio automatically promotes a folder to a table using default options. If you have CSV files, especially ones with non-default formatting, it might be useful to leave this option unselected.

Metadata Refresh

The Metadata Refresh parameters cover Dataset Details, the metadata that Dremio needs for query planning, such as information about fields, types, shards, statistics, and locality. The following parameters control how dataset details are fetched:

  • Fetch mode: Only fetching from queried datasets, the default, is supported: Dremio updates details for objects in a source that have previously been queried. Fetching from all datasets is deprecated.

  • Fetch every: Set how often Dremio fetches dataset details, in minutes, hours, days, or weeks. The default frequency is one day.

  • Expire after: Set how long dataset details are kept before they expire, in minutes, hours, days, or weeks. The default expiry time is three days.
note

All metadata parameters are optional.
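Taken together, the defaults imply a simple staleness model for cached dataset details: refetch after one day, discard after three. A hypothetical sketch:

```python
from datetime import datetime, timedelta

def metadata_state(last_fetch: datetime, now: datetime,
                   fetch_every: timedelta = timedelta(days=1),
                   expire_after: timedelta = timedelta(days=3)) -> str:
    """Classify cached dataset details using the defaults above:
    refetch after one day, expire after three days."""
    age = now - last_fetch
    if age >= expire_after:
        return "expired"
    if age >= fetch_every:
        return "refresh due"
    return "fresh"
```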

Privileges

This section lets you grant privileges on the source to specific users or roles. To learn more about how Dremio allows for the implementation of granular-level privileges, see Privileges.

note

All privileges parameters are optional.

To add a privilege for a user or role:

  1. In the Add User/Role field, enter the user or role to which you want to apply privileges.
  2. Click Add to Privileges. The user or role is added to the Users table.

To set privileges for a user or role:

  1. In the Users table, identify the user to set privileges for and click under the appropriate column (Select, Alter, Create Table, etc.) to either enable or disable that privilege. A green checkmark indicates that the privilege is enabled.
  2. Click Save.