Azure Storage
The Dremio connector for Azure Storage includes support for the following Azure Storage services:
- Azure Blob Storage is Microsoft's object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data, such as text or binary data.
- Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on top of Azure Blob Storage. Features such as file system semantics, directory- and file-level security, and scale are combined with the low-cost, tiered storage and high availability/disaster recovery capabilities of Azure Blob Storage.
Dremio does not support soft delete for blobs in Azure Storage accounts. Disable soft delete on the storage account to establish a successful connection.
Zero-byte files created with Iceberg tables in Azure Storage can be safely ignored; they don't impact Dremio's functionality. To prevent these files from being created, enable Hierarchical Namespace on your storage account. See Azure Data Lake Storage Gen2 hierarchical namespace for instructions.
Grant Permissions
To connect to Azure Storage, the OAuth 2.0 application that you created in Azure must have appropriate permissions within the specified Azure Storage account.
To grant these permissions, you can use the built-in Storage Blob Data Contributor role by assigning roles for your storage account:
- In Step 3: Select the appropriate role, assign the Storage Blob Data Contributor role.
- In Step 4: Select who needs access, for Assign access to, select User, group or service principal. For Select Members, select the name of the application/service principal that you previously registered.
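For scripted setups, the same role assignment can also be made with the Azure CLI command `az role assignment create`. The following is a minimal sketch that only assembles the command line and does not run it. The subscription ID, resource group, storage account name, and application ID are placeholder values, and the helper names are hypothetical.

```python
import shlex

# Built-in role named in the steps above.
ROLE = "Storage Blob Data Contributor"


def storage_account_scope(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Build the Azure resource ID that limits the role to one storage account."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_assignment_command(app_id: str, scope: str) -> list[str]:
    """Assemble the `az role assignment create` arguments for the registered application."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", app_id,
        "--role", ROLE,
        "--scope", scope,
    ]


# Placeholder identifiers; replace them with values from your own Azure tenant.
scope = storage_account_scope(
    "00000000-0000-0000-0000-000000000000", "example-rg", "examplestorage"
)
command = role_assignment_command("11111111-1111-1111-1111-111111111111", scope)
print(shlex.join(command))
```

Scoping the assignment to a single storage account, rather than to the whole subscription, keeps the application's access limited to the data Dremio needs.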
Connect to Azure Storage
- In the Dremio console, click Add Data on the Home page.
- In the Add Data dialog, select Azure Storage.
- Configure the connection using the sections below, then click Save.
General
| Section | Field/Option | Description |
|---|---|---|
| Name | Name | Provide a name to use for the connection. The name cannot include the following special characters: /, :, [, or ]. |
| Connection | Account Name | The name of the Azure Storage account from the Azure portal app. |
| | Encrypt connection | Enabled by default, this option encrypts network traffic with TLS. Dremio recommends encrypted connections. |
| | Storage Connection Protocol (Driver) | Select the Azure Storage driver connection protocol you would like to use. The options are WASBS (Legacy) and ABFSS (Recommended). ABFSS is the default based on Azure best practices. |
| Authentication | Shared access key | Select this option to authenticate using the Shared Access Key from the Azure portal app. |
| | Microsoft Entra ID | Select this option to use Microsoft Entra ID credentials for authentication. |
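Two of the settings above lend themselves to a quick illustration: the restriction on connection names and the difference between the two driver protocols. The sketch below assumes the public Azure cloud endpoints (`dfs.core.windows.net` for ABFSS and `blob.core.windows.net` for WASBS). The function names and sample values are hypothetical and are not part of Dremio.

```python
# Characters that the Name field does not allow.
FORBIDDEN_NAME_CHARS = set("/:[]")

# Endpoint suffix per driver protocol (assumption: public Azure cloud).
PROTOCOL_HOSTS = {
    "abfss": "dfs.core.windows.net",   # recommended driver
    "wasbs": "blob.core.windows.net",  # legacy driver
}


def is_valid_connection_name(name: str) -> bool:
    """Return True when the name contains none of the forbidden characters."""
    return not (FORBIDDEN_NAME_CHARS & set(name))


def storage_uri(protocol: str, container: str, account: str, path: str = "") -> str:
    """Build the URI form that the chosen driver uses to address a container."""
    if protocol not in PROTOCOL_HOSTS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    host = PROTOCOL_HOSTS[protocol]
    return f"{protocol}://{container}@{account}.{host}/{path.lstrip('/')}"


print(is_valid_connection_name("azure_sales"))   # no forbidden characters
print(is_valid_connection_name("sales/2024"))    # contains "/"
print(storage_uri("abfss", "data", "examplestorage", "raw/events"))
print(storage_uri("wasbs", "data", "examplestorage", "raw/events"))
```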
Microsoft Entra ID Authentication
To configure the connection to use Microsoft Entra ID for authentication, provide the following values from the OAuth 2.0 application that you created in the Azure portal:
- Application ID – The application (client) ID in Azure.
- OAuth 2.0 Token Endpoint – The OAuth 2.0 token endpoint (v1.0), which includes the tenant ID and is used by the application to get an access token or a refresh token.
- Application Secret – The secret key generated for the application.
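The v1.0 token endpoint follows a fixed pattern on the Microsoft identity platform, so it can be derived from the tenant ID. The sketch below assumes the public cloud authority `login.microsoftonline.com`. The tenant ID is a placeholder, and the helper names are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: public Azure cloud authority host.
AUTHORITY = "https://login.microsoftonline.com"


def token_endpoint_v1(tenant_id: str) -> str:
    """Build the OAuth 2.0 token endpoint (v1.0) for a tenant."""
    return f"{AUTHORITY}/{tenant_id}/oauth2/token"


def is_v1_token_endpoint(url: str) -> bool:
    """Return True for paths shaped like /<tenant>/oauth2/token (no v2.0 segment)."""
    parts = urlparse(url).path.strip("/").split("/")
    return len(parts) == 3 and parts[1:] == ["oauth2", "token"]


endpoint = token_endpoint_v1("22222222-2222-2222-2222-222222222222")
print(endpoint)
print(is_v1_token_endpoint(endpoint))
# The v2.0 endpoint carries an extra path segment and is rejected by the check.
print(is_v1_token_endpoint(f"{AUTHORITY}/example-tenant/oauth2/v2.0/token"))
```

A check like this can catch the common mistake of pasting the v2.0 endpoint into a field that expects the v1.0 form.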
Advanced Options
- Enable partition column inference – If a dataset uses Parquet files and the data is partitioned on one or more columns, enabling this option will append a column named `dir<n>` for each partition level and use subfolder names for values in those columns. Dremio detects the name of the partition column, appends a column that uses that name, detects values in the names of subfolders, and uses those values in the appended column.
- Root Path – The root path for the Azure Storage location. The default root path is /.
- Default CTAS Format – Choose the default format for tables you create in Dremio, either Iceberg or Parquet.
- Advanced Properties – Provide custom key-value pairs for the connection.
  - Click Add Property.
  - For Name, enter a connection property.
  - For Value, enter the corresponding connection property value.
- Blob Containers & Filesystem Allowlist – Add an approved Azure Storage account in the text field. You can add multiple accounts this way. When using this option to add specific accounts, you will only be able to see those accounts and not all accounts that may be available.
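The partition column inference option above mentions two ways of naming appended columns: positional `dir<n>` columns and columns named after the detected partition column. The sketch below is a rough illustration of that difference, not Dremio's implementation. It assumes Hive-style `name=value` subfolders and zero-based numbering for `dir<n>`, neither of which is stated above, and the function and its flag are hypothetical.

```python
def partition_columns(relative_path: str, use_detected_names: bool) -> dict[str, str]:
    """Derive appended partition columns from the subfolders of a file path."""
    # Drop the file name; every remaining segment is one partition level.
    folders = relative_path.strip("/").split("/")[:-1]
    columns: dict[str, str] = {}
    for level, folder in enumerate(folders):
        if use_detected_names and "=" in folder:
            # Detected name: the column is named after the key in "key=value".
            name, value = folder.split("=", 1)
            columns[name] = value
        else:
            # Positional name: the whole subfolder name becomes the value.
            columns[f"dir{level}"] = folder
    return columns


path = "year=2024/month=01/part-0.parquet"
print(partition_columns(path, use_detected_names=False))
print(partition_columns(path, use_detected_names=True))
```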
Under Cache Options, review the following options and edit them to meet your needs.
- Enable local caching when possible – Selected by default, along with asynchronous access for cloud caching. Uncheck the checkbox to disable this option. For more information about local caching, see the note below these options.
- Max percent of total available cache space to use when possible – Specifies the disk quota, as a percentage, available for this connection on any single node only when local caching is enabled. The default is 100 percent of the total disk space available. You can either manually enter a percentage in the value field or use the arrows to the far right to adjust the percentage.
Columnar Cloud Cache (C3) enables Dremio to achieve NVMe-level I/O performance on S3/ADLS by leveraging the NVMe/SSD built into cloud compute instances. C3 caches only the data required to satisfy your workloads and can even cache individual microblocks within datasets. If your table has 1,000 columns and you only query a subset of those columns and filter for data within a certain timeframe, C3 will cache only that portion of your table. By selectively caching data, C3 eliminates over 90% of S3/ADLS I/O costs, which can make up 10–15% of the costs for each query you run.
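The two percentages in the note above combine into an overall per-query saving. As a back-of-the-envelope sketch that treats "over 90%" as exactly 90% (a simplifying assumption), the saving is the I/O share of the query cost multiplied by the I/O cost reduction:

```python
def overall_query_savings(io_cost_share: float, io_cost_reduction: float = 0.90) -> float:
    """Fraction of total query cost saved when I/O costs drop by io_cost_reduction."""
    return io_cost_share * io_cost_reduction


# I/O makes up 10-15% of query cost according to the note above.
for share in (0.10, 0.15):
    saving = overall_query_savings(share)
    print(f"I/O share {share:.0%}: about {saving:.1%} of total query cost saved")
```

Under these assumptions, the overall saving is roughly 9% to 13.5% of the cost of each query.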
Reflection Refresh
- Never refresh: Select to prevent automatic Reflection refresh; otherwise, the default is to refresh automatically.
- Refresh every: Define how often to refresh Reflections, specified in hours, days, or weeks. This option is ignored if Never refresh is selected.
- Set refresh schedule: Specify the daily or weekly schedule.
- Never expire: Select to prevent Reflections from expiring; otherwise, the default is to expire automatically after the time limit specified in Expire after.
- Expire after: The time limit after which Reflections expire and are removed from Dremio, specified in hours, days, or weeks. This option is ignored if Never expire is selected.
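The way these options interact can be sketched as a small model: Never refresh overrides Refresh every, and Never expire overrides Expire after. This is only an illustration of the rules listed above, not Dremio's scheduler. The class name and the interval values are hypothetical, and measuring expiry from the last refresh is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ReflectionPolicy:
    never_refresh: bool = False
    refresh_every: timedelta = timedelta(hours=1)  # hypothetical interval
    never_expire: bool = False
    expire_after: timedelta = timedelta(hours=3)   # hypothetical interval

    def next_refresh(self, last_refresh: datetime) -> Optional[datetime]:
        """Return the next refresh time, or None when Never refresh is selected."""
        if self.never_refresh:
            return None  # refresh_every is ignored
        return last_refresh + self.refresh_every

    def expires_at(self, last_refresh: datetime) -> Optional[datetime]:
        """Return the expiry time, or None when Never expire is selected."""
        if self.never_expire:
            return None  # expire_after is ignored
        return last_refresh + self.expire_after


last = datetime(2024, 1, 1, 12, 0)
policy = ReflectionPolicy(refresh_every=timedelta(days=1), expire_after=timedelta(weeks=1))
print(policy.next_refresh(last), policy.expires_at(last))

frozen = ReflectionPolicy(never_refresh=True, never_expire=True)
print(frozen.next_refresh(last), frozen.expires_at(last))
```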
Metadata
Dataset Handling
- Remove dataset definitions if underlying data is unavailable (Default) – When selected, datasets are automatically removed if their underlying files/folders are removed from Azure Storage or are inaccessible. If this option is not selected, Dremio will not remove dataset definitions if underlying files/folders are removed from Azure Storage. This may be useful if files are temporarily deleted and replaced with a new set of files.
- Automatically format files into physical datasets when you issue queries – When selected, Dremio will automatically promote a folder to a table using default options. If you have CSV files, especially with non-default formatting, it might be useful to not select this option.
Metadata Refresh
- Data Discovery – Set the time interval for fetching top-level object names such as databases and tables. Use Fetch every to set the frequency in minutes, hours, days, or weeks. The default frequency is one hour.
- Dataset Details – The metadata that Dremio needs for query planning, such as information about fields, types, shards, statistics, and locality. Use these parameters to fetch or expire the metadata:
  - Fetch mode – Fetch only from queried datasets (the default), in which case Dremio updates details for previously queried objects. Fetching from all datasets is deprecated.
  - Fetch every – The frequency at which to fetch dataset details, in minutes, hours, days, or weeks. The default is one day.
  - Expire after – The expiry time of dataset details, in minutes, hours, days, or weeks. The default is three days.
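The default Fetch every and Expire after values for dataset details define three windows in the life of cached metadata. The sketch below only illustrates that timeline. The state labels are hypothetical, and Dremio's actual refresh logic may differ.

```python
from datetime import datetime, timedelta

# Defaults for dataset details, as listed above.
FETCH_EVERY = timedelta(days=1)
EXPIRE_AFTER = timedelta(days=3)


def dataset_details_state(last_fetched: datetime, now: datetime) -> str:
    """Classify cached dataset details by their age (labels are illustrative)."""
    age = now - last_fetched
    if age >= EXPIRE_AFTER:
        return "expired"
    if age >= FETCH_EVERY:
        return "due for refresh"
    return "fresh"


fetched = datetime(2024, 1, 1, 0, 0)
for hours in (6, 30, 80):
    now = fetched + timedelta(hours=hours)
    print(f"{hours:>2} h after fetch: {dataset_details_state(fetched, now)}")
```

Keeping Expire after longer than Fetch every, as the defaults do, leaves room for a few missed fetches before the details expire.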
Privileges
This connection inherits privileges from Project settings. To grant specific users or roles additional privileges in this connection:
- Enter the username or role name that you want to grant access to and click the Add to Privileges button. The added user or role is displayed in the USERS/ROLES table.
- For the users or roles in the USERS/ROLES table, toggle the checkmark for each privilege you want to grant on the Dremio source that is being created.
- Click Save after setting the configuration.
See Privileges for additional information about privileges.
Edit an Azure Storage Connection
- On the Open Catalog page, under Connections, right-click the connection and select Settings.
- Update the connection configuration as needed.
- Click Save.
Delete an Azure Storage Connection
- On the Open Catalog page, under Connections, right-click the connection and select Delete.
- Click Delete to confirm.