Azure Storage
The Dremio source connector for Azure Storage supports the following Azure Storage services:
- Azure Blob Storage is Microsoft's object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data, such as text or binary data.
- Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics. It is built on top of Azure Blob Storage and converges the capabilities of Azure Blob Storage and Azure Data Lake Storage Gen1. Gen1 features such as file system semantics, directory- and file-level security, and scale are combined with the low-cost tiered storage and high availability/disaster recovery capabilities of Azure Blob Storage.
Dremio does not support Azure Storage accounts that have soft delete for blobs enabled. Disable soft delete for blobs on the storage account so that Dremio can establish a successful connection.
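To check the soft delete setting before you add the source, you can inspect the account's blob service properties, for example the JSON printed by `az storage account blob-service-properties show --account-name <account> --resource-group <group>`. The following minimal sketch assumes that blob soft delete is reported under `deleteRetentionPolicy.enabled` in that output. The function name is hypothetical.

```python
import json


def blob_soft_delete_enabled(blob_service_properties: dict) -> bool:
    """Return True if blob soft delete appears to be enabled.

    Assumes the JSON shape printed by
    `az storage account blob-service-properties show`, where blob soft delete
    is reported as deleteRetentionPolicy.enabled. A missing or null policy is
    treated as disabled.
    """
    policy = blob_service_properties.get("deleteRetentionPolicy") or {}
    return bool(policy.get("enabled", False))


if __name__ == "__main__":
    # Hypothetical sample output; paste the real JSON for your account instead.
    sample = json.loads('{"deleteRetentionPolicy": {"enabled": true, "days": 7}}')
    if blob_soft_delete_enabled(sample):
        print("Blob soft delete is enabled; disable it before connecting Dremio.")
    else:
        print("Blob soft delete is disabled.")
```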
You may see zero-byte files created alongside your Iceberg tables in your Azure Storage account. These files do not affect Dremio's functionality, so you can ignore them if you cannot update your storage container. If you can update your container, see Azure Data Lake Storage Gen2 hierarchical namespace for information on enabling a hierarchical namespace, which prevents these files from being created.
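To check whether an account already has a hierarchical namespace, you can read the isHnsEnabled property from the JSON printed by `az storage account show`. The following sketch assumes that output shape, in which the property can be true, false, or null. The helper name is hypothetical.

```python
import json


def hierarchical_namespace_enabled(account_properties: dict) -> bool:
    """Return True if the storage account reports a hierarchical namespace.

    Assumes the JSON shape printed by `az storage account show`, where the
    setting is the boolean isHnsEnabled. Null or missing is treated as off.
    """
    return account_properties.get("isHnsEnabled") is True


if __name__ == "__main__":
    # Hypothetical sample; paste the real output for your account instead.
    sample = json.loads('{"name": "mystorageacct", "kind": "StorageV2", "isHnsEnabled": null}')
    if not hierarchical_namespace_enabled(sample):
        print("Hierarchical namespace is off; zero-byte files may appear next to Iceberg tables.")
```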
Granting Permissions
To use Azure Storage as a data source, the OAuth 2.0 application that you created in Azure must have the appropriate permissions on the specified Azure Storage account.
To grant these permissions, assign the built-in Storage Blob Data Contributor role on your storage account by following Microsoft's procedure for assigning Azure roles in the Azure portal, with these selections:
- In Step 3: Select the appropriate role, assign the Storage Blob Data Contributor role.
- In Step 4: Select who needs access, for Assign access to, select User, group, or service principal. For Select members, select the name of the application (service principal) that you registered earlier.
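If you prefer the Azure CLI to the portal, you can make the same role assignment with `az role assignment create`, scoped to the storage account's resource ID. The following Python sketch only builds the command line and does not run it. The subscription ID, resource group, account name, and application ID are placeholders, and the helper names are hypothetical.

```python
import shlex


def storage_account_scope(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Build the Azure resource ID of a storage account, used as the role scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_assignment_command(
    app_id: str, subscription_id: str, resource_group: str, account_name: str
) -> list[str]:
    """Build (but do not execute) an `az role assignment create` argument list
    that grants Storage Blob Data Contributor to the application's service principal."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", app_id,
        "--role", "Storage Blob Data Contributor",
        "--scope", storage_account_scope(subscription_id, resource_group, account_name),
    ]


if __name__ == "__main__":
    # All identifiers below are placeholders.
    cmd = role_assignment_command(
        app_id="00000000-0000-0000-0000-000000000000",
        subscription_id="11111111-1111-1111-1111-111111111111",
        resource_group="my-resource-group",
        account_name="mystorageacct",
    )
    print(shlex.join(cmd))
```

To run the printed command, you need an authenticated Azure CLI session with permission to create role assignments on the storage account.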
Configuring Network Access
If you are using Dremio Standard Edition, your Azure Storage account must have public network access enabled from all networks. To allow traffic from all networks:
- Log in to the Azure portal and go to your Azure Storage account.
- On your Azure Storage account page, select Networking from the left sidebar.
- For Public network access, select Enabled from all networks.
- Click Save.
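After saving, you can confirm the setting from the JSON printed by `az storage account show`. The following sketch assumes that "Enabled from all networks" corresponds to publicNetworkAccess set to "Enabled" and networkRuleSet.defaultAction set to "Allow" in that output. It also assumes that a missing value means the Azure default (Enabled and Allow). The function name is hypothetical.

```python
import json


def allows_all_networks(account_properties: dict) -> bool:
    """Return True if the account appears to accept traffic from all networks.

    Assumes the JSON shape printed by `az storage account show`:
      - publicNetworkAccess: "Enabled" or "Disabled" (missing/null treated as "Enabled")
      - networkRuleSet.defaultAction: "Allow" or "Deny" (missing treated as "Allow")
    """
    public_access = account_properties.get("publicNetworkAccess") or "Enabled"
    rules = account_properties.get("networkRuleSet") or {}
    default_action = rules.get("defaultAction") or "Allow"
    return public_access == "Enabled" and default_action == "Allow"


if __name__ == "__main__":
    # Hypothetical sample; paste the real output for your account instead.
    sample = json.loads(
        '{"publicNetworkAccess": "Enabled", "networkRuleSet": {"defaultAction": "Deny"}}'
    )
    print("all networks allowed" if allows_all_networks(sample) else "access is restricted")
```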
If you are using Dremio Enterprise Edition, please contact Dremio Support for advanced networking settings.
Adding an Azure Storage Source
To add an Azure Storage source to your project:
- From the Datasets page, click Object Storage at the bottom of the Sources pane.
- From the top-right of the page, click the Add object storage button.
- In the Add Data Source dialog, under Object Storage, click Azure Storage.
The New Azure Storage Source dialog box appears. It contains the following sections: General, Advanced Options, Reflection Refresh, Metadata, and Privileges.
The following guidance explains how to complete each section.
General
| Section | Field/Option | Description |
|---|---|---|
| Name | Name | Provide a name to use for this Azure Storage source. The name cannot include the following special characters: /, :, [, or ]. |
| Connection | Account Name | The name of the Azure Storage account, as shown in the Azure portal. |
| | Encrypt connection | Enabled by default, this option encrypts network traffic with TLS. Dremio does not allow this option to be disabled (unchecked). |
| | Account Version | Select the Azure Storage version for this source connection. Supported values are StorageV1 and StorageV2. Default: StorageV2. |
| Authentication | Shared access key | Select this option to authenticate with the storage account's shared access key from the Azure portal. |
| | Microsoft Entra ID | Select this option to use Microsoft Entra ID credentials for authentication. |
Although Azure Storage can accept unencrypted connections, they are not recommended, and Dremio always encrypts connections to this source.
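To catch typos before you submit the form, you can pre-check the two free-text fields. The following sketch checks the source name only against the characters listed in the table. It checks the account name against Azure's storage account naming rule: 3 to 24 characters, lowercase letters and digits only. Dremio or Azure may enforce further rules, and the helper names are hypothetical.

```python
import re

# Characters the General section says a source name cannot contain.
FORBIDDEN_SOURCE_NAME_CHARS = set("/:[]")

# Azure storage account naming rule: 3-24 characters, lowercase letters and digits only.
ACCOUNT_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def invalid_source_name_chars(name: str) -> set[str]:
    """Return the forbidden characters found in a proposed source name."""
    return FORBIDDEN_SOURCE_NAME_CHARS.intersection(name)


def is_valid_account_name(account_name: str) -> bool:
    """Check an Azure storage account name against Azure's naming rule."""
    return ACCOUNT_NAME_PATTERN.fullmatch(account_name) is not None


if __name__ == "__main__":
    for source_name in ["azure-lake", "sales/2024", "raw[eu]"]:
        bad = invalid_source_name_chars(source_name)
        print(source_name, "OK" if not bad else f"contains {sorted(bad)}")
    for account in ["mystorageacct", "My_Storage"]:
        print(account, is_valid_account_name(account))
```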
Microsoft Entra ID Authentication
To configure the Azure Storage source to use Microsoft Entra ID for authentication, provide the following values from the OAuth 2.0 application that you created in the Azure portal for this source:
- Application ID - The application (client) ID in Azure.
- OAuth 2.0 Token Endpoint - The OAuth 2.0 token endpoint (v1.0), which includes the tenant ID and which the application uses to obtain an access token or a refresh token.
- Application Secret - The secret key generated for the application.
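You can sanity-check these three values outside Dremio by requesting a token yourself. The following standard-library sketch performs an OAuth 2.0 client credentials grant against a v1.0 token endpoint of the form https://login.microsoftonline.com/<tenant-id>/oauth2/token. It requests the Azure Storage resource https://storage.azure.com/. It is an illustration only: the function names are hypothetical, and it does not represent how Dremio obtains tokens internally.

```python
import json
import urllib.parse
import urllib.request

# Resource identifier for Azure Storage when requesting a v1.0 token.
STORAGE_RESOURCE = "https://storage.azure.com/"


def token_endpoint_v1(tenant_id: str) -> str:
    """Build the OAuth 2.0 (v1.0) token endpoint for a Microsoft Entra tenant."""
    return f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"


def client_credentials_body(application_id: str, application_secret: str) -> bytes:
    """Form-encode a client credentials grant for the v1.0 endpoint."""
    return urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": application_id,
        "client_secret": application_secret,
        "resource": STORAGE_RESOURCE,  # v1.0 uses "resource" rather than v2.0's "scope"
    }).encode("ascii")


def fetch_access_token(token_endpoint: str, application_id: str, application_secret: str) -> str:
    """POST the grant and return the access_token field of the JSON response."""
    request = urllib.request.Request(
        token_endpoint,
        data=client_credentials_body(application_id, application_secret),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]


if __name__ == "__main__":
    # Placeholder tenant ID; with real values, call fetch_access_token(endpoint, app_id, secret).
    print(token_endpoint_v1("00000000-0000-0000-0000-000000000000"))
```

If fetch_access_token raises an HTTP 400 or 401 error, the application ID, secret, or tenant in the endpoint is likely wrong. A returned token shows only that authentication works. Access to the storage account still depends on the role assignment described under Granting Permissions.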