Version: 24.3.x

Azure Storage

The Dremio Azure Storage Connector includes support for the following Azure Storage services:

  • Azure Blob Storage: Azure Blob storage is Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data, such as text or binary data.
  • Azure Data Lake Storage Gen2: Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics. It is built on top of Azure Blob storage and converges the capabilities of Azure Blob storage and Azure Data Lake Storage Gen1. Features from Azure Data Lake Storage Gen1, such as file system semantics, directory- and file-level security, and scale, are combined with the low-cost tiered storage and high availability/disaster recovery capabilities of Azure Blob storage.

caution

Dremio does not support Azure Storage accounts that have soft delete for blobs enabled. Disable soft delete for blobs to establish a successful connection.

Configuration

General

  • Name: Name to use for the Azure Storage source.

Connection

  • Account Name: Name of the Azure Storage account.
  • Encrypt connection: Select to encrypt network traffic over SSL.
  • Account Version: Version of Azure Storage. The options are Storage V1 and Storage V2. The default is Storage V2.

Authentication

For Authentication Type, select Shared access key or Azure Active Directory.

note

To use Azure Key Vault as your application secret store, you must complete the Requirements for Authenticating with Azure Key Vault.

It is not necessary to restart the Dremio coordinator when you rotate secrets stored in Azure Key Vault. Read Requirements for Secrets Rotation for more information.

If you select Shared access key authentication:

  • For Secret Store, select Dremio or Azure Key Vault.
    • If you select Dremio, enter the shared access key in plain text. Dremio stores the key.
    • If you select Azure Key Vault, enter the URI for the Azure Key Vault secret that stores the shared access key. The URI format is https://<vault_name>.vault.azure.net/secrets/<secret_name> (for example, https://myvault.vault.azure.net/secrets/mysecret). Dremio connects to Azure Key Vault and fetches the secret to use as the shared access key. Dremio does not store the fetched secret.

If you select Azure Active Directory authentication:

  • For Application ID, specify the Application (Client) ID in Azure Active Directory.
  • For OAuth 2.0 Token Endpoint, specify the OAuth 2.0 token endpoint for your Azure application.
  • For Application Secret Store, select Dremio or Azure Key Vault.
    • If you select Dremio, enter the application secret in plain text. Dremio stores the secret.
    • If you select Azure Key Vault, enter the URI for the Azure Key Vault secret that stores the application secret. The URI format is https://<vault_name>.vault.azure.net/secrets/<secret_name> (for example, https://myvault.vault.azure.net/secrets/mysecret). Dremio connects to Azure Key Vault and fetches the secret to use as the application secret. Dremio does not store the fetched secret.
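
Before saving the source, you may want to sanity-check a Key Vault secret URI against the documented format. The following Python sketch uses a hypothetical helper, parse_key_vault_secret_uri, which is not part of Dremio or any Azure SDK. It checks only the documented shape https://<vault_name>.vault.azure.net/secrets/<secret_name> and never contacts Azure, so it cannot confirm that the secret exists or that Dremio can read it.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Documented format: https://<vault_name>.vault.azure.net/secrets/<secret_name>
_VAULT_HOST_SUFFIX = ".vault.azure.net"


def parse_key_vault_secret_uri(uri: str) -> tuple[str, str]:
    """Return (vault_name, secret_name) for a Key Vault secret URI.

    Hypothetical helper: validates the string shape only; it does not
    call Azure, so it cannot tell whether the secret actually exists.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "https":
        raise ValueError(f"Key Vault secret URI must use https: {uri!r}")

    # The host must be <vault_name>.vault.azure.net with a non-empty vault name.
    host = parsed.hostname or ""
    if not host.endswith(_VAULT_HOST_SUFFIX):
        raise ValueError(f"host must end with {_VAULT_HOST_SUFFIX}: {uri!r}")
    vault_name = host[: -len(_VAULT_HOST_SUFFIX)]
    if not vault_name or "." in vault_name:
        raise ValueError(f"missing or invalid vault name: {uri!r}")

    # Expect exactly two path segments: "secrets" and the secret name.
    segments = parsed.path.strip("/").split("/")
    if len(segments) != 2 or segments[0] != "secrets" or not segments[1]:
        raise ValueError(f"path must be /secrets/<secret_name>: {uri!r}")
    return vault_name, segments[1]


print(parse_key_vault_secret_uri("https://myvault.vault.azure.net/secrets/mysecret"))
# ('myvault', 'mysecret')
```

A URI that uses http, omits the vault name, or points at a path other than /secrets/<secret_name> raises ValueError.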
Requirements for Authenticating with Azure Key Vault

Dremio uses Azure Active Directory (AAD) managed identities to connect to Azure Key Vault. Complete the following steps, following the AAD instructions for each one, so that Dremio can connect to Azure Key Vault for authentication when you create an Azure Storage source:

  1. Create a user-assigned managed identity in AAD.
  2. Assign the managed identity to the Dremio coordinator and executor virtual machines (VMs).
  3. Assign the Azure Key Vault access policy to allow access to the managed identity.
  4. Add a secret to Azure Key Vault whose value is the credential Dremio requires to connect to your Azure Storage source: the shared access key if you select Shared access key authentication, or the application secret if you select Azure Active Directory authentication.
Requirements for Secrets Rotation

For seamless rotation of secrets stored in Azure Key Vault, rotate between two valid secrets: the old one and the new one. After you update the Azure Key Vault secret value to the new secret, both secrets must remain valid for at least the minimum holdover period:

  • Plain secrets: 5 minutes
  • AAD client secrets: 90 minutes

You may invalidate the old secret when the holdover period expires.

It is not necessary to restart the Dremio coordinator when you rotate secrets stored in Azure Key Vault.
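
The holdover rule is simple date arithmetic. The following minimal Python sketch assumes that you record the time at which you updated the Azure Key Vault secret value. The names MIN_HOLDOVER, earliest_invalidation, and can_invalidate_old_secret, and the secret-kind labels, are hypothetical; only the 5-minute and 90-minute periods come from the list above.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Minimum holdover periods from the list above. The keys are hypothetical
# labels for this sketch, not Dremio setting names.
MIN_HOLDOVER = {
    "plain_secret": timedelta(minutes=5),
    "aad_client_secret": timedelta(minutes=90),
}


def earliest_invalidation(updated_at: datetime, secret_kind: str) -> datetime:
    """Earliest time at which the old secret may be invalidated."""
    return updated_at + MIN_HOLDOVER[secret_kind]


def can_invalidate_old_secret(updated_at: datetime, secret_kind: str, now: datetime) -> bool:
    """True once the holdover period after the Key Vault update has elapsed."""
    return now >= earliest_invalidation(updated_at, secret_kind)


updated = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(earliest_invalidation(updated, "aad_client_secret"))  # 2024-01-15 13:30:00+00:00
```

Treat the computed time as a lower bound: waiting longer than the minimum before invalidating the old secret is always safe.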

Advanced Options

  • Enable asynchronous access when possible: Select to enable cloud caching so that the Azure Storage source can support simultaneous actions like adding and editing new sources.
  • Enable partition column inference: Select if Dremio should use partition column inference to handle partition columns.
  • Root Path: Root path for the source. The default is /.
  • Advanced Properties: Add connection properties, specifying their names and values.
  • Blob Containers & Filesystem Allowlist: Add the names of the containers and filesystems to include in the source. This setting disables automatic container and filesystem discovery, and Dremio limits the available containers and filesystems to those you add to the allowlist.
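
In other words, a non-empty allowlist replaces discovery rather than filtering its results. The following hypothetical Python sketch models that rule; visible_containers is an illustrative name, and discover stands in for Dremio's automatic container and filesystem discovery, which is not exposed as a Python API.

```python
from __future__ import annotations

from typing import Callable, Iterable


def visible_containers(allowlist: Iterable[str], discover: Callable[[], list[str]]) -> list[str]:
    """Containers and filesystems the source exposes (illustrative model).

    `discover` is a hypothetical stand-in for automatic discovery. It is
    called only when the allowlist is empty, because a non-empty allowlist
    disables discovery.
    """
    names = list(allowlist)
    if names:
        return names
    return discover()


def fake_discover() -> list[str]:
    # Hypothetical container names returned by discovery.
    return ["raw", "curated", "scratch"]


print(visible_containers(["curated"], fake_discover))  # ['curated']
print(visible_containers([], fake_discover))  # ['raw', 'curated', 'scratch']
```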

Cache Options

  • Enable local caching when possible: Select to create local caches of any data used from the source. Read Configuring Cloud Caching for more information.
  • Max percent of total available cache space to use when possible: Maximum amount of cache space, as a percentage, that a source can use on any single executor node when local caching is enabled. The default value is 100.
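
As a worked example of the percentage limit, assume an executor node with 400 GiB of available cache space and a maximum of 25 percent for this source. The hypothetical helper below computes the per-source cap on that node; it assumes the setting is an integer from 0 to 100, and the node size is illustrative.

```python
GIB = 1024 ** 3


def per_source_cache_limit(executor_cache_bytes: int, max_percent: int = 100) -> int:
    """Upper bound on cache space one source may use on a single executor.

    Hypothetical helper; assumes max_percent is an integer from 0 to 100.
    """
    if not 0 <= max_percent <= 100:
        raise ValueError("max_percent must be between 0 and 100")
    return executor_cache_bytes * max_percent // 100


print(per_source_cache_limit(400 * GIB, 25) // GIB)  # 100
```

With the default of 100, the source may use all of the available cache space on each executor node.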

Reflection Refresh

The reflection refresh options control how often Dremio refreshes reflections automatically and the time limit after which reflections expire and are removed.

Refresh Policy

  • Never refresh: Select to prevent the automatic refresh of reflections. The default is to allow automatic refreshes.
  • Refresh every: If using automatic refresh, how often to refresh reflections, specified in minutes, hours, days, or weeks. The default is 1 hour. Ignored if you select Never refresh.
  • Never expire: Select to prevent the expiration of reflections. The default is expiration after the specified time limit.
  • Expire after: Time limit after which reflections expire and are removed from Dremio, specified in minutes, hours, days, or weeks. The default is 3 hours. Ignored if you select Never expire.
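
The interaction between these settings can be modeled in a few lines of Python. This is an illustration only, not Dremio's scheduler: it assumes that both intervals are measured from the last reflection refresh, and the class and method names are hypothetical. The defaults (1 hour and 3 hours) and the rule that each interval is ignored when its "Never" option is selected come from the list above.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ReflectionRefreshPolicy:
    # Documented defaults: refresh every 1 hour, expire after 3 hours.
    refresh_every: timedelta = timedelta(hours=1)
    expire_after: timedelta = timedelta(hours=3)
    never_refresh: bool = False
    never_expire: bool = False

    def next_refresh(self, last_refresh: datetime) -> Optional[datetime]:
        # "Refresh every" is ignored when "Never refresh" is selected.
        return None if self.never_refresh else last_refresh + self.refresh_every

    def expires_at(self, last_refresh: datetime) -> Optional[datetime]:
        # "Expire after" is ignored when "Never expire" is selected.
        # Assumption: expiry is measured from the last refresh.
        return None if self.never_expire else last_refresh + self.expire_after


policy = ReflectionRefreshPolicy()
last = datetime(2024, 1, 15, 8, 0)
print(policy.next_refresh(last))  # 2024-01-15 09:00:00
print(policy.expires_at(last))  # 2024-01-15 11:00:00
print(ReflectionRefreshPolicy(never_refresh=True).next_refresh(last))  # None
```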

Metadata

Metadata settings include options for dataset handling and metadata refresh.

Dataset Handling

  • Remove dataset definitions if underlying data is unavailable: Select to automatically remove datasets if their underlying files and folders are removed from Azure Storage or if the folder or source is not accessible. This option is selected by default. If not selected, Dremio does not remove dataset definitions even if their underlying files and folders are removed from Azure Storage, which is useful when files are temporarily deleted and replaced with a new set of files.
  • Automatically format files into tables when users issue queries: Select to automatically promote folders to tables using the default options when a user runs a query on the folder data for the first time. This option is not selected by default. For Azure Storage sources that contain CSV files, especially CSV files with non-default formatting, consider leaving this option unselected.

Metadata Refresh

Metadata Refresh settings allow you to configure the refresh interval for gathering detailed information about promoted tables, including fields, data types, shards, statistics, and locality. Dremio uses this information during query planning and optimization.

  • Fetch mode: The default is Only Queried Datasets, which updates details only for previously queried objects in a source. This option increases query performance because the datasets require less work at query time. The other options are deprecated.
  • Fetch every: How often to refresh dataset details, specified in minutes, hours, days, or weeks. The default is 1 hour.
  • Expire after: Time limit after which dataset details expire, specified in minutes, hours, days, or weeks. The default is 3 hours.

Privileges

Use the Privileges sidebar to specify the privileges that individual users and roles have for the source.

  1. Enter a user or role name in the search field and click the Add to Privileges button.
  2. Click once in the intersecting field for each user or role and privilege to assign the desired privileges.
  3. Click Save.

Read Privileges and Access Management for more information about privileges.

Distributed Storage

See Configuring Distributed Storage for information about configuring Azure Storage as a distributed storage source.

Azure Government

To configure Azure Storage for the Azure Government platform, add one of the following properties under Advanced Properties on the Advanced Options tab, depending on whether the Azure Storage source is of Account Kind Storage V1 or Storage V2:

  • Storage V1: If the Azure Storage source is of Account Kind Storage V1, add the following property and value:

    fs.azure.endpoint = blob.core.usgovcloudapi.net
  • Storage V2: If the Azure Storage source is of Account Kind Storage V2, add the following property and value:

    fs.azure.endpoint = dfs.core.usgovcloudapi.net
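
If you script your source configuration, choosing the property reduces to a lookup on the account kind. A minimal Python sketch follows; azure_government_properties is a hypothetical helper, and only the property name fs.azure.endpoint and the two endpoint values come from the instructions above.

```python
from __future__ import annotations

# Endpoint values from the Azure Government instructions above.
_GOV_ENDPOINTS = {
    "Storage V1": "blob.core.usgovcloudapi.net",
    "Storage V2": "dfs.core.usgovcloudapi.net",
}


def azure_government_properties(account_kind: str) -> dict[str, str]:
    """Advanced property to add for an Azure Government storage account (hypothetical helper)."""
    try:
        endpoint = _GOV_ENDPOINTS[account_kind]
    except KeyError:
        raise ValueError(f"unsupported account kind: {account_kind!r}") from None
    return {"fs.azure.endpoint": endpoint}


for kind in ("Storage V1", "Storage V2"):
    for name, value in azure_government_properties(kind).items():
        print(f"{kind}: {name} = {value}")
# Storage V1: fs.azure.endpoint = blob.core.usgovcloudapi.net
# Storage V2: fs.azure.endpoint = dfs.core.usgovcloudapi.net
```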

Columnar Cloud Cache

Azure Storage supports Columnar Cloud Cache.

For More Information