Azure Storage

This topic describes how to configure Azure Storage (ADLS Gen2) as a data source in Dremio.

Azure Storage is the foundation for the ADLS Gen2 service. When you add Azure Storage as a Dremio data source, you can configure ADLS Gen2 file systems along with blob containers and a filesystem whitelist.

Dremio Configuration

Settings for Dremio configuration are provided in the following tabs:

  • General
  • Advanced Options
  • Reflection Refresh
  • Metadata
  • Sharing

General

Dremio Field   | Azure Property     | Description
Resource Name  | Name               | Name of the Azure Storage source.
Connection     | Account Name       | Name of the account.
               | Account Kind       | Select either StorageV1 or StorageV2. Default: StorageV2
               | Encrypt connection | Sets a forced encryption over TLS.
Authentication | Shared Access Key  | Generated password value for the registered application.

Azure Storage General Settings

Advanced Options

Advanced Options include:

  • Enable asynchronous access when possible (default)
  • Enable exports into the source (CTAS and DROP).
  • Root Path -- Root path for the source.
  • Advanced Properties -- A list of connection properties (name and value).
  • Blob containers and Filesystem Whitelist -- Specifies a list of containers to include. Note that this disables automated container discovery.
  • Cache Options
    • Enable local caching when possible
    • Max percent of total available cache space to use when possible.

Azure Storage Advanced Options Settings

Reflection Refresh

Reflection refresh policy options include:

  • Never refresh -- Specifies how often to refresh based on hours, days, weeks, or never.
  • Never expire -- Specifies how often to expire based on hours, days, weeks, or never.

Azure Storage Reflections Refresh Settings

Metadata

The metadata settings include:

  • Dataset handling options
  • Metadata refresh options

Azure Storage Metadata Settings

Dataset Handling

  • Remove dataset definitions if underlying data is unavailable (Default).
    If this box is not checked and the underlying files under a folder are removed or the folder/source is not accessible, Dremio does not remove the dataset definitions. Leaving the box unchecked is useful when files are temporarily deleted and then replaced with a new set of files.
  • Automatically format files into physical datasets when users issue queries. If this box is checked and a query runs against an un-promoted PDS/folder, Dremio automatically promotes it using default options. If you have CSV files, especially ones that need non-default options, consider leaving this box unchecked.

Metadata Refresh

  • Dataset Discovery -- Refresh interval for top-level source object names such as names of DBs and tables.
    • Fetch every -- Specify fetch time based on minutes, hours, days, or weeks. Default: 1 hour
  • Dataset Details -- The metadata that Dremio needs for query planning such as information needed for fields, types, shards, statistics, and locality.
    • Fetch mode -- Specify either Only Queried Datasets, All Datasets, or As Needed. Default: Only Queried Datasets
      • Only Queried Datasets -- Dremio updates details for previously queried objects in a source.
        This mode increases query performance because less work is needed at query time for these datasets.
      • All Datasets -- Dremio updates details for all datasets in a source. This mode increases query performance because less work is needed at query time.
      • As Needed -- Dremio updates details for a dataset at query time. This mode minimizes metadata queries on a source when it is not in use, but might lead to longer planning times.
    • Fetch every -- Specify fetch time based on minutes, hours, days, or weeks. Default: 1 hour
    • Expire after -- Specify expiration time based on minutes, hours, days, or weeks. Default: 3 hours

Sharing

Sharing options for which users can edit include:

Azure Storage Sharing Settings

Distributed Storage

Azure Storage is the foundation for the ADLS Gen2 service. See Configuring Distributed Storage for additional information.

To configure distributed storage:

  1. Create core-site.xml and add the following properties:
     <?xml version="1.0"?>
     <configuration>
       <property>
         <name>fs.dremioAzureStorage.impl</name>
         <description>FileSystem implementation. Must always be com.dremio.plugins.azure.AzureStorageFileSystem</description>
         <value>com.dremio.plugins.azure.AzureStorageFileSystem</value>
       </property>
       <property>
         <name>dremio.azure.account</name>
         <description>The name of the storage account.</description>
         <value>ACCOUNT_NAME</value>
       </property>
       <property>
         <name>dremio.azure.key</name>
         <description>The shared access key for the storage account.</description>
         <value>ACCESS_KEY</value>
       </property>
       <property>
         <name>dremio.azure.mode</name>
         <description>The storage account type. Value: STORAGE_V2</description>
         <value>STORAGE_V2</value>
       </property>
       <property>
         <name>dremio.azure.secure</name>
         <description>Boolean option to enable SSL connections. Default: True Value: True/False</description>
         <value>True</value>
       </property>
     </configuration>
    
    [info] For Azure Government configuration, see Using Azure Government.
  2. Copy the core-site.xml file to Dremio's configuration directory (the same location as dremio.conf) on all nodes.

Using Azure Government

To configure the Azure Storage data source for the Azure Government cloud platform, determine whether you are configuring the Azure Storage data source to use Storage V1 or Storage V2.

Storage V2

To configure the Azure Storage data source to access data on Azure Government using Storage V2:

  1. Add the following property to the core-site.xml file along with the general Azure Storage properties and copy the file to Dremio's configuration directory location on all nodes:
     <property>
           <name>fs.azure.endpoint</name>
           <description>The azure storage endpoint to use.</description>
           <value>dfs.core.usgovcloudapi.net</value>
     </property>
    
  2. Add the Azure Storage data source and configure the following:
    1. From the General tab, specify StorageV2 for Account Kind under Connection.
    2. From the Advanced Options tab, under Advanced Properties, add the following property and value:
      fs.azure.endpoint = dfs.core.usgovcloudapi.net
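
As a rough sketch, combining the general properties from Distributed Storage with the endpoint property from step 1 could give a core-site.xml like the following for Storage V2 on Azure Government. ACCOUNT_NAME and ACCESS_KEY are placeholders for your own values. Every property name and value here is taken from this topic; the exact set your deployment needs may differ.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Dremio's Azure Storage file system implementation (fixed value) -->
  <property>
    <name>fs.dremioAzureStorage.impl</name>
    <value>com.dremio.plugins.azure.AzureStorageFileSystem</value>
  </property>
  <!-- Placeholder: replace with the storage account name -->
  <property>
    <name>dremio.azure.account</name>
    <value>ACCOUNT_NAME</value>
  </property>
  <!-- Placeholder: replace with the shared access key -->
  <property>
    <name>dremio.azure.key</name>
    <value>ACCESS_KEY</value>
  </property>
  <property>
    <name>dremio.azure.mode</name>
    <value>STORAGE_V2</value>
  </property>
  <property>
    <name>dremio.azure.secure</name>
    <value>True</value>
  </property>
  <!-- Azure Government endpoint for Storage V2, as given in step 1 -->
  <property>
    <name>fs.azure.endpoint</name>
    <value>dfs.core.usgovcloudapi.net</value>
  </property>
</configuration>
```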

Storage V1

To configure the Azure Storage data source to access data on Azure Government using Storage V1:

  1. Add the following property to the core-site.xml file along with the general Azure Storage properties and copy the file to Dremio's configuration directory location on all nodes:
     <property>
           <name>fs.azure.endpoint</name>
           <description>The azure storage endpoint to use.</description>
           <value>blob.core.usgovcloudapi.net</value>
     </property>
    
  2. Add the Azure Storage data source and configure the following:
    1. From the General tab, specify StorageV1 for Account Kind under Connection.
    2. From the Advanced Options tab, under Advanced Properties, add the following property and value:
      fs.azure.endpoint = blob.core.usgovcloudapi.net
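
For Storage V1, the two properties that would typically change relative to a Storage V2 setup are the account mode and the endpoint. The fragment below is a sketch to place inside the <configuration> element alongside the other general properties. It assumes that dremio.azure.mode accepts STORAGE_V1 for this setup, as the description in the OAuth sample in this topic suggests; confirm this against your Dremio version.

```xml
<!-- Assumption: STORAGE_V1 is a valid mode here (listed in the OAuth sample's description) -->
<property>
  <name>dremio.azure.mode</name>
  <value>STORAGE_V1</value>
</property>
<!-- Azure Government endpoint for Storage V1, as given in step 1 -->
<property>
  <name>fs.azure.endpoint</name>
  <value>blob.core.usgovcloudapi.net</value>
</property>
```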

Using OAuth 2.0 Authentication

To configure the Azure Storage data source for OAuth 2.0 authentication:

  1. Log in to the Azure Portal and navigate to App Registrations.
  2. If you haven't already done so, create an app for OAuth 2.0.
  3. Obtain the following values:
    • Application ID -- This is the Application (Client) ID in Azure.
    • OAuth 2.0 Token Endpoint -- This is the OAuth 2.0 token endpoint (v1.0).
    • Client Secret -- This is the secret key generated in the application.
  4. From the Dremio UI, add the Azure Storage data source.
  5. From the General tab, select Azure Active Directory and specify the following values:
    • Application ID
    • OAuth 2.0 Token Endpoint
    • Client Secret

Configuring OAuth 2.0 with Distributed Storage

To enable distributed storage with OAuth 2.0, update the core-site.xml file. Use the following sample for reference:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.dremioAzureStorage.impl</name>
    <description>FileSystem implementation. Must always be com.dremio.plugins.azure.AzureStorageFileSystem</description>
    <value>com.dremio.plugins.azure.AzureStorageFileSystem</value>
  </property>
  <property>
    <name>dremio.azure.account</name>
    <description>The name of the storage account.</description>
    <value>ACCOUNT_NAME</value>
  </property>
  <property>
    <name>dremio.azure.mode</name>
    <description>The storage account type. Value: STORAGE_V1 or STORAGE_V2</description>
    <value>MODE</value>
  </property>
  <property>
    <name>dremio.azure.secure</name>
    <description>Boolean option to enable SSL connections. Default: True, Value: True/False</description>
    <value>SECURE</value>
  </property>
  <property>
    <name>dremio.azure.credentialsType</name>
    <description>The credentials used for authentication. Value: ACCESS_KEY or AZURE_ACTIVE_DIRECTORY</description>
    <value>CREDENTIALS_TYPE</value>
  </property>
  <property>
    <name>dremio.azure.clientId</name>
    <description>The client ID of the Azure application used for Azure Active Directory</description>
    <value>CLIENT_ID</value>
  </property>
  <property>
    <name>dremio.azure.tokenEndpoint</name>
    <description>OAuth 2.0 token endpoint for Azure Active Directory (v1.0)</description>
    <value>TOKEN_ENDPOINT</value>
  </property>
  <property>
    <name>dremio.azure.clientSecret</name>
    <description>The client secret of the Azure application used for Azure Active Directory</description>
    <value>CLIENT_SECRET</value>
  </property>
</configuration>
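
For illustration, the OAuth-specific placeholders in the sample might be filled in as shown in the fragment below. This is only a sketch: TENANT_ID is a hypothetical placeholder for your Azure Active Directory tenant ID, the all-zero client ID is a dummy value, and the URL follows the usual Azure Active Directory v1.0 token endpoint pattern. Copy the actual endpoint from your app registration rather than relying on this pattern.

```xml
<!-- Select Azure Active Directory (OAuth 2.0) instead of a shared access key -->
<property>
  <name>dremio.azure.credentialsType</name>
  <value>AZURE_ACTIVE_DIRECTORY</value>
</property>
<!-- Dummy GUID: replace with the Application (Client) ID -->
<property>
  <name>dremio.azure.clientId</name>
  <value>00000000-0000-0000-0000-000000000000</value>
</property>
<!-- Assumed v1.0 endpoint pattern; TENANT_ID is a hypothetical placeholder -->
<property>
  <name>dremio.azure.tokenEndpoint</name>
  <value>https://login.microsoftonline.com/TENANT_ID/oauth2/token</value>
</property>
<!-- Placeholder: replace with the secret generated in the application -->
<property>
  <name>dremio.azure.clientSecret</name>
  <value>CLIENT_SECRET</value>
</property>
```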

Configuring OAuth 2.0 with Azure Government Cloud

To use OAuth 2.0 authentication with the Azure Government cloud platform, add the following property to the core-site.xml file:

<property>
 <name>fs.azure.endpoint</name>
 <description>Azure Government Cloud Endpoint</description>
 <value>GOVERNMENT_CLOUD_ENDPOINT</value>
</property>
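
As a sketch, assuming the endpoint values listed under Using Azure Government also apply when authenticating with OAuth 2.0, GOVERNMENT_CLOUD_ENDPOINT would be dfs.core.usgovcloudapi.net for Storage V2 (or blob.core.usgovcloudapi.net for Storage V1):

```xml
<!-- Assumption: same endpoint value as the shared-key Storage V2 setup in Using Azure Government -->
<property>
  <name>fs.azure.endpoint</name>
  <description>Azure Government Cloud Endpoint</description>
  <value>dfs.core.usgovcloudapi.net</value>
</property>
```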

Configuring Cloud Cache

As of Dremio 4.0 Enterprise Edition, cloud caching is available. See Cloud Cache and Configuring Cloud Cache for more information.

For More Information

