Distributed Storage

The paths.dist property in the dremio.conf file specifies the cache location where Dremio stores accelerator data, CREATE TABLE AS tables, job results, downloads, and uploads. If you change this property, update it in the dremio.conf file on every node. By default, Dremio uses the disk space on the local Dremio nodes.

Store                               Supported?
NAS                                 Yes
HDFS                                Yes
MapR-FS                             Yes
Amazon S3                           Yes
Amazon Elastic File System (EFS)    No
Azure Data Lake Store (ADLS)        Yes
Azure File System (AFS)             No

NAS

NAS (network attached storage) is a device that serves files via a network using a protocol such as NFS. Dremio supports the NFS protocol with NAS.

NFS protocol type      Supported?
NetApp                 Yes
MapR NFS shares        No
Windows file shares    No

[info] Before Configuring

This information is applicable to Dremio 3.1.x and earlier.
Before configuring NAS for Dremio, mount your NFS share with the acdirmin=0,acdirmax=0 options. These options provide faster response time and avoid timeouts on results being loaded.

For example:
mount -t nfs -o acdirmin=0,acdirmax=0 172.28.1.8:/var/nfs /var/nfs

Configuration

To configure NAS as Dremio's distributed storage, add the distributed path to the dremio.conf file:

paths: {
  ...
  dist: "file:///shared_mount_path"
}
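
Because every node must see the same directory, it can help to confirm on each node that the mount behind paths.dist exists and is writable before starting Dremio. The following is a minimal Python sketch and is not part of Dremio; the helper names local_path_from_dist and is_writable_dir are hypothetical, and it assumes a file:// value like the one shown above.

```python
import os
import tempfile
from urllib.parse import urlparse


def local_path_from_dist(dist_uri: str) -> str:
    """Return the local directory behind a file:// paths.dist value."""
    parsed = urlparse(dist_uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file:// URI: {dist_uri}")
    return parsed.path


def is_writable_dir(path: str) -> bool:
    """Return True if a temporary file can be created (and removed) in path."""
    if not os.path.isdir(path):
        return False
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # "file:///shared_mount_path" is the placeholder value from dremio.conf.
    shared = local_path_from_dist("file:///shared_mount_path")
    print(shared, "writable" if is_writable_dir(shared) else "NOT writable")
```

Run the script as the same operating system user that runs Dremio, since write permission depends on that user.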

HDFS

Before configuring HDFS as Dremio's distributed storage, test adding the same cluster as a Dremio source and verify the connection.

Make the following change in the dremio.conf file:

paths: {
  ...
  dist: "hdfs://<NAMENODE_HOST>:8020/path"
}

When deploying on Hadoop using YARN, Dremio automatically copies this option to all nodes, so it only needs to be configured manually on Coordinator nodes.

Name Node HA
If Name Node HA is enabled, specify the distributed storage path (paths.dist in dremio.conf) using the fs.defaultFS value instead of the active name node, for example <value_for_fs_defaultFS>/path.

The fs.defaultFS value can be found in core-site.xml (typically under /etc/hadoop/conf).

As described in the Hadoop using YARN deployment guide, ensure that you have copied the core-site.xml, hdfs-site.xml, and yarn-site.xml files (typically found under /etc/hadoop/conf) into Dremio's conf directory.
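
To illustrate how the paths.dist value is derived when Name Node HA is enabled, here is a minimal Python sketch that reads fs.defaultFS from the contents of a core-site.xml file and appends a cache path. The function names, the sample value hdfs://nameservice1, and the /dremio/cache path are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def read_default_fs(core_site_xml: str) -> str:
    """Return the fs.defaultFS value from the text of a core-site.xml file."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "fs.defaultFS":
            return (prop.findtext("value") or "").strip()
    raise KeyError("fs.defaultFS not found in core-site.xml")


def dist_path(default_fs: str, path: str) -> str:
    """Join the fs.defaultFS value and a directory into a paths.dist value."""
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


# Hypothetical core-site.xml content; a real file has many more properties.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nameservice1</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print('dist: "%s"' % dist_path(read_default_fs(SAMPLE), "/dremio/cache"))
```

The resulting value names the HA nameservice rather than a single host, so it should remain valid regardless of which name node is currently active.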

MapR-FS

Before configuring MapR-FS as Dremio's distributed storage, test adding the same cluster as a Dremio source and verify the connection.

Make the following change in the dremio.conf file:

paths: {
  ...
  dist: "maprfs:///<MOUNT_PATH>/<CACHE_DIRECTORY>"
}

When deploying on MapR using YARN, Dremio automatically copies this option to all nodes, so it only needs to be configured manually on Coordinator nodes.

Amazon S3

Before configuring Amazon S3 as Dremio's distributed storage:

  • Test adding the same bucket as a Dremio source and verify the connection.
  • Ensure that the IAM policy grants at least the following permissions, which are the minimum required for storing reflections:
    {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Sid": "VisualEditor0",
              "Effect": "Allow",
              "Action": [
                  "s3:PutObject",
                  "s3:GetObject",
                  "s3:ListBucket",
                  "s3:DeleteObject"
              ],
              "Resource": [
                  "arn:aws:s3:::BUCKET-NAME",
                  "arn:aws:s3:::BUCKET-NAME/*"
              ]
          },
          {
              "Sid": "VisualEditor2",
              "Effect": "Allow",
              "Action": [
                  "s3:ListAllMyBuckets",
                  "s3:HeadBucket"
              ],
              "Resource": "*"
          }
      ]
    }
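
As a quick sanity check of a policy document against the minimum list above, the following Python sketch collects the actions granted by Allow statements and reports any of the four bucket-level actions that are missing. It is a simplification made for illustration: it compares action names literally and ignores wildcards, Resource scoping, and Deny statements, so it does not replace testing the bucket as a Dremio source. The function names are hypothetical.

```python
import json

# Bucket-level actions taken from the minimum policy shown above.
REQUIRED_BUCKET_ACTIONS = {
    "s3:PutObject",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:DeleteObject",
}


def allowed_actions(policy: dict) -> set:
    """Collect the action names listed in all Allow statements."""
    actions = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        actions.update([action] if isinstance(action, str) else action)
    return actions


def missing_actions(policy_json: str) -> set:
    """Return the required actions that the policy text does not list."""
    return REQUIRED_BUCKET_ACTIONS - allowed_actions(json.loads(policy_json))
```

An empty result means every required action name appears in at least one Allow statement.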

To configure Dremio for Amazon S3:

  1. Change the following in the dremio.conf file:

    paths: {
    # The local path where Dremio stores data.
    #local: ${DREMIO_HOME}"/data"
    # The distributed path for Dremio data, including job results, downloads, uploads, etc.
    dist: "dremioS3:///qa1.dremio.com/jduong/accel"
    }
    

    The storage root directory must be created first.

  2. Create core-site.xml and include IAM credentials with list, read and write permissions:

    <?xml version="1.0"?>
    <configuration>
    <property>
        <name>fs.dremioS3.impl</name>
        <description>The FileSystem implementation. Must be set to com.dremio.plugins.s3.store.S3FileSystem</description>
        <value>com.dremio.plugins.s3.store.S3FileSystem</value>
    </property>
    <property>
        <name>fs.s3a.access.key</name>
        <description>AWS access key ID.</description>
        <value></value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <description>AWS secret key.</description>
        <value></value>
    </property>
    <property>
        <name>fs.s3a.aws.credentials.provider</name>
        <description>The credential provider type.</description>
        <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value>
    </property>
    </configuration>
    
  3. Copy core-site.xml to Dremio's configuration directory (the same directory as dremio.conf) on all nodes.
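
Before copying the file to every node, it may be worth checking that it parses and that none of the properties from step 2 were left empty. The sketch below is one way to do that in Python; the function names are hypothetical, and the list of required names is taken from the example file above.

```python
import xml.etree.ElementTree as ET

# Property names from the core-site.xml example in step 2.
REQUIRED_S3_PROPERTIES = [
    "fs.dremioS3.impl",
    "fs.s3a.access.key",
    "fs.s3a.secret.key",
    "fs.s3a.aws.credentials.provider",
]


def load_properties(core_site_xml: str) -> dict:
    """Parse core-site.xml text into a {name: value} dictionary."""
    root = ET.fromstring(core_site_xml)  # raises ParseError if malformed
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def unset_properties(core_site_xml: str, required=REQUIRED_S3_PROPERTIES) -> list:
    """Return the required property names that are missing or empty."""
    props = load_properties(core_site_xml)
    return [name for name in required if not props.get(name)]
```

For example, calling unset_properties on the template exactly as shown in step 2 would report fs.s3a.access.key and fs.s3a.secret.key, because their value elements are still empty.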

Azure Data Lake Storage Gen1

Before configuring Azure Data Lake Storage Gen1 as Dremio's distributed storage:

  • Test adding the same store as a Dremio source and verify the connection. DO NOT proceed before completing this step.
  • Create a storage root directory before configuring with the dremio.conf and core-site.xml files.

To set up configuration for distributed storage:

  1. Change the following in the dremio.conf file:
    paths: {
    ...
    dist: "dremioAdl://<DATA_LAKE_STORE_NAME>.azuredatalakestore.net/<STORAGE_ROOT_DIRECTORY>"
    }
    
  2. Create core-site.xml and include the ADLS connection details you verified above:
    <?xml version="1.0"?>
    <configuration>
     <property>
         <name>fs.dremioAdl.impl</name>
         <description>Must be set to com.dremio.plugins.adl.store.DremioAdlFileSystem</description>
         <value>com.dremio.plugins.adl.store.DremioAdlFileSystem</value>
     </property>
     <property>
         <name>dfs.adls.oauth2.client.id</name>
         <description>Application ID of the registered application under Azure Active Directory</description>
         <value>APPLICATION_ID</value>
     </property>
     <property>
         <name>dfs.adls.oauth2.credential</name>
         <description>Generated password value for the registered application</description>
         <value>PASSWORD</value>
     </property>
     <property>
         <name>dfs.adls.oauth2.refresh.url</name>
         <description>Azure Active Directory OAuth 2.0 Token Endpoint for registered applications.</description>
         <value>OAUTH2_ENDPOINT</value>
     </property>
     <property>
         <name>dfs.adls.oauth2.access.token.provider.type</name>
         <description>Must be set to ClientCredential</description>
         <value>ClientCredential</value>
     </property>
     <property>
         <name>fs.adl.impl.disable.cache</name>
         <description>Only include this property AFTER validating the ADLS connection.</description>
         <value>false</value>
     </property>
    </configuration>
    
  3. Copy the core-site.xml file to Dremio's configuration directory (the same directory as dremio.conf) on all nodes.

Azure Storage

Azure Storage is the foundation for the ADLS Gen2 service. See the Azure Storage data source for more information.

To set up configuration for distributed storage:

  1. Change the following property in the dremio.conf file. ALTERNATIVE_STORAGE_ROOT_DIRECTORY is optional and specifies an alternative location for creating sub-directories. If it is not specified, the sub-directories are created directly under the FILE_SYSTEM_NAME directory.

    paths: {
    ...
    dist: "dremioAzureStorage://:///<FILE_SYSTEM_NAME>/<ALTERNATIVE_STORAGE_ROOT_DIRECTORY>"
    }
    

    The storage root directory must be created first.

  2. Create core-site.xml and add the following details:

    <?xml version="1.0"?>
    <configuration>
    <property>
     <name>fs.dremioAzureStorage.impl</name>
     <description>FileSystem implementation. Must always be com.dremio.plugins.azure.AzureStorageFileSystem</description>
     <value>com.dremio.plugins.azure.AzureStorageFileSystem</value>
    </property>
    <property>
      <name>dremio.azure.account</name>
      <description>The name of the storage account.</description>
      <value>ACCOUNT_NAME</value>
    </property>
    <property>
      <name>dremio.azure.key</name>
      <description>The shared access key for the storage account.</description>
      <value>ACCESS_KEY</value>
    </property>
    <property>
      <name>dremio.azure.mode</name>
      <description>The storage account type. Value: STORAGE_V2</description>
      <value>STORAGE_V2</value>
    </property>
    <property>
      <name>dremio.azure.secure</name>
      <description>Boolean option to enable SSL connections. Values: True/False. Default: True.</description>
      <value>True</value>
    </property>
    </configuration>
    
  3. Copy the core-site.xml file to Dremio's configuration directory (the same directory as dremio.conf) on all nodes.
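
Since the same core-site.xml must be present on all nodes, one option is to generate it once from a set of name/value pairs and distribute the result. The following Python sketch renders such a file; the render_core_site helper is hypothetical, the property names come from step 2, and ACCOUNT_NAME and ACCESS_KEY remain placeholders to be replaced with real values.

```python
from xml.sax.saxutils import escape


def render_core_site(properties: dict) -> str:
    """Render name/value pairs as the text of a Hadoop-style core-site.xml."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            # escape() protects characters such as & and < in keys or values.
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_core_site({
        "fs.dremioAzureStorage.impl": "com.dremio.plugins.azure.AzureStorageFileSystem",
        "dremio.azure.account": "ACCOUNT_NAME",  # placeholder
        "dremio.azure.key": "ACCESS_KEY",        # placeholder
        "dremio.azure.mode": "STORAGE_V2",
        "dremio.azure.secure": "True",
    }))
```

Escaping matters here because shared access keys can contain characters that are not valid as raw XML text.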
