Distributed storage

paths.dist is the cache location where Dremio stores accelerator data, CREATE TABLE AS tables, job results, downloads, and uploads.

By default, Dremio uses the disk space on local Dremio nodes. You can specify a different store such as NAS, HDFS, MapR-FS, S3, or ADLS.

This option needs to be updated in dremio.conf on all nodes.

NAS

Before configuring NAS as Dremio's distributed storage, test adding the same mount as a Dremio source and verify the connection.

dremio.conf changes:

paths: {
  ...
  dist: "/shared_mount_path"
}
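
Because every node must be able to reach the shared mount, it can help to confirm on each node that the path exists and is writable before editing dremio.conf. The following is a minimal sketch, not part of Dremio; the function name is_writable_dir is a hypothetical helper, and the check assumes it runs as the same OS user as the Dremio process.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if `path` is an existing directory this process can write to."""
    if not os.path.isdir(path):
        return False
    try:
        # Create a temporary file in the directory; it is deleted on close.
        with tempfile.NamedTemporaryFile(dir=path, prefix=".dist_check_"):
            pass
        return True
    except OSError:
        return False
```

For example, calling is_writable_dir("/shared_mount_path") on each node should return True before that path is used for paths.dist.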

HDFS

Before configuring HDFS as Dremio's distributed storage, test adding the same cluster as a Dremio source and verify the connection.

dremio.conf changes:

paths: {
  ...
  dist: "hdfs://<NAMENODE_HOST>:8020/path"
}

When deploying on Hadoop using YARN, Dremio automatically copies this option to all nodes, so it only needs to be configured manually on Coordinator nodes.

Name Node HA

If Name Node HA is enabled, specify the distributed storage path (paths.dist in dremio.conf) using the fs.defaultFS value instead of the active name node (e.g. <value_for_fs_defaultFS>/path).

fs.defaultFS value can be found in core-site.xml (typically found under /etc/hadoop/conf).
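
As an illustration, the fs.defaultFS value can be read from core-site.xml and joined with a cache directory to form the paths.dist value. This is a minimal sketch using Python's standard XML parser; the helper names read_hadoop_property and dist_path_for_ha are hypothetical and not part of Dremio or Hadoop.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(core_site_path, name):
    """Return the <value> of the named property in a Hadoop XML file, or None."""
    root = ET.parse(core_site_path).getroot()
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def dist_path_for_ha(core_site_path, cache_dir):
    """Join fs.defaultFS and a cache directory into a paths.dist value."""
    default_fs = read_hadoop_property(core_site_path, "fs.defaultFS")
    if not default_fs:
        raise ValueError("fs.defaultFS not found in {}".format(core_site_path))
    # Avoid a doubled slash between the filesystem URI and the directory.
    return default_fs.rstrip("/") + "/" + cache_dir.strip("/")
```

Calling dist_path_for_ha("/etc/hadoop/conf/core-site.xml", "/path") would return the string to place in paths.dist, assuming the file is in its typical location.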

As described in the Hadoop using YARN deployment guide, ensure that you have copied the core-site.xml, hdfs-site.xml, and yarn-site.xml files (typically under /etc/hadoop/conf) into Dremio's conf directory.

MapR-FS

Before configuring MapR-FS as Dremio's distributed storage, test adding the same cluster as a Dremio source and verify the connection.

dremio.conf changes:

paths: {
  ...
  dist: "maprfs:///<MOUNT_PATH>/<CACHE_DIRECTORY>"
}

When deploying on MapR using YARN, Dremio automatically copies this option to all nodes, so it only needs to be configured manually on Coordinator nodes.

Amazon S3

Before configuring Amazon S3 as Dremio's distributed storage, test adding the same bucket as a Dremio source and verify the connection.

  1. dremio.conf changes:

    paths: {
      ...
      dist: "s3a://<BUCKET_NAME>/<STORAGE_ROOT_DIRECTORY>"
    }

    The storage root directory must be created first.

  2. Create core-site.xml and include IAM credentials with list, read and write permissions:

    <?xml version="1.0"?>
    <configuration>
       <property>
           <name>fs.s3a.access.key</name>
           <description>AWS access key ID.</description>
           <value>ACCESS KEY</value>
       </property>
       <property>
           <name>fs.s3a.secret.key</name>
           <description>AWS secret key.</description>
           <value>SECRET KEY</value>
       </property>
    </configuration>
    
  3. Copy core-site.xml to Dremio's configuration directory (the same directory as dremio.conf) on all nodes.
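
Step 2 above can also be scripted, so that the same file is produced for every node. The following is a minimal sketch, not a Dremio tool: the function names build_core_site and write_s3_core_site are hypothetical, and only the two property names shown in the XML above are written.

```python
import xml.etree.ElementTree as ET


def build_core_site(properties):
    """Build a Hadoop <configuration> tree from a {name: value} mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.ElementTree(root)


def write_s3_core_site(path, access_key, secret_key):
    """Write a core-site.xml containing the S3A access and secret keys."""
    tree = build_core_site({
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
    })
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

A call such as write_s3_core_site("core-site.xml", access_key, secret_key) produces a file that can then be copied as in step 3. Since the file contains credentials, consider passing the keys in from a secure source rather than hard-coding them in a script.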

Azure Data Lake Store

Before configuring Azure Data Lake Store as Dremio's distributed storage, test adding the same store as a Dremio source and verify the connection. DO NOT proceed before completing this step.

  1. dremio.conf changes:

    paths: {
      ...
      dist: "adl://<DATA_LAKE_STORE_NAME>.azuredatalakestore.net/<STORAGE_ROOT_DIRECTORY>"
    }

    The storage root directory must be created first.

  2. Create core-site.xml and include the ADLS connection details you verified above:

    <?xml version="1.0"?>
    <configuration>
     <property>
         <name>fs.adl.impl</name>
         <description>Must be set to org.apache.hadoop.fs.adl.AdlFileSystem</description>
         <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
     </property>
     <property>
         <name>dfs.adls.oauth2.client.id</name>
         <description>Application ID of the registered application under Azure Active Directory</description>
         <value>APPLICATION_ID</value>
     </property>
     <property>
         <name>dfs.adls.oauth2.credential</name>
         <description>Generated password value for the registered application</description>
         <value>PASSWORD</value>
     </property>
     <property>
         <name>dfs.adls.oauth2.refresh.url</name>
         <description>Azure Active Directory OAuth 2.0 Token Endpoint for registered applications.</description>
         <value>OAUTH2_ENDPOINT</value>
     </property>
     <property>
         <name>dfs.adls.oauth2.access.token.provider.type</name>
         <description>Must be set to ClientCredential</description>
         <value>ClientCredential</value>
     </property>
     <property>
         <name>fs.adl.impl.disable.cache</name>
         <description>Only include this property AFTER validating the ADLS connection.</description>
         <value>false</value>
     </property>
    </configuration>
    
  3. Copy core-site.xml to Dremio's configuration directory (the same directory as dremio.conf) on all nodes.
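
The ADLS file has several properties that are easy to mistype, so a quick check before copying it to all nodes may save a restart cycle. The following is a minimal sketch, not a Dremio tool: check_adls_core_site is a hypothetical helper, and the list of required names is taken from the XML shown above. It only checks that each property is present and non-empty and flags stray whitespace in names; it does not validate the credentials themselves.

```python
import xml.etree.ElementTree as ET

# Property names taken from the ADLS core-site.xml shown above.
REQUIRED_ADLS_PROPERTIES = [
    "fs.adl.impl",
    "dfs.adls.oauth2.client.id",
    "dfs.adls.oauth2.credential",
    "dfs.adls.oauth2.refresh.url",
    "dfs.adls.oauth2.access.token.provider.type",
]


def check_adls_core_site(source):
    """Return a list of problems found in a core-site.xml (empty if none).

    `source` may be a file path or an open file-like object.
    """
    root = ET.parse(source).getroot()
    problems = []
    seen = {}
    for prop in root.iter("property"):
        raw_name = prop.findtext("name") or ""
        if raw_name != raw_name.strip():
            problems.append("whitespace around property name: %r" % raw_name)
        seen[raw_name.strip()] = (prop.findtext("value") or "").strip()
    for name in REQUIRED_ADLS_PROPERTIES:
        if not seen.get(name):
            problems.append("missing or empty property: " + name)
    return problems
```

An empty list from check_adls_core_site("core-site.xml") means all five properties are present with non-empty values.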
