Dremio Arctic

Dremio Arctic is a lakehouse management service built on Project Nessie and Apache Iceberg. It is an Iceberg-native catalog that automates data management and enables git-like workflows on data.

Adding a Dremio Arctic Source

Each Sonar project is deployed with an Arctic catalog that serves as a native Iceberg catalog and provides data management capabilities for that project. To add another Arctic catalog as a source to your Sonar project:

  1. From the Datasets page, in the Sources section, click the plus (+) sign.
Note: Alternatively, click Add Source at the bottom of the data panel on the Datasets page.

  2. In the Add Data Source dialog, under Arctic Catalogs, click Arctic.

The New Arctic Source dialog box appears, which contains the following sections:

  • General: Choose an existing Arctic catalog to add as a source to your Sonar project, or create a new one.
  • Storage: Configure storage settings such as authentication, AWS or Azure Storage root path, and other connection properties.
  • Advanced Options: Configure advanced properties such as caching.
  • Reflection Refresh: Configure settings to refresh reflections defined on tables in the catalog, such as schedule and expiration policies.

Refer to the following for guidance on how to edit each section.

General

  1. From the Choose a catalog menu, select an existing Arctic Catalog or select + New Arctic Catalog.
  2. If you are adding a new catalog, provide a name for the new catalog, then click Add. The name cannot include the following special characters: /, :, [, or ].
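
The naming rule in step 2 can be checked before you open the dialog. The following is a minimal sketch, not part of Dremio: the function names are hypothetical, and rejecting an empty name is an assumption that the rule above does not state.

```python
# Characters a new catalog name must not contain, per the rule above.
FORBIDDEN_CHARS = frozenset("/:[]")


def invalid_name_chars(name: str) -> list:
    """Return the forbidden characters found in name, in order of first appearance."""
    found = []
    for ch in name:
        if ch in FORBIDDEN_CHARS and ch not in found:
            found.append(ch)
    return found


def is_valid_catalog_name(name: str) -> bool:
    """True when name is non-empty (an assumption) and has no forbidden characters."""
    return bool(name) and not invalid_name_chars(name)


if __name__ == "__main__":
    for candidate in ("sales_catalog", "sales/2024", "team[a]:prod"):
        print(candidate, is_valid_catalog_name(candidate), invalid_name_chars(candidate))
```

Returning the offending characters, rather than only a boolean, makes it easier to tell a user which part of the name to change.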

Storage

Authentication Using Project Data Credentials

Use project data credentials to enable Dremio to access Amazon S3 using the IAM role that is associated with your Dremio project. This IAM role was created when you signed up for Dremio and is the default credential that is used to access all the sources in your project.



For this option, you will attach the necessary IAM policies to your Dremio project's IAM role. The following policies are available:



For instructions to set up these policies and attach them to an IAM role, see Setting Up IAM Permissions.



Authentication Using Data Source Credentials

Use data source credentials to enable Dremio to access Amazon S3 using either a source-specific access key or an IAM role. This method gives you the flexibility to assign either an access key or an IAM role to each source in your Amazon S3 account. The following IAM policies are available:



Choose one of the following authentication methods to access the data source. During setup, you will attach the IAM policies that give Dremio read access, write access, or both to the data source.


  • Use an access key: Create an IAM user in your AWS account. The access key is generated during the setup process. If you use this authentication method, you need to provide the AWS access key ID and AWS secret access key. For more information, see Creating an IAM user.
  • Use a new IAM role: For the steps to create a new role, see Creating an IAM role.
  • Use the Dremio project IAM role: To attach these policy templates to the Dremio project's IAM role, see Setting Up IAM Permissions.

AWS Root Path

Provide the root path to your S3 bucket. When creating new tables in Arctic, the root path is the default location where tables will be created when you do not specify a LOCATION attribute in a CREATE TABLE statement. Example root path: bucket-name/optional/folder/path.
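
The example root path has two parts: the bucket name, then an optional folder path. The sketch below splits a root path into those parts. It is illustrative only: `RootPath` and `split_root_path` are hypothetical names, and tolerating stray leading or trailing slashes is an assumption, not documented Dremio behavior.

```python
from typing import NamedTuple


class RootPath(NamedTuple):
    bucket: str
    prefix: str  # empty string when the root path is the bucket itself


def split_root_path(root_path: str) -> RootPath:
    """Split 'bucket-name/optional/folder/path' into bucket and folder prefix."""
    # Assumption: surrounding whitespace and slashes are ignored.
    cleaned = root_path.strip().strip("/")
    if not cleaned:
        raise ValueError("root path must name a bucket")
    # Everything before the first slash is the bucket; the rest is the folder path.
    bucket, _, prefix = cleaned.partition("/")
    return RootPath(bucket, prefix)


if __name__ == "__main__":
    print(split_root_path("bucket-name/optional/folder/path"))
    print(split_root_path("bucket-name"))
```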



By default, the root path for the catalog is automatically set to the S3 bucket used by your Sonar project's project store (the metadata store for your project).



Connection Properties (Optional)

Provide any custom key-value pairs for the connection that are relevant to the source.



  1. Click Add Property.
  2. For Name, enter a connection property.
  3. For Value, enter the corresponding connection property value.
Note: Azure only: If you see 0-byte files being created with your Iceberg tables in your Azure Storage account, these files do not impact Dremio’s functionality and can be ignored if you cannot update your storage container. If you can update your container, see Azure Data Lake Storage Gen2 hierarchical namespace for more information on how to enable Hierarchical Namespace to prevent the creation of these files.

Advanced Options

Review each option provided in the following table to set up the advanced options to meet your needs.

Advanced Option | Description
Enable asynchronous access when possible | Activated by default; uncheck the box to deactivate. Enables cloud caching for the S3 bucket to support simultaneous actions such as adding and editing a new source.

Under Cache Options, review the following table and edit the options to meet your needs.

Cache Option | Description
Enable local caching when possible | Selected by default, along with asynchronous access for cloud caching. Uncheck the checkbox to disable this option. For more information about local caching, see Columnar Cloud Cache.
Max percent of total available cache space to use when possible | Specifies the disk quota, as a percentage, that a source can use on any single executor node. This option applies only when local caching is enabled. The default is 100 percent of the total disk space available on the mount point provided for caching. You can either enter a percentage in the value field or use the arrows to the far right to adjust the percentage.
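
As a worked example of the percentage quota, the sketch below converts a percentage into a byte limit for one executor node. It is plain arithmetic, not Dremio code: the function name is hypothetical, and the accepted range of 0 to 100 is an assumption about what the field allows.

```python
GIB = 2**30  # bytes in one gibibyte


def cache_quota_bytes(total_cache_bytes: int, max_percent: int = 100) -> int:
    """Bytes of cache a source may use on a single executor node.

    total_cache_bytes is the disk space available on the caching mount point.
    max_percent defaults to 100, matching the default described above.
    """
    if not 0 <= max_percent <= 100:  # assumed valid range
        raise ValueError("max_percent must be between 0 and 100")
    return total_cache_bytes * max_percent // 100


if __name__ == "__main__":
    # A 500 GiB cache mount with the option set to 40 percent.
    print(cache_quota_bytes(500 * GIB, 40) // GIB, "GiB")
```

Because the quota applies per executor node, the same percentage yields different byte limits on nodes with differently sized cache mounts.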

Reflection Refresh

The Reflection Refresh section allows you to set a schedule for refreshing all of the reflections that are defined on tables in the catalog. You can override this schedule on individual tables in different branches. This section also lets you specify how long all reflections in the catalog exist until they expire. Again, you can override this setting on individual tables in different branches.

To learn more, see Refreshing Reflections and Setting the Expiration Policy for Reflections.
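
The override behavior described in this section amounts to a simple precedence rule: a value set on an individual table wins, and otherwise the catalog-level value applies. The sketch below models only that rule. The class, the field names, and the use of hours as the unit are all hypothetical and do not reflect how Dremio stores these settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReflectionPolicy:
    # None means "not set at this level". Field names and units are illustrative.
    refresh_every_hours: Optional[int] = None
    expire_after_hours: Optional[int] = None


def effective_policy(catalog: ReflectionPolicy, table: ReflectionPolicy) -> ReflectionPolicy:
    """Resolve each setting: the table-level value wins, else the catalog default."""

    def pick(table_value: Optional[int], catalog_value: Optional[int]) -> Optional[int]:
        return table_value if table_value is not None else catalog_value

    return ReflectionPolicy(
        refresh_every_hours=pick(table.refresh_every_hours, catalog.refresh_every_hours),
        expire_after_hours=pick(table.expire_after_hours, catalog.expire_after_hours),
    )


if __name__ == "__main__":
    catalog_default = ReflectionPolicy(refresh_every_hours=1, expire_after_hours=3)
    table_override = ReflectionPolicy(refresh_every_hours=6)  # overrides refresh only
    print(effective_policy(catalog_default, table_override))
```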

Editing an Arctic Source

To edit an Arctic source:

  1. From the Datasets page, click Arctic Catalogs to see the list of Arctic catalog sources the project is connected to.
  2. On the All Arctic Catalogs page, hover over an Arctic source to display the hidden Settings (gear) icon, then click the icon.
Note: Alternatively, you can hover over an Arctic source, click the More (...) option, and then click Settings.

  3. In the Source Settings dialog box, you can make changes to the settings for Storage, Advanced Options, or Privileges.

Note: You cannot change the catalog selected for a configured Arctic source.

  4. Click Save.

Removing an Arctic Source

To remove an Arctic catalog as a source for a Sonar project:

  1. From the Datasets page, click Arctic Catalogs to see the list of Arctic catalog sources the project is connected to.
  2. On the All Arctic Catalogs page, hover over an Arctic source, click the More (...) option, and then click Remove Source.
  3. Click Remove to confirm removal of the source, or click Cancel to return to the All Arctic Catalogs page.