Dremio Arctic
Dremio Arctic is a lakehouse management service built on Project Nessie and Apache Iceberg. It is an Iceberg-native catalog that automates data management and enables git-like workflows on data.
Adding a Dremio Arctic Source
Each Sonar project is deployed with an Arctic catalog that serves as a native Iceberg catalog and provides data management capabilities for that project. To add an additional Arctic catalog as a source to your Sonar project:
- From the Datasets page, in the Sources section, click the plus (+) sign.
Alternatively, click Add Source at the bottom of the data panel on the Datasets page.
- In the Add Data Source dialog, under Arctic Catalogs, click Arctic.
The New Arctic Source dialog box appears, which contains the following sections:
- General: Choose an existing Arctic catalog to add as a source to your Sonar project, or create a new one.
- Storage: Configure storage settings such as authentication, AWS or Azure Storage root path, and other connection properties.
- Advanced Options: Configure advanced properties such as caching.
- Reflection Refresh: Configure settings to refresh reflections defined on tables in the catalog, such as schedule and expiration policies.
Refer to the following for guidance on how to edit each section.
General
- From the Choose a catalog menu, select an existing Arctic Catalog or select + New Arctic Catalog.
- If you are adding a new catalog, provide a name for the new catalog, then click Add. The name cannot include the following special characters: /, :, [, or ].
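If you generate catalog names from a script before entering them in the dialog, a quick pre-check can catch the forbidden characters early. The following is a minimal sketch, not part of Dremio: the function names are hypothetical, and the non-empty requirement is an assumption that the text above does not state.

```python
from __future__ import annotations

# Characters that the naming rule above says a catalog name cannot include.
FORBIDDEN_NAME_CHARS = frozenset("/:[]")


def invalid_chars_in_name(name: str) -> list[str]:
    """Return the forbidden characters found in `name`, in order of first appearance."""
    found: list[str] = []
    for ch in name:
        if ch in FORBIDDEN_NAME_CHARS and ch not in found:
            found.append(ch)
    return found


def is_valid_catalog_name(name: str) -> bool:
    """Hypothetical check: the name is non-empty (assumption) and has no forbidden characters."""
    return bool(name) and not invalid_chars_in_name(name)


if __name__ == "__main__":
    for candidate in ("sales_catalog", "sales/2024", "dev[1]:test"):
        print(candidate, is_valid_catalog_name(candidate), invalid_chars_in_name(candidate))
```

Dremio may enforce additional naming rules that this sketch does not cover.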
Storage
- AWS
- Azure
Authentication Using Project Data Credentials
Use project data credentials to enable Dremio to access Amazon S3 using the IAM role that is associated with your Dremio project. This IAM role was created when you signed up for Dremio and is the default credential that is used to access all the sources in your project. For this option, you attach the necessary IAM policies to your Dremio project's IAM role. The following policies are available:
- Enable Dremio to read and query the S3 source.
- Enable Dremio to write to the S3 source.
For instructions to set up these policies and attach them to an IAM role, see Setting Up IAM Permissions.
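To give a rough idea of how read and write policies of this kind are usually shaped, the sketch below builds generic S3 policy documents. The action lists are common S3 permissions chosen for illustration and are an assumption, not Dremio's published policy templates; use the templates in Setting Up IAM Permissions for a real deployment. The function name and the bucket name are hypothetical.

```python
import json


def s3_policy(bucket: str, allow_write: bool = False) -> dict:
    """Build an illustrative IAM policy document for one S3 bucket.

    Assumption: the read policy needs bucket listing plus object reads, and
    the write policy adds object writes and deletes. The real templates may differ.
    """
    object_actions = ["s3:GetObject"]
    if allow_write:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permissions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object-level permissions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a placeholder bucket name.
    print(json.dumps(s3_policy("example-bucket", allow_write=True), indent=2))
```

Separating bucket-level and object-level statements keeps each action paired with the resource type it applies to.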
Authentication Using Data Source Credentials
Use data source credentials to enable Dremio to access Amazon S3 using either a source-specific access key or an IAM role. This method gives you the flexibility to assign either an access key or an IAM role to each source that is in your Amazon S3 account. The following IAM policies are available:
- Enable Dremio to read and query the S3 source.
- Enable Dremio to write to the S3 source.
Choose one of the following authentication methods to access the data source. During this setup, you will attach the IAM policies to provide Dremio read and/or write access to the data source.
- Use an access key: Create an IAM user in your AWS account. The access key is generated during the setup process. If you use this authentication method, you need to provide the AWS access key ID and AWS secret access key. For more information, see Creating an IAM user.
- Use a new IAM role: For the steps to create a new role, see Creating an IAM role.
- Use the Dremio project IAM role: To attach these policy templates to the Dremio project's IAM role, see Setting Up IAM Permissions.
AWS Root Path
Provide the root path to your S3 bucket. When creating new tables in Arctic, the root path is the default location where tables will be created when you do not specify a LOCATION attribute in a CREATE TABLE statement. Example root path: bucket-name/optional/folder/path.
By default, the root path for the catalog is automatically set to the S3 bucket used by your Sonar project's project store (the metadata store for your project).
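The root path format shown in the example (a bucket name followed by an optional folder path) can be split into its two parts as follows. This is a minimal sketch for checking a value before you enter it; the names `RootPath` and `parse_s3_root_path` are hypothetical, and tolerating stray leading or trailing slashes is an assumption.

```python
from typing import NamedTuple


class RootPath(NamedTuple):
    bucket: str
    prefix: str  # empty string when no folder path is given


def parse_s3_root_path(root_path: str) -> RootPath:
    """Split 'bucket-name/optional/folder/path' into bucket and folder prefix."""
    # Drop surrounding whitespace and slashes, and ignore empty segments.
    parts = [p for p in root_path.strip().strip("/").split("/") if p]
    if not parts:
        raise ValueError("root path must start with a bucket name")
    return RootPath(bucket=parts[0], prefix="/".join(parts[1:]))


if __name__ == "__main__":
    print(parse_s3_root_path("bucket-name/optional/folder/path"))
    print(parse_s3_root_path("bucket-name"))
```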
Connection Properties (Optional)
Provide the custom key-value pairs for the connection relevant to the source.
- Click Add Property.
- For Name, enter a connection property.
- For Value, enter the corresponding connection property value.
For Azure Storage, configure the following settings:
- For Storage account name, enter the name of the Azure Storage account to use.
- For Azure root path, enter the root path inside your Azure Storage account, which is the name of the Azure Storage container, followed by optional folder(s). Example: /containername/optional/folder/path.
- To allow Dremio to access your Azure Storage account, choose one of the following authentication methods:
- Select Shared access key and provide the key name from the Azure portal application.
- Select Microsoft Entra ID and provide the values from the OAuth 2.0 application that you created in the Azure portal for this source:
- For Application ID, enter the application (client) ID in Azure.
- For Client secret, enter the secret key generated for the application.
- For OAuth 2.0 token endpoint, enter the OAuth 2.0 token endpoint (v1.0), which includes the tenant ID and is used by the application in order to get an access token or a refresh token.
- (Optional) Click Add property to provide custom key value pairs. Enter a connection property for Name and enter the corresponding connection property value for Value.
- Click Save.
Azure only: If you see 0 byte files being created with your Iceberg tables in your Azure Storage account, these files do not impact Dremio’s functionality and can be ignored if you cannot update your storage container. If you can update your container, see Azure Data Lake Storage Gen2 hierarchical namespace for more information on how to enable Hierarchical Namespace to prevent the creation of these files.
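Two of the Azure values described above follow a fixed shape: the root path starts with the container name, and the OAuth 2.0 (v1.0) token endpoint embeds the tenant ID. The sketch below illustrates both. The function names are hypothetical, and the endpoint host assumes the Azure public cloud (login.microsoftonline.com); other Azure clouds use different hosts, so copy the actual endpoint from the Azure portal.

```python
from __future__ import annotations


def parse_azure_root_path(root_path: str) -> tuple[str, str]:
    """Split '/containername/optional/folder/path' into (container, folder path)."""
    parts = [p for p in root_path.strip().split("/") if p]
    if not parts:
        raise ValueError("root path must start with a container name")
    return parts[0], "/".join(parts[1:])


def oauth_v1_token_endpoint(tenant_id: str) -> str:
    """Build the v1.0 token endpoint for a tenant (Azure public cloud assumed)."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    return f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"


if __name__ == "__main__":
    print(parse_azure_root_path("/containername/optional/folder/path"))
    # "my-tenant-id" is a placeholder, not a real tenant.
    print(oauth_v1_token_endpoint("my-tenant-id"))
```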
Advanced Options
Review each option provided in the following table to set up the advanced options to meet your needs.
Advanced Option | Description |
---|---|
Enable asynchronous access when possible | Activated by default; uncheck the box to deactivate. Enables cloud caching for the S3 bucket to support simultaneous actions such as adding and editing a new source. |
Under Cache Options, review the following table and edit the options to meet your needs.
Cache Options | Description |
---|---|
Enable local caching when possible | Selected by default, along with asynchronous access for cloud caching. Uncheck the checkbox to disable this option. For more information about local caching, see Columnar Cloud Cache. |
Max percent of total available cache space to use when possible | Specifies the disk quota, as a percentage, that a source can use on any single executor node. Applies only when local caching is enabled. The default is 100 percent of the total disk space available on the mount point provided for caching. You can either enter a percentage manually in the value field or use the arrows to the far right to adjust the percentage. |
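As a worked example of the percentage setting, the sketch below converts a percentage into the disk quota a source could use on one executor node. The function name is hypothetical, and the accepted range of 0 to 100 is an assumption; in practice the total comes from the disk space available on the cache mount point.

```python
def cache_quota_bytes(total_cache_bytes: int, max_percent: int = 100) -> int:
    """Return the per-executor cache quota for a source, in bytes.

    total_cache_bytes: disk space available on the cache mount point.
    max_percent: value of the "Max percent" option (default 100).
    """
    if total_cache_bytes < 0:
        raise ValueError("total_cache_bytes must be non-negative")
    if not 0 <= max_percent <= 100:  # assumed valid range
        raise ValueError("max_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    return total_cache_bytes * max_percent // 100


if __name__ == "__main__":
    GIB = 1024 ** 3
    # A 500 GiB cache mount limited to 40 percent leaves a 200 GiB quota.
    print(cache_quota_bytes(500 * GIB, 40) // GIB)
```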
Reflection Refresh
The Reflection Refresh section allows you to set a schedule for refreshing all of the reflections that are defined on tables in the catalog. You can override this schedule on individual tables in different branches. This section also lets you specify how long all reflections in the catalog exist until they expire. Again, you can override this setting on individual tables in different branches.
To learn more, see Refreshing Reflections and Setting the Expiration Policy for Reflections.
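The relationship between the catalog-level settings and table-level overrides can be pictured with a small model. This is only a conceptual sketch: the class, field names, and hour-based units are hypothetical, and resolving each setting independently is an assumption rather than documented Dremio behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReflectionPolicy:
    """Hypothetical container for the two settings; None means 'not set'."""
    refresh_every_hours: Optional[int] = None
    expire_after_hours: Optional[int] = None


def effective_policy(
    catalog_default: ReflectionPolicy,
    table_override: Optional[ReflectionPolicy] = None,
) -> ReflectionPolicy:
    """Use the table-level value where one is set, else the catalog-level value."""
    if table_override is None:
        return catalog_default

    def pick(override: Optional[int], default: Optional[int]) -> Optional[int]:
        return default if override is None else override

    return ReflectionPolicy(
        refresh_every_hours=pick(
            table_override.refresh_every_hours, catalog_default.refresh_every_hours
        ),
        expire_after_hours=pick(
            table_override.expire_after_hours, catalog_default.expire_after_hours
        ),
    )


if __name__ == "__main__":
    catalog = ReflectionPolicy(refresh_every_hours=1, expire_after_hours=3)
    table = ReflectionPolicy(refresh_every_hours=6)  # overrides only the schedule
    print(effective_policy(catalog, table))
```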
Editing an Arctic Source
To edit an Arctic source:
- From the Datasets page, click Arctic Catalogs to see the list of Arctic catalog sources the project is connected to.
- On the All Arctic Catalogs page, hover over an Arctic source to display the hidden Settings (gear) icon, then click the icon.
Alternatively, you can hover over an Arctic source, click the More (...) option, and then click Settings.
- In the Source Settings dialog box, you can make changes to the settings for Storage, Advanced Options, or Privileges.
You cannot change the catalog selected for a configured Arctic source.
- Click Save.
Removing an Arctic Source
To remove an Arctic catalog as a source for a Sonar project:
- From the Datasets page, click Arctic Catalogs to see the list of Arctic catalog sources the project is connected to.
- On the All Arctic Catalogs page, hover over an Arctic source, click the More (...) option, and then click Remove Source.
- Click Remove to confirm removal of the source, or click Cancel to return to the All Arctic Catalogs page.