Version: current [25.x]

Nessie Catalogs

Nessie catalogs let you process, manage, consume, and share data the same way that code is shared during software development. You can take control of your data using concepts such as version control and commits, and you can test and develop in isolation from your production data. Dremio supports these data-as-code activities through Project Nessie, which provides Git-like capabilities for the data lakehouse.

Prerequisites

Dremio supports Nessie version 0.59.0 and later. If you have not yet set up a Nessie server and connected it to your dataset, you can set up a server either with a fast-start Docker image or with secure HTTPS transport in Minikube.
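If you are unsure whether an existing Nessie server meets the 0.59.0 minimum, the check is a plain numeric version comparison. The following Python sketch assumes you already know the server's version string (for example, from your deployment manifest or image tag). The helper names `parse_version` and `meets_minimum_nessie_version` are hypothetical and are not part of Dremio or Nessie.

```python
from __future__ import annotations

# Minimum Nessie version that Dremio supports, per the Prerequisites section.
MIN_NESSIE_VERSION = (0, 59, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '0.74.0' or '0.74.0-SNAPSHOT' into (0, 74, 0)."""
    # Drop a leading 'v' and any pre-release suffix after the first hyphen.
    core = version.strip().lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def meets_minimum_nessie_version(version: str) -> bool:
    """Return True if `version` is 0.59.0 or later (hypothetical helper)."""
    parts = parse_version(version)
    # Pad short versions such as '1.0' to three components before comparing.
    parts = parts + (0,) * (3 - len(parts))
    return parts >= MIN_NESSIE_VERSION


if __name__ == "__main__":
    for candidate in ("0.58.1", "0.59.0", "0.74.0", "1.0"):
        print(candidate, meets_minimum_nessie_version(candidate))
```

Tuple comparison handles the ordering component by component, so `0.100.0` correctly compares as newer than `0.59.0`, which a plain string comparison would get wrong.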

When using Nessie as a source, Dremio can connect to Amazon S3 buckets, Azure Storage, Google Cloud Storage (GCS), or S3-compatible storage providers like MinIO and Dell ECS. Read Storage for details about the required credentials for connecting to each storage provider.

Configuring Nessie as a Source

To add a Nessie source to your project:

  1. On the Datasets page, to the right of Sources in the left panel, click the Add Source icon.

  2. In the Add Data Source dialog, under Nessie Catalogs, select Nessie.

    The New Nessie Source dialog box appears, which contains the following sections:

    • General: Create a name for your Nessie source, specify the endpoint URL, and set the authentication type. The name cannot include the following special characters: /, :, [, or ].

    • Storage: Set the storage option by setting up the authentication type and the connection properties.

    • Advanced Options: (Optional) Use the default settings or configure access preferences and cache options.

    • Privileges: (Optional) Add privileges for users or roles.

    Refer to the following for guidance on how to edit each section.

General

This tab provides options for configuring connections to a Nessie source.

  1. In the Name field, enter a name.
note

The name you enter must be unique in the organization. Choose a name that is easy for users to reference, because it cannot be edited after the source is created. The name cannot exceed 255 characters and must contain only the following characters: 0-9, A-Z, a-z, underscore (_), or hyphen (-).

  2. In the Nessie endpoint URL field, specify the URL of your Nessie server, including its host name or IP address and port (for example, https://localhost:19120/api/v2). For more information, see Project Nessie Configuration.

  3. Under Nessie authentication type, select either None or Bearer:

    • None: The Nessie server does not require authentication.

    • Bearer: Set authentication using an OpenID bearer token. For more information about setting up this type of authentication, see Project Nessie's Authentication page. Then, from the dropdown menu, choose a method for providing the bearer token:

      • Dremio: Provide the bearer token in plain text. Dremio stores the bearer token.

      • Azure Key Vault: Provide the URI for the Azure Key Vault secret that stores the bearer token. The URI format is https://<vault_name>.vault.azure.net/secrets/<secret_name> (for example, https://myvault.vault.azure.net/secrets/mysecret).

        note

        To use Azure Key Vault as your application secret store, you must:
        - Deploy Dremio on Azure.
        - Complete the Requirements for Authenticating with Azure Key Vault.

        You do not need to restart the Dremio coordinator when you rotate secrets stored in Azure Key Vault. Read Requirements for Secrets Rotation for more information.

      • AWS Secrets Manager: Provide the Amazon Resource Name (ARN) for the AWS Secrets Manager secret that holds the bearer token. The ARN is available in the AWS web console or through command-line tools.

      • HashiCorp Vault: Choose the HashiCorp secrets engine you're using from the dropdown menu, and enter the secret reference for the bearer token in the format that engine expects.
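The name and endpoint rules in the General tab can be checked before you open the dialog, for example when you provision sources from a script. The following Python sketch encodes the documented name rules (at most 255 characters from 0-9, A-Z, a-z, underscore, and hyphen) and a basic shape check for the endpoint URL. The function names are hypothetical; uniqueness is checked case-sensitively against a set of names you supply, which is an assumption about how Dremio compares names, and requiring an explicit port is stricter than necessary.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Name rules from the General tab: 1-255 characters drawn only from
# 0-9, A-Z, a-z, underscore (_), and hyphen (-).
SOURCE_NAME_PATTERN = re.compile(r"[0-9A-Za-z_-]{1,255}")


def validate_source_name(name: str, existing_names: set[str]) -> list[str]:
    """Return a list of problems with a proposed source name (empty if valid)."""
    problems = []
    if not SOURCE_NAME_PATTERN.fullmatch(name):
        problems.append("use 1-255 characters from 0-9, A-Z, a-z, _ or -")
    # Assumption: uniqueness is compared case-sensitively.
    if name in existing_names:
        problems.append("name must be unique in the organization")
    return problems


def validate_endpoint_url(url: str) -> list[str]:
    """Return a list of problems with a Nessie endpoint URL (empty if valid)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("URL must start with http:// or https://")
    if not parsed.hostname:
        problems.append("URL must include a host name or IP address")
    try:
        port = parsed.port  # raises ValueError for non-numeric or out-of-range ports
    except ValueError:
        problems.append("URL has an invalid port")
    else:
        # Stricter than necessary: require an explicit port, as in the example URL.
        if port is None:
            problems.append("URL should include the port your Nessie server listens on")
    return problems


if __name__ == "__main__":
    print(validate_source_name("nessie_prod-1", {"nessie_dev"}))  # []
    print(validate_endpoint_url("https://localhost:19120/api/v2"))  # []
```

Returning a list of problems instead of raising on the first one lets a provisioning script report every issue with a proposed source in a single pass.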

Next, set up the storage options.

Storage

Nessie sources can use Amazon S3 buckets (AWS), Azure Storage (Azure), Google Cloud Storage [Google (Preview)], or S3-compatible storage providers like MinIO and Dell ECS as storage.

To connect an Amazon S3 bucket or an S3-compatible storage provider to the Nessie source, select the AWS storage provider option.



S3 Storage

In the field under AWS root path, provide the root path of the S3 bucket to use. We recommend that you have either a dedicated S3 bucket or a dedicated folder in which to store Nessie objects.



Authentication


S3 Authentication
Under Authentication method, choose the method you want to use to authenticate to Amazon S3.

  • AWS Access Key:
    • In the field under AWS access key, provide the access key for the Amazon S3 account.
    • Under AWS access secret, use the dropdown menu to choose a method for providing the access secret for the Amazon S3 account:
      • Dremio: Provide the Amazon S3 access secret in plain text. Dremio stores the Amazon S3 access secret.
      • Azure Key Vault: Provide the URI for the Azure Key Vault secret that stores the Amazon S3 access secret. The URI format is https://<vault_name>.vault.azure.net/secrets/<secret_name>. To use Azure Key Vault as your application secret store, you must deploy Dremio on Azure and complete the requirements for authenticating with Azure Key Vault.
      • AWS Secrets Manager: Provide the Amazon Resource Name (ARN) for the AWS Secrets Manager secret that holds the Amazon S3 access secret. The ARN is available in the AWS web console or through command-line tools.
      • HashiCorp Vault: Choose the HashiCorp secrets engine you're using from the dropdown menu and provide the secret reference for the Amazon S3 access secret in the correct format in the provided field.
    • In the field under IAM role to assume, provide the ARN of the IAM role.
  • EC2 Metadata: In the field under IAM role to assume, provide the ARN of an IAM role. The role can be attached to the EC2 instance, or it can be a separate role that Dremio assumes to connect to the S3 bucket. In either case, the role must have privileges on the S3 bucket.
  • AWS Profile: In the field under AWS profile (optional), provide the AWS Profile name. If you leave the field blank, Dremio uses the default AWS Profile.
  • No Authentication: Select this option if no credentials are required because you are connecting the Nessie source to a public Amazon S3 bucket.
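When the access secret (or the bearer token in the General tab) comes from a secret store, a malformed reference is a common cause of failed connections. The following Python sketch recognizes the two reference formats described above: the Azure Key Vault URI format https://<vault_name>.vault.azure.net/secrets/<secret_name>, and the standard AWS Secrets Manager ARN shape arn:<partition>:secretsmanager:<region>:<account-id>:secret:<name>. The helper names are hypothetical, the patterns are deliberately loose, and HashiCorp Vault references are not covered because their format depends on the secrets engine.

```python
import re

# Azure Key Vault secret URI, in the format shown in the Authentication options.
AZURE_KEY_VAULT_SECRET = re.compile(
    r"https://(?P<vault>[A-Za-z0-9-]+)\.vault\.azure\.net/secrets/(?P<secret>[A-Za-z0-9-]+)/?"
)

# AWS Secrets Manager secret ARN: arn:<partition>:secretsmanager:<region>:<account>:secret:<name>
AWS_SECRETS_MANAGER_ARN = re.compile(
    r"arn:(?P<partition>aws[a-z-]*):secretsmanager:(?P<region>[a-z0-9-]+):"
    r"(?P<account>\d{12}):secret:(?P<name>.+)"
)


def classify_secret_reference(value: str) -> str:
    """Return which secret store a reference looks like (hypothetical helper)."""
    if AZURE_KEY_VAULT_SECRET.fullmatch(value):
        return "Azure Key Vault"
    if AWS_SECRETS_MANAGER_ARN.fullmatch(value):
        return "AWS Secrets Manager"
    return "unrecognized"


def azure_key_vault_secret_uri(vault_name: str, secret_name: str) -> str:
    """Build a secret URI in the documented Azure Key Vault format."""
    return f"https://{vault_name}.vault.azure.net/secrets/{secret_name}"
```

A reference that comes back as "unrecognized" is worth checking before you save the source, although these patterns only confirm the shape of a reference, not that the secret exists or that Dremio can read it.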

S3-Compatible Storage Provider Authentication
If you are connecting to S3-compatible storage like MinIO or Dell ECS, choose AWS access key for authentication and provide the access key and secret.


Other: Connection Properties

Provide the custom key-value pairs for the connection relevant to the source.


(Optional) If you are connecting to S3 storage, complete the following:
  1. Click Add Property.
  2. For Name, provide a connection property.
  3. For Value, provide the corresponding value for the connection property.

If you are connecting to S3-compatible storage like MinIO or Dell ECS, complete the following:
  1. Add fs.s3a.path.style.access and set the value to true. This setting ensures that the request path is created correctly when using IP addresses or hostnames as the endpoint.
  2. Add the fs.s3a.endpoint property and set its value to the server endpoint (IP address). The value must not include the http:// or https:// prefix and must not start with the string s3. For example, if the endpoint is http://123.1.2.3:9000, the value is 123.1.2.3:9000.
  3. Add dremio.s3.compat and set the value to true.
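The three connection properties for S3-compatible storage can be derived from the server's endpoint. Path-style access is needed because it places the bucket name in the request path (for example, http://123.1.2.3:9000/<bucket>/<key>) rather than in the host name, which does not work when the endpoint is an IP address. The following Python sketch strips the scheme as required for fs.s3a.endpoint and rejects values that start with s3. The function names are hypothetical; only the property names and values come from the steps above.

```python
from __future__ import annotations

from urllib.parse import urlparse


def s3a_endpoint_value(endpoint: str) -> str:
    """Strip an http:// or https:// prefix, as the fs.s3a.endpoint value requires."""
    parsed = urlparse(endpoint)
    # Use the host:port part when a scheme is present; otherwise keep the input as-is.
    value = parsed.netloc if parsed.scheme in ("http", "https") else endpoint
    value = value.rstrip("/")
    if value.startswith("s3"):
        raise ValueError("fs.s3a.endpoint value must not start with 's3'")
    return value


def s3_compatible_properties(endpoint: str) -> dict[str, str]:
    """Connection properties to add for MinIO, Dell ECS, or similar providers."""
    return {
        "fs.s3a.path.style.access": "true",
        "fs.s3a.endpoint": s3a_endpoint_value(endpoint),
        "dremio.s3.compat": "true",
    }


if __name__ == "__main__":
    for name, value in s3_compatible_properties("http://123.1.2.3:9000").items():
        print(f"{name} = {value}")
```

Each printed pair corresponds to one Name/Value row that you add with Add Property.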

Other: Encrypt connection

(Optional) To secure the connection between the Amazon S3 bucket and Dremio, select the Encrypt connection checkbox.



To save the configuration, click Save. To configure additional settings, proceed to Advanced Options.


Advanced Options

Click Advanced Options in the left sidebar.

note

All advanced parameters are optional.

Review each option provided in the following table to set up the advanced options to meet your needs.

| Advanced Option | Description |
| --- | --- |
| Enable asynchronous access when possible | Enabled by default; clear the checkbox to disable it. Enables cloud caching for the S3 bucket to support simultaneous actions such as adding and editing a new source. |

Under Cache Options, review the following table and edit the options to meet your needs.

| Cache Option | Description |
| --- | --- |
| Enable local caching when possible | Selected by default, along with asynchronous access for cloud caching. Clear the checkbox to disable this option. For more information about local caching, see Columnar Cloud Cache. |
| Max percent of total available cache space to use when possible | Specifies the disk quota, as a percentage, that a source can use on any single executor node. The quota applies only when local caching is enabled. The default is 100 percent of the total disk space available on the mount point provided for caching. Enter a percentage in the value field or use the arrows at the far right to adjust it. |
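The cache quota is a simple percentage of the cache mount point on each executor. For example, if an executor's cache mount has 500 GiB available and the option is set to 40 percent, this source can use up to 200 GiB of that executor's cache. The following Python sketch shows the arithmetic only, assuming integer percentages as entered in the field; the function name is hypothetical, and Dremio itself enforces the quota.

```python
GIB = 1024 ** 3


def cache_quota_bytes(mount_capacity_bytes: int, max_percent: int = 100,
                      local_caching_enabled: bool = True) -> int:
    """Per-executor cache space a source may use (hypothetical helper)."""
    if not 0 <= max_percent <= 100:
        raise ValueError("max_percent must be between 0 and 100")
    # The quota applies only when local caching is enabled; otherwise the
    # source does not use the local cache at all.
    if not local_caching_enabled:
        return 0
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    return mount_capacity_bytes * max_percent // 100


if __name__ == "__main__":
    print(cache_quota_bytes(500 * GIB, 40) // GIB)  # 200
```

Because the quota is applied per executor node, lowering the percentage leaves more of each node's cache space for other sources.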

Reflection Refresh

The Reflection Refresh section lets you set a schedule for refreshing all of the reflections defined on tables in the catalog, and specify how long reflections in the catalog are kept before they expire. You can override both settings on individual tables in different branches.

To learn more, see Refreshing Reflections and Setting the Expiration Policy for Reflections.

Privileges

On the Privileges tab, you can grant privileges to specific users or roles. See Access Controls for additional information about privileges.

note

All privileges are optional.

  1. For Privileges, enter the user name or role name that you want to grant access to and click the Add to Privileges button. The added user or role is displayed in the USERS/ROLES table.
  2. For the users or roles in the USERS/ROLES table, toggle the checkmark for each privilege you want to grant on the Dremio source that is being created.
  3. Click Save after setting the configuration.

When you click Save, Dremio attempts to connect to the Nessie server. If the connection fails, report the issue on the Project Nessie community's Zulip channel or file a ticket on the Project Nessie community's GitHub page.
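Before reporting a failed connection, it can help to confirm that the Nessie endpoint is reachable from the Dremio coordinator host, with the same token you entered for Bearer authentication. The following Python sketch uses only the standard library and assumes that the endpoint URL points at Nessie REST API v2 (ending in /api/v2), which exposes server configuration at <endpoint>/config. The function names are hypothetical, and a successful response shows only that the server is reachable and accepts the token, not that the storage settings are correct.

```python
from __future__ import annotations

import json
import urllib.request


def build_config_request(endpoint_url: str,
                         bearer_token: str | None = None) -> urllib.request.Request:
    """Build a GET request for the config resource under the Nessie endpoint URL."""
    # Assumption: REST API v2 serves configuration at "<endpoint>/config".
    url = endpoint_url.rstrip("/") + "/config"
    headers = {"Accept": "application/json"}
    if bearer_token:
        # Same token you would provide for the Bearer authentication type.
        headers["Authorization"] = f"Bearer {bearer_token}"
    return urllib.request.Request(url, headers=headers, method="GET")


def check_nessie_connection(endpoint_url: str, bearer_token: str | None = None,
                            timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON body.

    Raises urllib.error.URLError (an OSError subclass) for network or HTTP errors.
    """
    request = build_config_request(endpoint_url, bearer_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, running `check_nessie_connection("https://localhost:19120/api/v2")` on the coordinator host either returns the server's configuration as a dictionary or raises an error whose message (connection refused, certificate failure, HTTP 401) is useful to include in a bug report.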

Updating a Nessie Source

To update a Nessie source:

  1. On the Datasets page, under Nessie Catalogs in the panel on the left, find the name of the source you want to edit.
  2. Right-click the source name and select Settings from the list of actions. Alternatively, click the source name and then click the Settings icon at the top right corner of the page.
  3. In the Source Settings dialog, edit the settings you wish to update. Dremio does not support updating the source name. For information about the settings options, see Configuring Nessie as a Source.
  4. Click Save.

Deleting a Nessie Source

note

If the source is in a bad state (for example, Dremio cannot authenticate to the source or the source is otherwise unavailable), only users who belong to the ADMIN role can delete the source.

To delete a Nessie source, perform these steps:

  1. On the Datasets page, click Sources > Nessie Catalogs in the panel on the left.
  2. In the list of data sources, hover over the name of the source you want to remove and right-click.
  3. From the list of actions, click Delete.
  4. In the Delete Source dialog, click Delete to confirm that you want to remove the source.
note

Deleting a source causes all downstream views that depend on objects in the source to break.

Limitations

  • Changes to tables and views that are in Nessie sources are not logged. Nessie sources do not have audit logs.
    DX-64988
  • The Catalog API is unable to retrieve or manage Nessie sources.
    DX-64994
  • Dremio does not support moving, copying, or renaming tables and views in Nessie sources or removing the format from tables in Nessie sources.