Version: 24.3.x

Nessie Catalogs Preview

Nessie catalogs enable you to process, manage, consume, and share data in the same way that code is shared during software development. You take control of your data using concepts such as version control, commits, and testing and development in isolation from your production data. Dremio enables you to perform these data-as-code activities using Project Nessie, which provides Git-like capabilities for the data lakehouse.

Prerequisites

Dremio supports Nessie version 0.59.0 and later. If you have not yet set up a Nessie server and connected it with your dataset, you can set up a server either in a fast-start Docker image or with secure HTTPS transport in Minikube.

When using Nessie as a source, Dremio can only connect to Amazon S3 buckets. If you have not yet connected Amazon S3 to Dremio, see Amazon S3 Credentials for guidance.

Adding a Nessie Source

To add a Nessie source to your project:

  1. From the Datasets page in the Sources section, click the plus sign (+).
note

Alternatively, click Add Source at the bottom of the data panel on the Datasets page.

  2. In the Add Data Source dialog, under Nessie Catalogs, click Nessie (Preview).

    The New Nessie (Preview) Source dialog box appears, which contains the following sections:

    • General: Create a name for your Nessie source, specify the endpoint URL, and set the authentication type.

    • Storage: Set the storage option by setting up the authentication type and the connection properties.

    • Advanced Options: (Optional) Use the default settings or configure access preferences and cache options.

    • Privileges: (Optional) Add privileges for users or roles.

    Refer to the following for guidance on how to edit each section.

General

This tab provides options for configuring connections to a Nessie source.

  1. In the Name field, enter a name.
note

The name you enter must be unique in the organization. Also, consider a name that is easy for users to reference. This name cannot be edited once the source is created. The name cannot exceed 255 characters and must contain only the following characters: 0-9, A-Z, a-z, underscore(_), or hyphen (-).

  2. In the Nessie Endpoint URL field, specify the URL of your Nessie server, including the IP address or hostname and the port that you have set up (e.g., https://localhost:19120/api/v2). For more information, see Project Nessie Configuration.

  3. Select the Nessie Authentication Type. You can select None or Bearer:

    • None: Authentication is not enforced on the Nessie server. Other users in the Dremio organization will be able to view the Nessie source without authenticating to it.

    • Bearer: Set authentication using an OpenID bearer token. For more information about setting up this type of authentication, see Project Nessie's Authentication page.
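
The General tab values can be sanity-checked before you enter them in the dialog. The following is a minimal sketch, not part of Dremio or Nessie: the function names `is_valid_source_name` and `build_config_request` are hypothetical, the minimum name length of one character is an assumption, and the `/config` path under the endpoint URL is assumed to be served by the Nessie REST API v2. The uniqueness of the name within the organization cannot be checked locally.

```python
import re
import urllib.request
from typing import Optional

# Naming rule from the note above: at most 255 characters drawn from
# 0-9, A-Z, a-z, underscore, or hyphen. A minimum of 1 character is assumed.
_SOURCE_NAME = re.compile(r"[0-9A-Za-z_-]{1,255}")


def is_valid_source_name(name: str) -> bool:
    """Return True if the name satisfies the documented character and length rules."""
    return _SOURCE_NAME.fullmatch(name) is not None


def build_config_request(
    endpoint_url: str, bearer_token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a GET request for <endpoint>/config.

    Assumption: the Nessie server exposes its configuration at /config
    under the endpoint URL. Pass bearer_token only for the Bearer
    authentication type; leave it as None for the None type.
    """
    url = endpoint_url.rstrip("/") + "/config"
    request = urllib.request.Request(url, method="GET")
    if bearer_token is not None:
        request.add_header("Authorization", f"Bearer {bearer_token}")
    return request
```

To probe a running server, the returned request could be passed to `urllib.request.urlopen`; a successful response suggests that the endpoint URL and token are usable before you save the source.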

Next, set up the storage options.

Storage

This tab enables you to configure the storage options for the Nessie source. Nessie sources use Amazon S3 only, so you must specify the AWS authentication method to use, if one is required. (See Prerequisites if you need to set up storage for the Nessie source.) Additionally, you can set up connection properties and enable encryption of the connection.

Authentication

To connect an Amazon S3 bucket to the Nessie source, choose one of the following authentication methods:

  • AWS Access Key: Enables an IAM user or the AWS account root user to access the Amazon S3 bucket. You can authenticate to the specified S3 bucket either with an AWS Access Key and AWS Access Secret pair or with an IAM Role to Assume.

    The bucket associated with the authentication method you connect with is made available or, if whitelisted buckets are specified, only those buckets.

    note

    For information about long-term credentials for an IAM user or the AWS account root user, see Managing access keys for IAM users.

    • Choice 1: AWS Access Key (for example, AKIAIOSFODNN7EXAMPLE) and AWS Access Secret (for example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY)

    • Choice 2: IAM Role to Assume: An identity within your AWS account that has specific permissions.

  • EC2 Metadata: To authenticate to your Amazon S3 bucket using EC2 metadata, you need to provide an IAM role with privileges to the bucket. This role could be attached to the EC2 instance or to an IAM role to assume for connecting to the bucket. In either case, the role must provide privileges to use the bucket.

  • AWS Profile: Dremio reads your credentials from the specified AWS profile. For information on how to set up a configuration or credentials file for AWS, see AWS Custom Authentication. For more information about AWS profiles, see Using instance profiles.

    • Profile Name (Optional): The AWS profile name. If this is left blank, the default profile is used. For more information about using profiles in a credentials or configuration file, see AWS's documentation on Configuration and credential file settings.
  • No Authentication: Select this option when you are connecting the Nessie source to a public Amazon S3 bucket.

To connect to S3-compatible storage like MinIO:

  1. Choose AWS Access Key for authentication and provide the access key and secret.

  2. Click Add property under Connection Properties and add the following properties:

    • Add fs.s3a.path.style.access and set the value to true.
    note

    This setting ensures that the request path is created correctly when using IP addresses or hostnames as the endpoint.

    • Add the fs.s3a.endpoint property and its corresponding server endpoint value (IP address).
    note

    The endpoint value cannot contain the http:// or https:// prefix, and it cannot start with the string s3. For example, if the endpoint is http://123.1.2.3:9000, the value is 123.1.2.3:9000.

    • Add dremio.s3.compat and set the value to true.
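
The three connection properties for S3-compatible storage can be assembled as follows. This is a minimal sketch under stated assumptions: the helper names `normalize_s3_endpoint` and `s3_compatible_properties` are hypothetical and are not part of Dremio. Only the property names and the prefix rules come from the steps above.

```python
import re
from typing import Dict


def normalize_s3_endpoint(endpoint: str) -> str:
    """Return an endpoint value suitable for fs.s3a.endpoint.

    Strips a leading http:// or https:// and rejects values that are
    empty or start with the string "s3", as the note above requires.
    """
    value = re.sub(r"^https?://", "", endpoint.strip())
    if not value or value.startswith("s3"):
        raise ValueError(f"unsupported fs.s3a.endpoint value: {endpoint!r}")
    return value


def s3_compatible_properties(endpoint: str) -> Dict[str, str]:
    """Build the name/value pairs to enter under Connection Properties."""
    return {
        "fs.s3a.path.style.access": "true",
        "fs.s3a.endpoint": normalize_s3_endpoint(endpoint),
        "dremio.s3.compat": "true",
    }
```

For example, `s3_compatible_properties("http://123.1.2.3:9000")` yields `123.1.2.3:9000` for `fs.s3a.endpoint`, which matches the example in the note.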

AWS Root Path

The root path to the Amazon S3 bucket. It is recommended that you either have a dedicated S3 bucket or, at least, a dedicated folder in which to store Nessie objects. Example: /bucket-name/optional/folder/path
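
To make the shape of the root path concrete, the sketch below splits a value such as the example above into a bucket name and an optional folder path. The function name `split_root_path` is hypothetical, and treating the first path segment as the bucket is an assumption drawn from the example, not a documented Dremio rule.

```python
from typing import Tuple


def split_root_path(root_path: str) -> Tuple[str, str]:
    """Split "/bucket/optional/folder" into ("bucket", "optional/folder").

    Assumption: the first non-empty segment is the bucket name and the
    remaining segments form the folder path, which may be empty.
    """
    segments = [segment for segment in root_path.split("/") if segment]
    if not segments:
        raise ValueError("the root path must name a bucket")
    return segments[0], "/".join(segments[1:])
```

A dedicated bucket with no folder, such as `/bucket-name`, returns an empty folder path.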

Connection Properties (Optional)

When using Nessie as a source, tables and metadata files are stored in an Amazon S3 bucket. This section enables you to provide the custom key value pairs for the connection relevant to the source.

  1. Click Add Property.
  2. For Name, enter a connection property.
  3. For Value, enter the corresponding connection property value.

(Optional) To secure the connections between the S3 buckets and Dremio, tick the Encrypt connection checkbox.

After configuring the General and Storage options, you can either save your settings or continue on to set up the optional settings.

  • To save the configuration, click Save.
  • To configure the optional settings, proceed to Advanced Options.

Advanced Options

Click Advanced Options in the left menu sidebar.

note

All advanced parameters are optional.

Review each option provided in the following table to set up the advanced options to meet your needs.

Advanced Option | Description
Enable asynchronous access when possible | Activated by default; uncheck the box to deactivate. Enables cloud caching for the S3 bucket to support simultaneous actions such as adding and editing a new source.

Under Cache Options, review the following table and edit the options to meet your needs.

Cache Option | Description
Enable local caching when possible | Selected by default, along with asynchronous access for cloud caching. Uncheck the checkbox to disable this option. For more information about local caching, see Columnar Cloud Cache.
Max percent of total available cache space to use when possible | Specifies the disk quota, as a percentage, that a source can use on any single executor node. Applies only when local caching is enabled. The default is 100 percent of the total disk space available on the mount point provided for caching. You can either manually enter a percentage in the value field or use the arrows to the far right to adjust the percentage.
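
As a rough illustration of the percentage setting, the sketch below converts a maximum percentage into a per-node byte quota. The function name `cache_quota_bytes` is hypothetical, and the calculation assumes that the quota is a simple fraction of the space available on the caching mount point of one executor node. Dremio's internal accounting may differ.

```python
def cache_quota_bytes(mount_available_bytes: int, max_percent: int = 100) -> int:
    """Return the cache quota for one executor node, in bytes.

    Assumption: quota = available space on the mount point * percent / 100.
    The default of 100 mirrors the documented default setting.
    """
    if not 0 <= max_percent <= 100:
        raise ValueError("max_percent must be between 0 and 100")
    return mount_available_bytes * max_percent // 100
```

Under this assumption, a source limited to 40 percent on a node with 500 GiB available for caching could use up to 200 GiB on that node.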

Reflection Refresh

The Reflection Refresh section allows you to set a schedule for refreshing all of the reflections that are defined on tables in the catalog. You can override this schedule on individual tables in different branches. This section also lets you specify how long all reflections in the catalog exist until they expire. Again, you can override this setting on individual tables in different branches.

To learn more, see Refreshing Reflections and Setting the Expiration Policy for Reflections.

Privileges

Click Privileges in the left menu sidebar. This section grants privileges to specific users or roles. To learn more about how Dremio allows for the implementation of granular privileges, see Privileges.

note

All privileges are optional.

To add a privilege to a user or role:

  • In the Add User/Role field, enter the user or role name that you want to apply privileges to and then click Add to Privileges. The user or role is added to the Users table.

To set privileges for a user or role:

  • In the Users table, identify the user to set privileges for and click under the appropriate column (Select, Alter, Create Table, etc.) to either enable or disable that privilege. A green checkmark indicates that the privilege is enabled.

After you have completed configuring the Nessie source, click Save.

At this point, a connection with the Nessie server is attempted. If a connection cannot be made, report the issue to the Project Nessie community's Zulip channel. You can also file a ticket on the Project Nessie community's GitHub page.

Editing a Nessie Source

To edit a Nessie source:

  1. From the Datasets page, click Nessie Catalogs. A list of Nessie sources is displayed.

  2. On the All Nessie Catalogs page, hover over a Nessie source to show the Settings (gear) icon, then click the icon.

note

Alternatively, you can hover over a Nessie source, click the More (...) option, and then click Settings.

  3. In the Source Settings dialog, you can make changes to the settings for General, Storage, Advanced Options, or Privileges.

  4. Click Save.

Removing a Nessie Source

To remove a Nessie source:

  1. From the Datasets page, click Nessie Catalogs. A list of Nessie sources is displayed.

  2. On the All Nessie Catalogs page, hover over a Nessie source, click the More (...) option, and then click Remove Source.

  3. Click Remove to confirm removal of the source, or click Cancel to return to the All Nessie Catalogs page.

Limitations

  • Changes to tables and views that are in Nessie sources are not logged. Nessie sources do not have audit logs.
    DX-64988
  • The Catalog API is unable to retrieve or manage Nessie sources.
    DX-64994
  • Dremio does not support moving, copying, or renaming tables and views in Nessie sources or removing the format from tables in Nessie sources.