Nessie catalogs enable you to process, manage, consume, and share data in the same way that code is shared during software development. That is, you are empowered to take control of your data using concepts including version control, commits, and testing and development in isolation from your production data. Dremio enables you to perform data as code activities using Project Nessie, which provides Git-like capabilities for the data lakehouse.
Dremio supports Nessie version 0.59.0 and later. If you have not yet set up a Nessie server and connected it to your dataset, you can set up a server either from a fast-start Docker image or in Minikube with secure HTTPS transport.
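Because only Nessie 0.59.0 and later is supported, it can help to check a server's version string against that minimum before connecting. The sketch below assumes you already have the version as a dotted string; `parse_version` and `is_supported` are hypothetical helpers, and ignoring pre-release suffixes such as `-SNAPSHOT` is a simplification.

```python
# Minimal sketch: compare a Nessie version string against the documented
# minimum (0.59.0). Obtaining the server's version is outside this sketch.
MIN_NESSIE_VERSION = (0, 59, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '0.59.0' into a tuple of ints."""
    # Simplification: drop any pre-release/build suffix before splitting.
    core = version.split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_supported(version: str) -> bool:
    """Return True if the version is 0.59.0 or later."""
    parts = parse_version(version)
    # Pad with zeros so that, for example, '1.0' compares as (1, 0, 0).
    width = max(len(parts), len(MIN_NESSIE_VERSION))
    padded = parts + (0,) * (width - len(parts))
    minimum = MIN_NESSIE_VERSION + (0,) * (width - len(MIN_NESSIE_VERSION))
    return padded >= minimum


print(is_supported("0.59.0"))  # True
print(is_supported("0.58.1"))  # False
print(is_supported("0.76.6"))  # True
```

Tuple comparison is used because comparing version strings lexically would rank "0.100.0" below "0.59.0".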
When using Nessie as a source, Dremio can only connect to Amazon S3 buckets. If you have not yet connected Amazon S3 to Dremio, see Amazon S3 Credentials for guidance.
Adding a Nessie Source
To add a Nessie source to your project:
From the Datasets page in the Sources section, click the plus sign (+).
Note: Alternatively, click Add Source at the bottom of the data panel on the Datasets page.
In the Add Data Source dialog, under Nessie Catalogs, click Nessie (Preview).
The New Nessie (Preview) Source dialog box appears, which contains the following sections:
General: Create a name for your Nessie source, specify the endpoint URL, and set the authentication type.
Storage: Set up the storage authentication type and the connection properties.
Advanced Options: (Optional) Keep the default settings or configure access preferences and cache options.
Privileges: (Optional) Add privileges for users or roles.
Refer to the following for guidance on how to edit each section.
The General tab provides options for configuring the connection to a Nessie source.
In the Name field, enter a name.
Note: The name you enter must be unique in the organization. Also, consider a name that is easy for users to reference. This name cannot be edited once the source is created. The name cannot exceed 255 characters and must contain only the following characters: 0-9, A-Z, a-z, underscore (_), or hyphen (-).
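The naming rules above can be checked before you submit the form. This is a minimal sketch: `validate_source_name` is a hypothetical helper, and it assumes uniqueness is an exact, case-sensitive match against names you already know about, which may not match how Dremio compares names.

```python
import re

# Allowed characters per the naming rules: digits, ASCII letters,
# underscore, and hyphen, between 1 and 255 characters long.
_NAME_PATTERN = re.compile(r"[0-9A-Za-z_-]{1,255}")


def validate_source_name(name: str, existing_names: set[str]) -> list[str]:
    """Return a list of problems with `name`; an empty list means it looks valid."""
    problems = []
    if not _NAME_PATTERN.fullmatch(name):
        problems.append("must be 1-255 characters of 0-9, A-Z, a-z, '_' or '-'")
    # Assumption: exact, case-sensitive comparison for uniqueness.
    if name in existing_names:
        problems.append("must be unique in the organization")
    return problems


print(validate_source_name("nessie_prod-01", {"s3_lake"}))  # []
print(validate_source_name("nessie prod", set()))  # reports the space as invalid
```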
In the Nessie Endpoint URL field, specify the URL of your Nessie server, including its host (IP address or hostname) and port (for example, https://localhost:19120/api/v2). For more information, see Project Nessie Configuration.
Select the Nessie Authentication Type. You can select None or Bearer:
None: Authentication is not enforced on the Nessie server. Other users in the Dremio organization will be able to view the Nessie source without authenticating to it.
Bearer: Set authentication using an OpenID bearer token. For more information about setting up this type of authentication, see Project Nessie's Authentication page.
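Before saving, you may want to confirm that the endpoint URL and authentication type work together. The sketch below assumes the endpoint is Nessie's REST API v2 base URL (as in the example above) and that its `GET /config` route returns a JSON description of the server configuration; `build_config_request` and `fetch_config` are hypothetical helpers. With a token, the request carries an `Authorization: Bearer ...` header; without one, it matches the None authentication type.

```python
import json
import urllib.request
from typing import Optional


def build_config_request(endpoint: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for the (assumed) v2 config route of a Nessie server."""
    url = endpoint.rstrip("/") + "/config"
    headers = {"Accept": "application/json"}
    if token:
        # Bearer authentication: send the OpenID token in the Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_config(endpoint: str, token: Optional[str] = None, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body; raises on HTTP or network errors."""
    with urllib.request.urlopen(build_config_request(endpoint, token), timeout=timeout) as resp:
        return json.load(resp)


req = build_config_request("https://localhost:19120/api/v2", token="my-token")
print(req.full_url)  # https://localhost:19120/api/v2/config
```

Calling `fetch_config` requires a running server. If it fails with an authentication error, check the token before checking the URL.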
Next, set up the storage options.
The Storage tab enables you to configure the storage options for the Nessie source. Nessie sources use Amazon S3 only, so you must specify the AWS authentication method to use, if one is required (see Prerequisites if you need to set up storage for the Nessie source). Additionally, you can set up connection properties and enable encryption of the connection.
To connect an Amazon S3 bucket to the Nessie source, choose one of the following authentication methods:
AWS Access Key: Enables an IAM user or the AWS account root user to access the Amazon S3 bucket. You can authenticate to the specified S3 bucket either with an AWS Access Key and AWS Access Secret, or with an IAM Role to Assume.
The bucket associated with the authentication method you connect with (or, if specified, the whitelisted bucket) will be made available.
Note: For information about long-term credentials for an IAM user or the AWS account root user, see Managing access keys for IAM users.
Choice 1: AWS Access Key (for example, AKIAIOSFODNN7EXAMPLE) and AWS Access Secret.
Choice 2: IAM Role to Assume: An identity within your AWS account that has specific permissions.
EC2 Metadata: To authenticate to your Amazon S3 bucket using EC2 metadata, you need to provide an IAM role with privileges to the bucket. This role could be attached to the EC2 instance or to an IAM role to assume for connecting to the bucket. In either case, the role must provide privileges to use the bucket.
AWS Profile: Dremio reads your credentials from the specified AWS profile. For information on how to set up a configuration or credentials file for AWS, see AWS Custom Authentication.
- Profile Name (Optional): The AWS profile name. If this is left blank, the default profile is used. For more information about using profiles in a credentials or configuration file, see AWS's documentation on Configuration and credential file settings.
No Authentication: Select this option when you are connecting the Nessie source to a public Amazon S3 bucket.
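To illustrate how a profile name selects credentials, the sketch below reads one section of an AWS shared credentials file, falling back to `default` when the name is blank, as the Profile Name field does. It assumes the file lives at `~/.aws/credentials`; `load_profile` is a hypothetical helper, and Dremio's own credential loading may differ in detail.

```python
import configparser
from pathlib import Path
from typing import Optional

# Assumed location of the shared credentials file; adjust if yours differs.
DEFAULT_CREDENTIALS_FILE = Path.home() / ".aws" / "credentials"


def load_profile(profile_name: Optional[str] = None,
                 credentials_text: Optional[str] = None) -> dict:
    """Return the settings for one profile in an AWS credentials file.

    A blank profile name falls back to 'default'. Pass `credentials_text`
    to parse a string instead of reading the file.
    """
    name = (profile_name or "").strip() or "default"
    # interpolation=None so values containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    if credentials_text is None:
        parser.read(DEFAULT_CREDENTIALS_FILE)  # silently skips a missing file
    else:
        parser.read_string(credentials_text)
    if not parser.has_section(name):
        raise KeyError(f"profile {name!r} not found")
    return dict(parser.items(name))


sample = """
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE

[analytics]
aws_access_key_id = AKIAI44QH8DHBEXAMPLE
"""
print(load_profile("", credentials_text=sample)["aws_access_key_id"])  # AKIAIOSFODNN7EXAMPLE
```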
To connect to S3-compatible storage like MinIO:
Choose AWS Access Key for authentication and provide the access key and secret.
Click Add property under Connection Properties and add the following properties:
fs.s3a.path.style.access: set the value to true.
This setting ensures that the request path is created correctly when using IP addresses or hostnames as the endpoint.
fs.s3a.endpoint: set the value to the server endpoint (IP address).
Note: Do not include the http(s):// prefix in the endpoint value. For example, if the endpoint is http://126.96.36.199:9000, the value is 126.96.36.199:9000.
dremio.s3.compat: set the value to true.
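The MinIO connection properties can be assembled as key-value pairs, stripping the scheme from the endpoint as the note requires. This sketch assumes both flags take the value `true`; `s3_endpoint_value` and `minio_connection_properties` are hypothetical helpers for preparing the values you then enter under Connection Properties.

```python
from urllib.parse import urlsplit


def s3_endpoint_value(endpoint: str) -> str:
    """Drop an http:// or https:// prefix (and any trailing path) from an endpoint."""
    parts = urlsplit(endpoint)
    if parts.scheme in ("http", "https"):
        return parts.netloc  # host:port only
    return endpoint.rstrip("/")


def minio_connection_properties(endpoint: str) -> dict[str, str]:
    """Connection properties for an S3-compatible store such as MinIO."""
    return {
        "fs.s3a.path.style.access": "true",
        "fs.s3a.endpoint": s3_endpoint_value(endpoint),
        "dremio.s3.compat": "true",
    }


print(minio_connection_properties("http://126.96.36.199:9000"))
# {'fs.s3a.path.style.access': 'true', 'fs.s3a.endpoint': '126.96.36.199:9000', 'dremio.s3.compat': 'true'}
```

Path-style access is enabled because S3-compatible servers addressed by IP cannot use virtual-hosted bucket names in the hostname.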
AWS Root Path
The root path to the Amazon S3 bucket. It is recommended that you either have a dedicated S3 bucket or, at least, a dedicated folder in which to store Nessie objects.
Connection Properties (Optional)
When using Nessie as a source, tables and metadata files are stored in an Amazon S3 bucket. This section enables you to provide custom key-value pairs for the connection to the source.
- Click Add Property.
- For Name, enter a connection property.
- For Value, enter the corresponding connection property value.
(Optional) To secure the connections between the S3 buckets and Dremio, tick the Encrypt connection checkbox.
After configuring the General and Storage options, you can either save your settings or continue on to set up the optional settings.
- To save the configuration, click Save.
- To configure the optional settings, proceed to Advanced Options.
Click Advanced Options in the left menu sidebar.
All advanced parameters are optional.
Review each option provided in the following table to set up the advanced options to meet your needs.
| Option | Description |
|---|---|
| Enable asynchronous access when possible | Selected by default; clear the checkbox to disable it. Enables cloud caching for the S3 bucket to support simultaneous actions such as adding and editing a new source. |
Under Cache Options, review the following table and edit the options to meet your needs.
| Option | Description |
|---|---|
| Enable local caching when possible | Selected by default, along with asynchronous access for cloud caching. Clear the checkbox to disable this option. For more information about local caching, see Columnar Cloud Cache. |
| Max percent of total available cache space to use when possible | Specifies the disk quota, as a percentage, that a source can use on any single executor node, only when local caching is enabled. The default is 100 percent of the total disk space available on the mount point provided for caching. Enter a percentage in the value field or use the arrows at the far right to adjust it. |
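The cache quota is a percentage of the cache mount point's available space on each executor, so the per-executor limit is simple arithmetic. In this sketch, `cache_quota_bytes` is a hypothetical helper, and returning 0 when local caching is disabled is an assumption reflecting that the quota only applies when caching is on.

```python
def cache_quota_bytes(mount_available_bytes: int, max_percent: float,
                      local_caching_enabled: bool = True) -> int:
    """Per-executor disk quota for one source's local cache."""
    if not local_caching_enabled:
        return 0  # assumption: no local cache space is used when caching is off
    if not 0 <= max_percent <= 100:
        raise ValueError("max_percent must be between 0 and 100")
    return int(mount_available_bytes * max_percent / 100)


# A 500 GiB cache mount with the setting at 40 percent:
print(cache_quota_bytes(500 * 1024**3, 40) / 1024**3)  # 200.0
```

Because the quota applies per executor node, a cluster's total cache footprint for the source scales with the number of executors.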
The Reflection Refresh section allows you to set a schedule for refreshing all of the reflections that are defined on tables in the catalog. You can override this schedule on individual tables in different branches. This section also lets you specify how long reflections in the catalog are kept before they expire. Again, you can override this setting on individual tables in different branches.
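The precedence described above (catalog-wide settings, overridden per table and branch) can be modeled as a lookup with fallback. This is a hypothetical model, not Dremio's implementation: `RefreshPolicy`, `Override`, `effective_policy`, and the hour-based units are illustrative assumptions, and each setting is assumed to fall back to the catalog default independently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RefreshPolicy:
    refresh_every_hours: int
    expire_after_hours: int


@dataclass(frozen=True)
class Override:
    # None means "not overridden; use the catalog-wide value".
    refresh_every_hours: Optional[int] = None
    expire_after_hours: Optional[int] = None


def effective_policy(catalog_default: RefreshPolicy,
                     overrides: dict[tuple[str, str], Override],
                     branch: str, table: str) -> RefreshPolicy:
    """Resolve the policy for one table on one branch."""
    o = overrides.get((branch, table))
    if o is None:
        return catalog_default
    return RefreshPolicy(
        refresh_every_hours=(o.refresh_every_hours
                             if o.refresh_every_hours is not None
                             else catalog_default.refresh_every_hours),
        expire_after_hours=(o.expire_after_hours
                            if o.expire_after_hours is not None
                            else catalog_default.expire_after_hours),
    )


default = RefreshPolicy(refresh_every_hours=1, expire_after_hours=3)
overrides = {("dev", "sales"): Override(refresh_every_hours=6)}
print(effective_policy(default, overrides, "dev", "sales"))
# RefreshPolicy(refresh_every_hours=6, expire_after_hours=3)
```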
Click Privileges in the left menu sidebar. This section grants privileges to specific users or roles. To learn more about how Dremio allows for the implementation of granular privileges, see Privileges.
All privileges are optional.
To add a privilege to a user or role:
- In the Add User/Role field, enter the user or role name that you want to apply privileges to and then click Add to Privileges. The user or role is added to the Users table.
To set privileges for a user or role:
- In the Users table, identify the user to set privileges for and click the cell under the appropriate column (Select, Alter, Create Table, etc.) to either enable or disable that privilege. A green checkmark indicates that the privilege is enabled.
After you have completed configuring the Nessie source, click Save.
At this point, a connection with the Nessie server is attempted. If a connection cannot be made, report the issue to the Project Nessie community's Zulip channel. You can also file a ticket on the Project Nessie community's GitHub page.
Editing a Nessie Source
To edit a Nessie source:
From the Datasets page, click Nessie Catalogs. A list of Nessie sources is displayed.
On the All Nessie Catalogs page, hover over a Nessie source to show the Settings (gear) icon, then click the icon.
Note: Alternatively, you can hover over a Nessie source, click the More (...) option, and then click Settings.
In the Source Settings dialog, you can make changes to the settings for General, Storage, Advanced Options, or Privileges.
Removing a Nessie Source
To remove a Nessie source:
From the Datasets page, click Nessie Catalogs. A list of Nessie sources is displayed.
On the All Nessie Catalogs page, hover over a Nessie source, click the More (...) option, and then click Remove Source.
Click Remove to confirm removal of the source, or click Cancel to return to the All Nessie Catalogs page.
Limitations
- Changes to tables and views that are in Nessie sources are not logged. Nessie sources do not have audit logs. DX-64988
- The Catalog API is unable to retrieve or manage Nessie sources. DX-64994
- Dremio does not support moving, copying, or renaming tables and views in Nessie sources or removing the format from tables in Nessie sources.