Open Catalog
You can connect to Open Catalogs from other projects in your organization. When you add a catalog from another project, it appears as a connection in your project, letting you browse its namespaces and query its tables. Role-Based Access Control (RBAC) privileges and fine-grained access controls are enforced at the catalog level, so permissions apply regardless of which project accesses the data.
You can also manage the settings of the project's built-in Open Catalog. For instructions, see Manage Open Catalog.
Connect to an Open Catalog
To connect to a catalog from another project:
- In the Dremio console, click Add Data on the Home page.
- In the Add Data dialog, select Open Catalog.
- Configure the connection using the sections below, then click Save.
General
- Name – Select an Open Catalog from another project. The dropdown lists only the projects whose Open Catalog you have been granted access to. If a project you expect is missing, contact the project owner to request access.
Advanced Options
- Enable local caching when possible: Selected by default. Along with asynchronous access for cloud caching, local caching can improve query performance.
- Max percent of total available cache space to use when possible: Specifies the disk quota, as a percentage, available on any single executor node when local caching is enabled. The default is 100 percent of the total disk space available on the mount point provided for caching. You can either manually enter a percentage in the value field or use the arrows to the far right to adjust the percentage.
- Connection Properties – You can add key-value pairs to provide custom connection properties.
  - Click Add Property.
  - For Name, enter a connection property.
  - For Value, enter the corresponding connection property value.
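As a rough illustration of the cache quota setting above, the following sketch computes how much disk space a given percentage represents on a single executor node. The function name `cache_quota_bytes` and the 500 GiB mount point are hypothetical, chosen only for the arithmetic; this is not how Dremio implements the setting.

```python
def cache_quota_bytes(total_disk_bytes: int, max_percent: int = 100) -> int:
    """Return the cache quota for one executor node (illustrative only)."""
    if not 0 <= max_percent <= 100:
        raise ValueError("max_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding on large disks.
    return total_disk_bytes * max_percent // 100


# Hypothetical example: a 500 GiB cache mount point limited to 80 percent.
GIB = 1024 ** 3
quota = cache_quota_bytes(500 * GIB, max_percent=80)
print(quota // GIB)  # 400
```

With the default of 100 percent, the quota equals the full disk space available on the mount point.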
Reflection Refresh
Control how often Reflections are automatically refreshed and when they expire. These settings are specific to each project using the catalog.
Refresh Settings
- Never refresh: Prevent automatic Reflection refresh. By default, Reflections refresh automatically.
- Refresh every: Set the refresh interval in hours, days, or weeks. Ignored if Never refresh is selected. Select Enable live refresh to automatically refresh Reflections when Iceberg table data changes.
- Set refresh schedule: Specify a daily or weekly refresh schedule.
Expire Settings
- Never expire: Prevent Reflections from expiring. By default, Reflections expire after the configured time limit.
- Expire after: The time limit after which Reflections are removed from Dremio, specified in hours, days, or weeks. Ignored if Never expire is selected.
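The way the "Never" options override the interval settings can be sketched as a small model. The `ReflectionPolicy` class and its field names are hypothetical and exist only for illustration; they are not a Dremio API, and the intervals used in the example are arbitrary values rather than documented defaults.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class ReflectionPolicy:
    """Hypothetical model of the settings described above, not a Dremio API."""
    never_refresh: bool = False
    refresh_every: Optional[timedelta] = None
    never_expire: bool = False
    expire_after: Optional[timedelta] = None

    def effective_refresh(self) -> Optional[timedelta]:
        # "Refresh every" is ignored when "Never refresh" is selected.
        return None if self.never_refresh else self.refresh_every

    def effective_expiry(self) -> Optional[timedelta]:
        # "Expire after" is ignored when "Never expire" is selected.
        return None if self.never_expire else self.expire_after


# Arbitrary example values: refresh is disabled, expiry is still active.
policy = ReflectionPolicy(
    never_refresh=True,
    refresh_every=timedelta(hours=1),
    expire_after=timedelta(hours=3),
)
print(policy.effective_refresh())  # None
print(policy.effective_expiry())   # 3:00:00
```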
Metadata
Configure how Dremio handles dataset definitions and metadata refresh. These settings are specific to each project using the catalog.
In Open Catalog, metadata refresh serves two purposes:
- Cache Refresh: Dremio maintains a project-level cache of table metadata to accelerate query planning and execution. Writes from Dremio query engines automatically update this cache. However, writes from other query engines only update snapshot metadata in object storage. Metadata refresh syncs these external changes into Dremio's cache to improve subsequent query performance.
- Lineage Computation: Metadata refresh recomputes lineage information to reflect the latest changes in lineage graphs.
Dataset Handling
- Remove dataset definitions if the underlying data is unavailable (Default) – When selected, Dremio removes dataset definitions if the underlying files are deleted or the folder/source becomes inaccessible. When deselected, Dremio retains dataset definitions even when data is unavailable. This is useful when files are temporarily deleted and replaced with new files.
Dataset Discovery
- Fetch every: How often to refresh top-level source object names (databases and tables). Set the interval in minutes, hours, days, or weeks. Default: 1 hour.
Dataset Details
The metadata that Dremio needs for query planning, including field information, types, shards, statistics, and locality.
- Fetch mode: Fetch metadata only for datasets that have been queried. Dremio updates details for previously queried objects in the source. Default: Only Queried Datasets.
- Fetch every: How often to fetch dataset details, specified in minutes, hours, days, or weeks. Default: 1 hour.
- Expire after: When dataset details expire, specified in minutes, hours, days, or weeks. Default: 3 hours.
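To make the relationship between the two default intervals concrete, the sketch below classifies cached dataset details by age, using the 1 hour fetch interval and 3 hour expiry stated above. The function `details_state`, its return labels, and the treatment of exact boundaries are assumptions made for illustration; Dremio's internal scheduling may differ.

```python
from datetime import datetime, timedelta

# Defaults stated in the settings above.
FETCH_EVERY = timedelta(hours=1)
EXPIRE_AFTER = timedelta(hours=3)


def details_state(last_fetched: datetime, now: datetime,
                  fetch_every: timedelta = FETCH_EVERY,
                  expire_after: timedelta = EXPIRE_AFTER) -> str:
    """Classify cached dataset details by age (illustrative only)."""
    age = now - last_fetched
    if age >= expire_after:
        return "expired"
    if age >= fetch_every:
        return "due for fetch"
    return "fresh"


fetched = datetime(2024, 1, 1, 9, 0)
print(details_state(fetched, datetime(2024, 1, 1, 9, 30)))   # fresh
print(details_state(fetched, datetime(2024, 1, 1, 10, 30)))  # due for fetch
print(details_state(fetched, datetime(2024, 1, 1, 12, 0)))   # expired
```

Keeping the expiry interval longer than the fetch interval, as the defaults do, leaves time for a scheduled fetch to renew the details before they expire.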
Privileges
This connection inherits privileges from Project settings. To grant specific users or roles additional privileges in this connection:
- Enter the name of the user or role that you want to grant access to, then click the Add to Privileges button. The added user or role is displayed in the USERS/ROLES table.
- For the users or roles in the USERS/ROLES table, toggle the checkmark for each privilege you want to grant on the Dremio source that is being created.
- Click Save after setting the configuration.
See Privileges for additional information about privileges.
Edit an Open Catalog Connection
- On the Open Catalog page, under Connections, right-click the connection and select Settings.
- Update the connection configuration as needed.
- Click Save.
Delete an Open Catalog Connection
- On the Open Catalog page, under Connections, right-click the connection and select Delete.
- Click Delete to confirm.
You cannot delete your project's default Open Catalog because it is a core component of the project.