Dremio Catalog (External) Enterprise
The Dremio Catalog (external) source enables you to connect to Dremio Catalogs deployed in other Dremio instances. Connectivity is established through the Iceberg REST API. Once the source is connected, users can read from and write to the external catalog. Additionally, user impersonation and vended credentials are enabled by default, providing a consistent governance and security experience across your Dremio deployments.
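To show what "connectivity through the Iceberg REST API" involves, the following minimal sketch builds, without sending, the two HTTP calls a client typically makes: exchanging a personal access token (PAT) for an OAuth access token at the OAuth token endpoint, then calling the Iceberg REST `v1/config` resource with that token. The token-exchange field values (`grant_type`, `subject_token_type`, `scope`) are assumptions modeled on RFC 8693 token exchange, and the helper names are hypothetical; this is not Dremio's internal implementation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# ASSUMPTION: token-exchange values modeled on RFC 8693 as used by Dremio's
# /oauth/token endpoint; confirm them against your Dremio version.
GRANT_TYPE = "urn:ietf:params:oauth:grant-type:token-exchange"
PAT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:dremio:personal-access-token"
SCOPE = "dremio.all"


def build_token_exchange_request(token_endpoint: str, pat: str) -> Request:
    """Build (without sending) a POST that exchanges a PAT for an OAuth access token."""
    body = urlencode(
        {
            "grant_type": GRANT_TYPE,
            "subject_token": pat,
            "subject_token_type": PAT_TOKEN_TYPE,
            "scope": SCOPE,
        }
    ).encode("utf-8")
    return Request(
        token_endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_config_request(catalog_endpoint: str, access_token: str) -> Request:
    """Build (without sending) the Iceberg REST GET v1/config call for the catalog."""
    # Iceberg REST clients resolve resource paths such as v1/config relative to the catalog URI.
    url = catalog_endpoint.rstrip("/") + "/v1/config"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"}, method="GET")


if __name__ == "__main__":
    token_req = build_token_exchange_request(
        "http://dremio.example.com:9047/oauth/token", "<your-PAT>"
    )
    config_req = build_config_request("http://dremio.example.com:19210/api/v2", "<access-token>")
    print(token_req.get_method(), token_req.full_url)
    print(config_req.get_method(), config_req.full_url)
```

To actually send a request, pass it to `urllib.request.urlopen`; a standard OAuth token response carries the token in its `access_token` JSON field.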
Configuring Dremio Catalog (External)
To add a Dremio Catalog (external) source:
- On the Datasets page, to the right of Sources in the left panel, click the add (+) icon.
- In the Add Data Source dialog, under Lakehouse Catalogs, select Dremio Catalog (external).
The new Dremio Catalog (external) dialog box appears, which contains the following tabs:
- General: Create a name for your Dremio Catalog (external) source, specify the endpoint URL and your OAuth token endpoint, and provide your Dremio personal access token (PAT).
- Storage: (Optional) Define how you want to authenticate storage access. Dremio uses credential vending by default, but you can override it with master storage credentials.
- Advanced Options: (Optional) Use catalog properties and credentials to set up storage authentication and authorization.
- Reflection Refresh: (Optional) Set a policy to control how often reflections are refreshed and expired.
- Metadata: (Optional) Specify dataset handling and metadata refresh.
- Privileges: (Optional) Add privileges for users or roles.
Refer to the following sections for guidance on how to edit each tab.
General
To configure the source connection:
- For Name, enter a name for the source.
  Note: The name you enter must be unique in the organization. Also, consider a name that is easy for users to reference. The name cannot be edited after the source is created. It cannot exceed 255 characters and must contain only the following characters: 0-9, A-Z, a-z, underscore (_), or hyphen (-).
- For Dremio Catalog Endpoint URL, specify the endpoint URL of the target Dremio Catalog. For example: http://dremio.example.com:19210/api/v2
- For OAuth Token Endpoint, specify the OAuth token endpoint. For example: http://dremio.example.com:9047/oauth/token
- For PAT token, specify a personal access token (PAT) created in the target cluster. Dremio uses this PAT to authenticate to the cluster.
- Allow Impersonation is enabled by default. This setting lets Dremio execute queries as the user who submits them. If user impersonation is disabled, Dremio uses the source credentials to access the catalog.
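The naming rules for the Name field can be checked before you open the dialog. The following is a minimal sketch of such a pre-check; `validate_source_name` and `existing_names` are hypothetical, and the uniqueness test assumes exact, case-sensitive string comparison, which the rules above do not specify.

```python
import re

# Allowed characters per the naming rule: digits, ASCII letters, underscore, hyphen.
NAME_PATTERN = re.compile(r"[0-9A-Za-z_-]+")
MAX_NAME_LENGTH = 255


def validate_source_name(name: str, existing_names: set[str]) -> list[str]:
    """Return the naming rules a proposed source name violates (empty list if valid)."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    # fullmatch requires every character to be allowed (and the name to be non-empty).
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must contain only 0-9, A-Z, a-z, underscore (_), or hyphen (-)")
    # ASSUMPTION: uniqueness is compared case-sensitively against known source names.
    if name in existing_names:
        problems.append("name is already used in the organization")
    return problems


if __name__ == "__main__":
    print(validate_source_name("remote_catalog-01", {"sales"}))
    print(validate_source_name("remote catalog", {"sales"}))
```

Because the name cannot be edited after the source is created, checking it up front avoids having to delete and re-create the source.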
Storage
To configure storage access, choose how Dremio authenticates with the underlying storage.
- Use master storage credentials: Dremio stores the storage locations and the secrets needed to access them. If Iceberg table data resides in a storage location that is not listed, Dremio cannot access the data.
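With master storage credentials, whether a table is readable depends on whether its data location falls under one of the listed storage locations. The following sketch illustrates that rule under an assumed prefix-matching semantics; `is_location_covered` is a hypothetical helper, and Dremio's actual matching logic is not documented here.

```python
def is_location_covered(table_location: str, allowed_locations: list[str]) -> bool:
    """Return True if table_location lies under one of the configured storage locations."""
    # Compare on path-segment boundaries so "s3://lake/warehouse" does not
    # accidentally cover "s3://lake/warehouse2".
    table = table_location.rstrip("/") + "/"
    for allowed in allowed_locations:
        prefix = allowed.rstrip("/") + "/"
        if table.startswith(prefix):
            return True
    return False


if __name__ == "__main__":
    listed = ["s3://lake/warehouse"]
    print(is_location_covered("s3://lake/warehouse/sales/orders", listed))
    print(is_location_covered("s3://archive/orders", listed))
```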
Advanced Options
To set the advanced options:
- (Optional) Enable Asynchronous Access for Parquet Datasets is selected by default; clear the checkbox to disable it. When this option is enabled, Dremio uses asynchronous access and local caching when possible so that asynchronous requests do not wait for data to return from your storage. This can result in faster query times.
- Under Cache Options, review the following table and edit the options to meet your needs.

Cache Options | Description |
---|---|
Enable local caching when possible | Selected by default. Together with asynchronous access for cloud caching, local caching can improve query performance. See Cloud Columnar Cache for details. |
Max percent of total available cache space to use when possible | Specifies the disk quota, as a percentage, that a source can use on any single executor node when local caching is enabled. The default is 100 percent of the total disk space available on the mount point provided for caching. You can either enter a percentage in the value field or use the arrows at the far right to adjust the percentage. |
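The Max percent setting translates into a per-executor disk quota by simple arithmetic. The sketch below computes it; `cache_quota_bytes` is a hypothetical helper, and the mount-point size is an input you supply.

```python
def cache_quota_bytes(available_bytes: int, max_percent: float = 100.0) -> int:
    """Disk space one executor may use for this source's local cache.

    available_bytes: total disk space on the executor's cache mount point.
    max_percent: the "Max percent of total available cache space" setting (default 100).
    """
    if not 0 <= max_percent <= 100:
        raise ValueError("max_percent must be between 0 and 100")
    return int(available_bytes * max_percent / 100)


if __name__ == "__main__":
    gib = 2**30
    # A 500 GiB mount point with the setting at 40 percent allows 200 GiB per executor.
    print(cache_quota_bytes(500 * gib, 40) // gib)
```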
Reflection Refresh
You can set the policy that controls how often reflections are scheduled to be refreshed automatically, as well as the time limit after which reflections expire and are removed. See the following options.
Option | Description |
---|---|
Never refresh | Select to prevent automatic reflection refresh. By default, reflections are refreshed automatically. |
Refresh every | How often to refresh reflections, specified in hours, days or weeks. This option is ignored if Never refresh is selected. |
Set refresh schedule | Specify the daily or weekly schedule. |
Never expire | Select to prevent reflections from expiring. By default, reflections expire automatically after the time limit below. |
Expire after | The time limit after which reflections expire and are removed from Dremio, specified in hours, days or weeks. This option is ignored if Never expire is selected. |
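The interaction between the refresh and expiry options can be modeled as follows. This sketch is a simplified illustration, not Dremio's scheduler: `ReflectionPolicy`, `next_refresh`, and `is_expired` are hypothetical names, `None` stands for Never refresh or Never expire, the Set refresh schedule option is not modeled, and expiry is assumed to be measured from the last successful refresh.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ReflectionPolicy:
    refresh_every: Optional[timedelta]  # None models "Never refresh"
    expire_after: Optional[timedelta]  # None models "Never expire"


def next_refresh(policy: ReflectionPolicy, last_refresh: datetime) -> Optional[datetime]:
    """When the next automatic refresh is due, or None if automatic refresh is off."""
    if policy.refresh_every is None:
        return None
    return last_refresh + policy.refresh_every


def is_expired(policy: ReflectionPolicy, last_refresh: datetime, now: datetime) -> bool:
    """ASSUMPTION: expiry is counted from the last successful refresh."""
    if policy.expire_after is None:
        return False
    return now >= last_refresh + policy.expire_after


if __name__ == "__main__":
    policy = ReflectionPolicy(refresh_every=timedelta(hours=4), expire_after=timedelta(days=1))
    t0 = datetime(2024, 1, 1, 0, 0)
    print(next_refresh(policy, t0))
    print(is_expired(policy, t0, datetime(2024, 1, 2, 0, 0)))
```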
Metadata
Metadata options are configured with the following settings.
Dataset Handling
- Remove dataset definitions if underlying data is unavailable (selected by default). If this box is not checked and the underlying files under a folder are removed, or the folder or source is not accessible, Dremio does not remove the dataset definitions. Leaving the box unchecked is useful when files are temporarily deleted and then replaced with new sets of files.
Metadata Refresh
These are the optional Metadata Refresh parameters:
- Dataset Discovery: The refresh interval for fetching top-level source object names such as databases and tables. Set the time interval using this parameter.

Parameter | Description |
---|---|
Fetch every | The frequency at which to fetch object names, in minutes, hours, days, or weeks. The default is 1 hour. |

- Dataset Details: The metadata that Dremio needs for query planning, such as information about fields, types, shards, statistics, and locality. These are the parameters for fetching dataset details.

Parameter | Description |
---|---|
Fetch mode | You can choose to fetch details only for queried datasets, in which case Dremio updates details only for objects in the source that have previously been queried. By default, this is set to Only Queried Datasets. |
Fetch every | The frequency at which to fetch dataset details, in minutes, hours, days, or weeks. The default is 1 hour. |
Expire after | The expiry time of dataset details, in minutes, hours, days, or weeks. The default is 3 hours. |
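The default Dataset Details values imply three states for a dataset's cached details, depending on how long ago they were fetched. The sketch below captures those defaults in a settings object and classifies a given age; `MetadataRefreshSettings` and `details_state` are hypothetical names, and the three-state model is a simplification rather than a description of Dremio's internal scheduler.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class MetadataRefreshSettings:
    # Defaults taken from the Metadata Refresh tables.
    discovery_fetch_every: timedelta = timedelta(hours=1)
    details_fetch_mode: str = "Only Queried Datasets"
    details_fetch_every: timedelta = timedelta(hours=1)
    details_expire_after: timedelta = timedelta(hours=3)


def details_state(settings: MetadataRefreshSettings, age: timedelta) -> str:
    """Classify dataset details by the time elapsed since they were last fetched."""
    if age >= settings.details_expire_after:
        return "expired"
    if age >= settings.details_fetch_every:
        return "due for refresh"
    return "fresh"


if __name__ == "__main__":
    settings = MetadataRefreshSettings()
    for hours in (0.5, 2, 3):
        print(hours, details_state(settings, timedelta(hours=hours)))
```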
Privileges
You have the option to grant privileges to specific users or roles. See Access Controls for additional information about privileges.
To grant access to a user or role:
- For Privileges, enter the user name or role name that you want to grant access to and click the Add to Privileges button. The added user or role is displayed in the USERS/ROLES table.
- For the users or roles in the USERS/ROLES table, toggle the checkmark for each privilege you want to grant on the Dremio source that is being created.
- Click Save after setting the configuration.
Updating a Dremio Catalog (External)
To update a Dremio Catalog (External) source:
- On the Datasets page, under Sources > Lakehouse Catalogs in the panel on the left, find the name of the source you want to edit.
- Right-click the source name and select Settings from the list of actions. Alternatively, click the source name and then the settings (gear) icon at the top right corner of the page.
- In the Source Settings dialog, edit the settings you wish to update. Dremio does not support updating the source name.
- Click Save.
Deleting a Dremio Catalog (External) Source
If the source is in a bad state (for example, Dremio cannot authenticate to the source or the source is otherwise unavailable), only users who belong to the ADMIN role can delete the source.
To delete a Dremio Catalog (External) source:
- On the Datasets page, click Sources > Lakehouse Catalogs in the panel on the left.
- In the list of data sources, hover over the name of the source you want to remove and right-click.
- From the list of actions, click Delete.
- In the Delete Source dialog, click Delete to confirm that you want to remove the source.
Deleting a source causes all downstream views that depend on objects in the source to break.