
Connect to a Dremio Enterprise Cluster

Connect to one or more Dremio Enterprise clusters to create a federated data architecture in which your Dremio project queries data managed by those clusters alongside its own direct connections.

This configuration enables:

  • Reduced query latency – Queries, or portions of queries, that use tables on the Enterprise cluster are pushed down to that cluster, which improves performance and reduces latency compared with transferring large raw tables.
  • Cross-cluster data federation – Join data across multiple Dremio Enterprise clusters and expose unified views through Dremio.
  • Enhanced security and data isolation – Expose only a single Dremio port to the cloud instead of opening multiple connections from your data center. Administrators of the Enterprise cluster control what data is visible to the managing Dremio environment, allowing isolation of highly sensitive data on the Enterprise cluster while exposing only aggregations or derived datasets to the managing project.
  • Simplified data access – Access all connections on Enterprise clusters as schemas within Dremio without managing individual connections.
  • Centralized semantic layer – Build views on top of federated clusters to apply consistent business logic across your organization.

When you connect to a Dremio Enterprise cluster, all connections on the Enterprise cluster are available from your project. You can create Reflections, build views, and query across the federated environment, just as you would with data from direct connections.

Example Configuration

When you connect to a Dremio Enterprise cluster:

  • The Enterprise cluster appears under Connections in your project.
  • Connections to the Enterprise cluster appear as folders/schemas.
  • You can promote tables from the Dremio Enterprise cluster.
  • You can create views and Reflections on any promoted tables.
  • You can query and join data across all connections, both direct and federated.

Deployment Considerations

If your Dremio project and the connected Dremio Enterprise cluster are in different cloud regions or cloud vendors, your deployment design may be influenced by network latency and egress costs.

Network Latency

Cross-region or cross-cloud queries can experience increased latency. To minimize impact:

  • Use Reflections – Create Reflections in your Dremio project on frequently queried data from the Enterprise cluster, so that queries use the Reflections instead of fetching data across regions.
  • Push down filters and aggregations – Write queries that leverage Dremio's query pushdown to perform filtering and aggregation on the Enterprise cluster before returning results.
  • Colocate when possible – If latency is critical, deploy the Enterprise cluster in the same region as your Dremio organization.

Cloud Egress Costs

Data transfer between cloud regions or cloud vendors can incur significant egress charges. To control costs:

  • Create Reflections for frequently used data – Reflection data is stored in your Dremio region, eliminating repeated egress charges for frequently accessed datasets.
  • Use aggregated views – Expose only aggregated or summarized data from the Enterprise cluster rather than raw tables, reducing data transfer volume.
  • Limit full table scans – Ensure queries include appropriate filters to minimize the amount of data transferred across regions.
  • Monitor query patterns – Use Dremio's query history to identify expensive cross-region queries and optimize them with Reflections.
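To see why Reflections reduce egress, the arithmetic can be sketched in Python. This is a simplified model, not a Dremio cost calculator: it assumes every query is served by the Reflection and that each refresh is a full refresh that moves the whole Reflection across regions. The function names and the workload figures in the example are hypothetical.

```python
def monthly_transfer_without_reflection(gb_per_query: float,
                                        queries_per_day: int,
                                        days: int = 30) -> float:
    """GB moved per month when every query reads data from the remote cluster."""
    return gb_per_query * queries_per_day * days


def monthly_transfer_with_reflection(reflection_gb: float,
                                     refreshes_per_day: int,
                                     days: int = 30) -> float:
    """GB moved per month when queries hit a local Reflection.

    Assumption: only full Reflection refreshes cross regions.
    """
    return reflection_gb * refreshes_per_day * days


if __name__ == "__main__":
    # Hypothetical workload: 200 queries a day, each pulling 2 GB across regions,
    # versus one daily full refresh of a 5 GB Reflection.
    print(monthly_transfer_without_reflection(2.0, 200))
    print(monthly_transfer_with_reflection(5.0, 1))
```

Refreshes that move less than the full Reflection, and queries the Reflection cannot satisfy, both change these numbers; multiplying the resulting volume by your cloud provider's egress rate gives a rough cost estimate.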

Security

Configure full TLS wire encryption on Enterprise clusters to protect data in transit across regions and cloud boundaries.

User Impersonation

When you connect your project to a Dremio Enterprise cluster, you provide the username and password of an account on the cluster. By default, queries that run from the project against the Dremio Enterprise cluster run under the username of that account.

Alternatively, you can use user impersonation, which lets users who run queries from your project run them under their own usernames on the Dremio Enterprise cluster. Each user in your project must have an account on the Dremio Enterprise cluster with a matching username. User impersonation (also known as inbound impersonation) must be configured on the Dremio Enterprise cluster with a policy such as the following:

Example policy
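-- Run on the Dremio Enterprise cluster. proxy_principals lists the users who may
-- impersonate others; target_principals lists the users they may impersonate.
ALTER SYSTEM SET "exec.impersonation.inbound_policies"='[
  {
    "proxy_principals": {
      "users": ["User_1"]
    },
    "target_principals": {
      "users": ["User_1"]
    }
  }
]'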
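When many users need impersonation, writing the policy JSON by hand is error-prone. The Python sketch below generates an ALTER SYSTEM statement with the same structure as the example policy; `build_inbound_policy` is a hypothetical helper, and the statement it returns still has to be run on the Dremio Enterprise cluster.

```python
import json

SETTING = '"exec.impersonation.inbound_policies"'


def build_inbound_policy(proxy_users, target_users):
    """Return an ALTER SYSTEM statement that sets one inbound impersonation policy.

    Hypothetical helper: it only mirrors the JSON shape of the example policy.
    """
    policy = [
        {
            "proxy_principals": {"users": list(proxy_users)},
            "target_principals": {"users": list(target_users)},
        }
    ]
    # Serialize to JSON, then double any single quotes so the JSON can sit
    # inside a single-quoted SQL string literal.
    body = json.dumps(policy, indent=2).replace("'", "''")
    return f"ALTER SYSTEM SET {SETTING}='{body}'"


if __name__ == "__main__":
    print(build_inbound_policy(["User_1"], ["User_1"]))
```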

Prerequisites

You must have a username and password for the account on the Dremio Enterprise cluster to use for connections from your project.

Connect to a Dremio Enterprise Cluster

  1. In the Dremio console, click Add Data on the Home page.
  2. Select Dremio Enterprise Cluster.
  3. Configure the connection using the sections below, then click Save.

General Options

  1. In the Name field, specify the connection name. The name cannot include the following special characters: /, :, [, or ].
  2. Under Connection, specify how you want to connect to the Dremio Enterprise cluster:
    • Direct: Connect directly to a coordinator node of the cluster.
    • ZooKeeper: Connect to an external ZooKeeper instance that is coordinating the nodes of the cluster.
  3. In the Host and Port fields, specify the hostname or IP address and the port number of the coordinator node or ZooKeeper instance.
  4. If the Dremio Enterprise cluster is configured to use TLS for connections to it, select the Use SSL option.
  5. Under Authentication, specify the username and password for the project to use when connecting to the Dremio Enterprise cluster.
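The General Options rules can be checked locally before you click Save. The Python sketch below models the fields as a hypothetical `EnterpriseClusterConnection` dataclass. Only the forbidden-character rule comes from this page; the port-range and required-field checks are generic assumptions, and none of this calls a Dremio API.

```python
from dataclasses import dataclass

# Characters the Name field rejects, per the General Options steps.
FORBIDDEN_NAME_CHARS = set("/:[]")


@dataclass
class EnterpriseClusterConnection:
    """Hypothetical local model of the General Options fields (not a Dremio API)."""
    name: str
    connection_type: str  # "direct" or "zookeeper"
    host: str
    port: int
    use_ssl: bool
    username: str
    password: str


def validation_errors(conn: EnterpriseClusterConnection) -> list:
    """Return a list of problems found in the settings; an empty list means none."""
    errors = []
    bad = sorted(FORBIDDEN_NAME_CHARS.intersection(conn.name))
    if bad:
        errors.append(f"name contains forbidden characters: {' '.join(bad)}")
    if conn.connection_type not in ("direct", "zookeeper"):
        errors.append("connection_type must be 'direct' or 'zookeeper'")
    if not conn.host:
        errors.append("host is required")
    if not 1 <= conn.port <= 65535:
        errors.append("port must be between 1 and 65535")
    if not conn.username or not conn.password:
        errors.append("username and password are required")
    return errors
```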

Advanced Options

On the Advanced Options page, you can set values for these optional parameters:

  • Maximum Idle Connections – The total number of connections allowed to be idle at a given time. The default is 8.
  • Connection Idle Time – The amount of time (in seconds) allowed for a connection to remain idle before the connection is terminated. The default is 60 seconds.
  • Query Timeout – The amount of time (in seconds) allowed to wait for the results of a query. If this time expires, the connection being used is returned to an idle state.
  • User Impersonation – Allows users to run queries on the Dremio Enterprise cluster under their own user IDs, not the user ID for the account used to authenticate with the cluster. Inbound impersonation must be configured on the cluster.
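Maximum Idle Connections and Connection Idle Time work together: one caps how many idle connections are kept, the other caps how long a connection may stay idle. The Python sketch below is an illustrative model of that interaction using the documented defaults (8 connections, 60 seconds). It is not Dremio's connection-pool implementation, and the rule that the most recently used connections are kept first is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class IdleSettings:
    max_idle_connections: int = 8     # Maximum Idle Connections default
    idle_time_seconds: float = 60.0   # Connection Idle Time default


def surviving_idle_connections(idle_ages: Iterable[float],
                               settings: Optional[IdleSettings] = None) -> List[float]:
    """Return the idle ages (in seconds) of the connections this model keeps open.

    Connections idle for longer than idle_time_seconds are terminated; of the
    remainder, at most max_idle_connections are kept, most recently used first.
    """
    settings = settings or IdleSettings()
    recent = sorted(age for age in idle_ages if age <= settings.idle_time_seconds)
    return recent[: settings.max_idle_connections]
```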

Reflection Refresh Options

On the Reflection Refresh page, set the policy that controls how often Reflections are scheduled to be refreshed automatically, as well as the time limit after which Reflections expire and are removed.

  • Never refresh – Select to prevent automatic Reflection refresh. The default is to automatically refresh.
  • Refresh every – How often to refresh Reflections, specified in hours, days, or weeks. This option is ignored if Never refresh is selected.
  • Never expire – Select to prevent Reflections from expiring. The default is to automatically expire after the time limit below.
  • Expire after – The time limit after which Reflections expire and are removed from Dremio, specified in hours, days, or weeks. This option is ignored if Never expire is selected.
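The refresh and expiry settings can be treated as two offsets from the time a Reflection was last refreshed. The Python sketch below assumes that expiry is measured from the last refresh, which this page does not state, and uses `None` to stand for Never refresh and Never expire; the function names are hypothetical. It also flags the case where a Reflection would expire before its next scheduled refresh.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_refresh(last_refresh: datetime,
                 refresh_every: Optional[timedelta]) -> Optional[datetime]:
    """When the next automatic refresh is due; None models Never refresh."""
    return None if refresh_every is None else last_refresh + refresh_every


def expires_at(last_refresh: datetime,
               expire_after: Optional[timedelta]) -> Optional[datetime]:
    """When the Reflection expires (assumed relative to last refresh); None models Never expire."""
    return None if expire_after is None else last_refresh + expire_after


def expires_before_next_refresh(last_refresh: datetime,
                                refresh_every: Optional[timedelta],
                                expire_after: Optional[timedelta]) -> bool:
    """True if the Reflection would expire with no refresh scheduled before then."""
    expiry = expires_at(last_refresh, expire_after)
    if expiry is None:
        return False  # Never expire: nothing expires
    refresh = next_refresh(last_refresh, refresh_every)
    return refresh is None or expiry < refresh
```

For example, with Refresh every set to 1 week and Expire after set to 3 days, `expires_before_next_refresh` returns True, which suggests shortening the refresh interval or lengthening the expiry.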

Metadata Options

On the Metadata page, you can configure settings to refresh metadata and handle datasets.

Dataset Handling

  • Remove dataset definitions if underlying data is unavailable – Selected by default, so Dremio removes dataset definitions when the underlying data is unavailable. Clear this option to keep the definitions, which is useful when files are temporarily deleted and then added back in the same location with new sets of files.

Metadata Refresh

These are the optional Metadata Refresh parameters:

  • Dataset Discovery: The refresh interval for fetching top-level object names such as databases and tables.
    • Fetch every (Optional) – How often to fetch object names, specified in minutes, hours, days, or weeks. The default is 1 hour.
  • Dataset Details: The metadata that Dremio needs for query planning, such as field, type, shard, statistics, and locality information. These parameters control how dataset details are fetched:
    • Fetch mode – By default, Dremio fetches details only for queried datasets, updating details for objects that have previously been queried. Fetching details for all datasets is deprecated.
    • Fetch every – How often to fetch dataset details, specified in minutes, hours, days, or weeks. The default is 1 hour.
    • Expire after – How long dataset details remain valid before they expire, specified in minutes, hours, days, or weeks. The default is 3 hours.
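The Dataset Details defaults (fetch every 1 hour, expire after 3 hours) can be read as a simple age check on cached details. The Python sketch below is an illustrative model of that reading, not Dremio's metadata scheduler; the function name and state labels are hypothetical.

```python
from datetime import datetime, timedelta

# Dataset Details defaults from the Metadata Refresh options.
DETAILS_FETCH_EVERY = timedelta(hours=1)
DETAILS_EXPIRE_AFTER = timedelta(hours=3)


def details_state(last_fetched: datetime, now: datetime,
                  fetch_every: timedelta = DETAILS_FETCH_EVERY,
                  expire_after: timedelta = DETAILS_EXPIRE_AFTER) -> str:
    """Classify cached dataset details as 'fresh', 'refresh due', or 'expired'."""
    age = now - last_fetched
    if age >= expire_after:
        return "expired"
    if age >= fetch_every:
        return "refresh due"
    return "fresh"
```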

Privileges

On the Privileges page, you can grant privileges to specific users or roles. See Access Control for additional information about user privileges.

  1. (Optional) For Privileges, enter the username or role name that you want to grant access to and click the Add to Privileges button. The added user or role is displayed in the Users table.
  2. (Optional) For the users or roles in the Users table, toggle the green checkmark for each privilege you want to grant.
  3. Click Save after setting the configuration.

Edit a Dremio Enterprise Cluster Connection

  1. On the Open Catalog page, under Connections, right-click the connection and select Settings.
  2. Update the connection configuration as needed.
  3. Click Save.

Delete a Dremio Enterprise Cluster Connection

  1. On the Open Catalog page, under Connections, right-click the connection and select Delete.
  2. Click Delete to confirm.

Predicate Pushdowns

Dremio projects push the following operations down to connected Dremio Enterprise clusters. Each cluster either processes these operations itself or pushes them further down to its own data sources.

&&, ||, !, AND, OR
+, -, /, *, %
<=, <, >, >=, =, <>, !=
ABS
ADD_MONTHS
AVG
BETWEEN
CASE
CAST
CEIL
CEILING
CHARACTER_LENGTH
CHAR_LENGTH
COALESCE
CONCAT
CONTAINS
COUNT
COUNT_DISTINCT
COUNT_DISTINCT_MULTI
COUNT_FUNCTIONS
COUNT_MULTI
COUNT_STAR
CURRENT_DATE
CURRENT_TIMESTAMP
DATE_ADD
DATE_DIFF
DATE_SUB
DATE_TRUNC
DATE_TRUNC_DAY
DATE_TRUNC_HOUR
DATE_TRUNC_MINUTE
DATE_TRUNC_MONTH
DATE_TRUNC_QUARTER
DATE_TRUNC_WEEK
DATE_TRUNC_YEAR
DAYOFMONTH
DAYOFWEEK
DAYOFYEAR
EXTRACT
FLATTEN
FLOOR
ILIKE
IN
IS DISTINCT FROM
IS NOT DISTINCT FROM
IS NOT NULL
IS NULL
LAST_DAY
LCASE
LEFT
LENGTH
LIKE
LOCATE
LOWER
LPAD
LTRIM
MAX
MEDIAN
MIN
MOD
NEXT_DAY
NOT
NVL
PERCENTILE_CONT
PERCENTILE_DISC
PERCENT_RANK
POSITION
REGEXP_LIKE
REPLACE
REVERSE
RIGHT
ROUND
RPAD
RTRIM
SIGN
SQRT
STDDEV
STDDEV_POP
STDDEV_SAMP
SUBSTR
SUBSTRING
SUM
TO_CHAR
TO_DATE
TRIM
TRUNC
TRUNCATE
UCASE
UPPER
VAR_POP
VAR_SAMP
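To check a query against the list above before running it, a rough heuristic is to extract every name that is called like a function and subtract the supported ones. The Python sketch below does that with a regular expression. It is a hypothetical helper, not part of Dremio; it ignores symbolic operators and multi-word predicates such as IS NULL, and it only flags names rather than predicting how the planner will split the query.

```python
import re

# Function-style names from the pushdown list (symbolic operators and
# multi-word predicates such as IS NULL are omitted).
PUSHDOWN_NAMES = set("""
AND OR NOT IN BETWEEN LIKE ILIKE CASE CAST ABS ADD_MONTHS AVG CEIL CEILING
CHARACTER_LENGTH CHAR_LENGTH COALESCE CONCAT CONTAINS COUNT COUNT_DISTINCT
COUNT_DISTINCT_MULTI COUNT_FUNCTIONS COUNT_MULTI COUNT_STAR CURRENT_DATE
CURRENT_TIMESTAMP DATE_ADD DATE_DIFF DATE_SUB DATE_TRUNC DATE_TRUNC_DAY
DATE_TRUNC_HOUR DATE_TRUNC_MINUTE DATE_TRUNC_MONTH DATE_TRUNC_QUARTER
DATE_TRUNC_WEEK DATE_TRUNC_YEAR DAYOFMONTH DAYOFWEEK DAYOFYEAR EXTRACT FLATTEN
FLOOR LAST_DAY LCASE LEFT LENGTH LOCATE LOWER LPAD LTRIM MAX MEDIAN MIN MOD
NEXT_DAY NVL PERCENTILE_CONT PERCENTILE_DISC PERCENT_RANK POSITION REGEXP_LIKE
REPLACE REVERSE RIGHT ROUND RPAD RTRIM SIGN SQRT STDDEV STDDEV_POP STDDEV_SAMP
SUBSTR SUBSTRING SUM TO_CHAR TO_DATE TRIM TRUNC TRUNCATE UCASE UPPER VAR_POP
VAR_SAMP
""".split())

# Clause keywords that can precede "(" without being function calls.
CLAUSE_KEYWORDS = {"SELECT", "FROM", "JOIN", "ON", "WHERE", "AS", "USING", "VALUES"}

# An identifier followed (optionally after whitespace) by an opening parenthesis.
CALL_PATTERN = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def names_outside_pushdown_list(sql: str) -> set:
    """Return called names in `sql` that are neither on the list nor clause keywords."""
    called = {name.upper() for name in CALL_PATTERN.findall(sql)}
    return called - PUSHDOWN_NAMES - CLAUSE_KEYWORDS
```

A name this reports, such as a user-defined function, hints that the corresponding part of the query may be evaluated in your project after data is fetched rather than on the Enterprise cluster.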

Limitations

You cannot query columns that use complex data types, such as LIST, STRUCT, and MAP. Columns of complex data types do not appear in result sets.