Skip to main content

Workload Management

This topic covers how to manage resources and workloads by routing queries to particular engines through rules.

Overview

You can manage Dremio workloads via routing rules, which are evaluated at runtime (before query planning) to decide which query engine to use for a given query. In projects with only one engine, all queries share the same execution resources and route to the same single engine. However, when multiple engines are provisioned, rules determine the engine to be used.

You must arrange the rules in the order that you want them to be evaluated. In the case that multiple rules evaluate to true for a given query, the first rule that returns true will be used to select the engine. The following diagram shows a series of rules that are evaluated when a job gets submitted.

  • Rule1 routes jobs to Engine1
  • Rule2 routes jobs to Engine2
  • Rule3 routes jobs to the default engine that was created on project start up
  • Rule4 rejects the jobs that evaluate to true

Rules

You can use Dremio SQL syntax to specify rules to target particular jobs.

The following are the types of rules that can be created along with examples.

User

Create a rule that identifies the user that triggers the job.

Create rule that identifies user
USER in ('JRyan','PDirk','CPhillips')

Group Membership

Create a rule that identifies if the user that triggers the job is part of a particular group.

Create rule that identifies whether user belongs to a specified group
is_member('MarketingOps') OR
is_member('Engineering')

Job Type

Create a rule depending on the type of job. The types of jobs can be identified by the following categories:

  • Flight
  • JDBC
  • Internal Preview
  • Internal Run
  • Metadata Refresh
  • ODBC
  • Reflections
  • REST
  • UI Download
  • UI Preview
  • UI Run
Create rule based on type of job
query_type() IN ('JDBC', 'ODBC', 'UI Run', 'Flight')

Query Label

Labels enable rules that route queries running named commands to specific engines. Dremio supports the following query labels:

Query LabelDescription
COPYAssigned to all queries running a COPY INTO SQL command
CTASAssigned to all queries running a CREATE TABLE AS SQL command
DMLAssigned to all queries running an INSERT, UPDATE, DELETE, MERGE, or TRUNCATE SQL command
OPTIMIZATIONAssigned to all queries running an OPTIMIZE SQL command

Here are two example routing rules:

Create a routing rule for queries running a COPY INTO command
query_label() IN ('COPY')
Create a routing rule for queries running the DML commands INSERT, UPDATE, DELETE, MERGE, or TRUNCATE
query_label() IN ('DML')

Query Attributes

Query attributes enable routing rules that direct queries to specific engines based on their characteristics.

Dremio supports the following query attributes:

Query AttributeDescription
DREMIO_MCPSet when the job is submitted via the Dremio MCP Server.
AI_AGENTSet when the job is submitted via the Dremio AI Agent.
AI_FUNCTIONSSet when the job contains AI functions.

You can use the following functions to define routing rules based on query attributes:

FunctionApplicable AttributeDescription
query_has_attribute(<attr>)DREMIO_MCP, AI_AGENT, AI_FUNCTIONSReturns true if the specified attribute is present.
query_attribute(<attr>)DREMIO_MCP, AI_AGENT, AI_FUNCTIONSReturns the value of the attribute (if present), otherwise NULL.
query_calls_ai_functions()NAReturns true if the job has an AI function in the query.

Examples:

Create a routing rule for queries that use AI functions and are executed by a user
query_calls_ai_functions() AND USER = 'JRyan' 
Create a routing rule for queries with DREMIO_MCP and AI_FUNCTION
query_has_attribute('DREMIO_MCP') AND query_has_attribute('AI_FUNCTIONS')

Tag

Create a rule that routes jobs based on a routing tag.

Create rule that routes jobs based on routing tag
tag() = 'ProductionDashboardQueue'

Date and Time

Create a rule that routes a job based on the time it was triggered. Use Dremio SQL Functions.

Create rule that routes jobs based on time triggered
EXTRACT(HOUR FROM CURRENT_TIME)
BETWEEN 9 AND 18

Combined Conditions

Create rules based on multiple conditions.

The following example routes a job depending on user, group membership, query type, query cost, tag, and the time of day that it was triggered.

Create rule based on user, group, job type, query cost, tag, and time triggered
(
USER IN ('JRyan', 'PDirk', 'CPhillips')
OR is_member('superadmins')
)
AND query_type IN ('ODBC')
AND EXTRACT(HOUR FROM CURRENT_TIME)
BETWEEN 9 AND 18

Default Rules

Each Dremio project has its own set of rules. When a project is created, Dremio automatically creates rules for the default and preview engines. You can edit these rules as needed.

OrderRule NameRuleEngine
1UI Previewsquery_type() = 'UI Preview'preview
2Reflectionsquery_type() = 'Reflections'default
3All Other QueriesAll other queriesdefault

View All Rules

To view all rules:

  1. Click the Project Settings This is the icon that represents the Project Settings. icon in the side navigation bar.
  2. Select Engine Routing in the project settings sidebar to see the list of engine routing rules.

Add a Rule

To add a rule:

  1. On the Engine Routing page, click the Add Rule button at the top-right corner of the screen.

  2. In the New Rule dialog, for Rule Name, enter a name.

  3. For Conditions, enter the routing condition. See Rules for supported conditions.

  4. For Action, complete one of the following options:

    a. If you want to route the jobs that meet the conditions to a particular engine, select the Route to engine option. Then use the engine selector to choose the engine.

    b. If you want to reject the jobs that meet the conditions, select the Reject option.

  5. Click Add.

Edit a Rule

To edit a rule:

  1. On the Engine Routing page, hover over the rule and click the Edit Rule This is the icon that represents the Edit Rule settings. icon that appears next to the rule.

  2. In the Edit Rule dialog, for Rule Name, enter a name.

  3. For Conditions, enter the routing condition. See Rules for supported conditions.

  4. For Action, complete one of the following options:

    a. If you want to route the jobs that meet the conditions to a particular engine, select the Route to engine option. Then use the engine selector to choose the engine.

    b. If you want to reject the jobs that meet the conditions, select the Reject option.

  5. Click Save.

Delete a Rule

To delete a rule:

  1. On the Engine Routing page, hover over the rule and click the Delete Rule This is the icon that represents the Delete Rule settings. icon that appears next to the rule.
caution

You must have at least one rule per project to route queries to a particular engine.

  1. In the Delete Rule dialog, click Delete to confirm.

Set and Reset Engines

The SET ENGINE SQL command is used to specify the exact execution engine to run subsequent queries in the current session. When using SET ENGINE, WLM rules and direct routing connection properties are bypassed, and queries are routed directly to the specified queue. The RESET ENGINE command clears the session-level engine override, reverting query routing to follow the Workload Management (WLM) rules or any direct routing connection property if set.

SET TAG

The SET TAG SQL command is used to specify routing tag for subsequent queries in the current session. If a ROUTING_TAG connection property is already set for the session, SET TAG will override it. When using SET TAG, you must have a previously defined Workload Management (WLM) routing rule that routes queries based on that routing tag. The RESET TAG command clears the session-level routing tag override, reverting query routing to follow the Workload Management (WLM) rules or any direct routing connection property if set.

Connection Tagging and Direct Routing Configuration

Routing tags are configured by setting the ROUTING_TAG = <Tag Name> parameter for a given session to the desired tag name.

JDBC Session Configuration

To configure JDBC sessions add the ROUTING_TAG parameter to the JDBC connection URL. For example: jdbc:dremio:direct=localhost;ROUTING_TAG='TagA'.

ODBC Session Configuration

Configure ODBC sessions as follows:

Windows Sessions

Add the ROUTING_TAG parameter to the AdvancedProperties parameter in the ODBC DSN field.

Mac OS Sessions

  1. Add the ROUTING_TAG parameter to the AdvancedProperties parameter in the system odbc.ini file located at /Library/ODBC/odbc.ini. After adding the parameter, an example Advanced Properties configuration might be: AdvancedProperties=CastAnyToVarchar=true;HandshakeTimeout=5;QueryTimeout=180;TimestampTZDisplayTimezone=utc;NumberOfPrefetchBuffers=5;ROUTING_TAG='TagA';

  2. Add the ROUTING_TAG parameter to the AdvancedProperties parameter in the user's DSN located at ~/Library/ODBC/odbc.ini

Best Practices for Workload Management

Because every query workload is different, engine sizing often depends on several factors, such as the complexity of queries, number of concurrent users, data sources, dataset size, file and table formats, and specific business requirements for latency and cost. Workload management (WLM) ensures reliable query performance by choosing adequately sized engines for each workload type, configuring engines, and implementing query routing rules to segregate and route query workload types to appropriate engines.

This section describes best practices for adding and using Dremio engines, as well as configuring WLM to achieve reliable query performance in Dremio. This section also includes tips for migrating from self-managed Dremio Software to fully managed Dremio and information about using the system table sys.project.history.jobs, which stores metadata for historical jobs executed in a project, to assess the efficacy of WLM settings and make adjustments.

Set Up Engines

As a fully managed offering, Dremio is the best deployment model for Dremio in production because it allows you to achieve high levels of reliability and durability for your queries and maximize resource efficiency with engine autoscaling and does not require you to manually create and manage engines.

Segregating workload types into separate engines is vital for mitigating noisy neighbor issues, which can jeopardize performance reliability. You can segregate workloads by type, such as ad hoc, dashboard, and lakehouse (COPY INTO, DML, and optimization), as well as by business unit to facilitate cost distribution.

Metadata and Reflection refresh workloads should have their own engines for executing metadata and Reflection refresh queries. These internal queries can use a substantial amount of engine bandwidth, so assigning separate engines ensures that they do not interfere with user-initiated queries. Initial engine sizes should be XSmall and Small, but these sizes may change depending on the number and complexity of Reflection refresh and metadata jobs.

Dremio recommends the following engine setup configurations:

  • Dremio offers a range of engine sizes. Experiment with typical queries, concurrency, and engine sizes to establish the best engine size for each workload type based on your organization's budget constraints and latency requirements.

  • Maximum concurrency is the maximum number of jobs that Dremio can execute concurrently on an engine replica. Dremio provides an out-of-the-box value for maximum concurrency based on engine size, but we recommend testing with typical queries directed to specific engines to determine the best maximum concurrency values for your query workloads.

  • Dremio offers autoscaling to meet the demands of dynamic workloads with engine replicas. It is vital to assess and configure each engine's autoscaling parameters based on your organization's budget constraints and latency requirements for each workload type. You can choose the minimum and maximum number of replicas for each engine and specify any advanced configuration as needed. For example, dashboard workloads must meet stringent low-latency requirements and are prioritized for performance rather than cost. Engines added and assigned to execute the dashboard workloads may therefore be configured to autoscale using replicas. On the other hand, an engine for ad hoc workloads may have budget constraints and therefore be configured to autoscale with a maximum of one replica.

Route Workloads

Queries are routed to engines according to routing rules. You may use Dremio's out-of-the-box routing rules that route queries to the preview engines that are established by default, but Dremio recommends creating custom routing rules based on your workloads and business requirements. Custom rules can include factors such as user, group membership, job type, date and time, query label, and tag. Read Rules for examples.

The following table lists example routing rules based on query_type, query_label, and tags:

OrderRule NameRuleEngine
1Reflectionsquery_type() = 'Reflections'Reflection
2Metadataquery_type() = 'Metadata Refresh'Metadata
3Dashboardstag() = 'dashboard'Dashboard
4Ad hoc Queriesquery_type() IN ( 'UI Run' , 'REST') OR tag() = 'ad hoc'Ad hoc
5Lakehouse Queriesquery_label() IN ('COPY','DML','CTAS', 'OPTIMIZATION')Lakehouse
6All Other QueriesAll other queriesPreview

Use the sys.project.history.jobs System Table

The sys.project.history.jobs system table contains metadata for recent jobs executed in a project, including time statistics, cost, and other relevant information. You can use the data in the sys.project.history.jobs system table to evaluate the effectiveness of WLM settings and make adjustments based on job metadata.

Use Job Analyzer

Job Analyzer is a package of useful query and view definitions that you may create over the sys.projects.history.jobs system table and use to analyze job metadata. Job Analyzer is available in a public GitHub repository.