Capacity Planning and Scaling
You can use this framework to plan capacity, monitor performance, and make scaling decisions for your Dremio Enterprise cluster as it grows. This framework covers proactive and reactive scaling to sustain query performance under increasing workloads and user concurrency. For protecting the cluster against memory exhaustion through Workload Management guardrails, see Initial Workload Management Settings.
Principles
- Workload Isolation: Different workloads must not compete for the same resources.
- Standardization: Scaling is supported by adding engines or changing to a different Dremio-prescribed engine size.
- Proactive Scaling: Add capacity based on leading indicators, not reactive metrics.
- Reactive Controls: Use lagging indicators to trigger corrective scaling actions.
- Operational Transparency: Performance is monitored through specific query error thresholds and query stage SLAs, with defined corrective actions for each.
Separate Workloads
If workloads do not share metadata, use separate Dremio clusters so that change management schedules, resource decisions, and upgrade decisions remain independent.
If data or metadata is shared across workloads, use separate engines aligned to workload type:
| Workload Type | Description |
|---|---|
| Reflection refreshes | Reflection planning and refreshes |
| Metadata refreshes | Catalog and metadata operations |
| User queries | Low-cost and high-cost user queries |
| ETL workloads | CTAS and Iceberg data modification operations |
In multi-tenant environments, such as multiple departments or geographies, use separate low-cost and high-cost query engines per tenant to support chargeback models.
Engines also provide platform stability: if one engine goes down, the others are unaffected. You can start and stop engines on demand and size each engine differently based on workload patterns.
This architecture enforces isolation, prevents resource contention, and enables targeted scaling. For more information, see Managing Engines in Kubernetes.
Scale Engines
Dremio supports two scaling mechanisms:
- Horizontal scaling: Add one or more engines to increase execution capacity. Each engine runs a Dremio-prescribed set of executors. This is the primary mechanism for handling growth in concurrent users, query volume, and new workloads.
- Vertical scaling: Change an engine to a larger Dremio-prescribed size to increase the resources available to that engine. For the main coordinator, vertical scaling addresses bottlenecks in query planning and metadata operations.
Scaling outside of these mechanisms, such as by adding arbitrary executor nodes or resizing outside of prescribed sizes, is not supported. For supported engine sizes, see Sizes.
For recommended configurations, see Server or Instance Hardware.
Plan Capacity
Capacity decisions are driven by two types of indicators. Leading indicators signal that demand is growing and allow you to scale proactively before performance degrades. Lagging indicators signal that the cluster is already under pressure and require corrective action.
| Indicator | Scenario | Action |
|---|---|---|
| Leading | New use cases or analytics workloads | Deploy a minimum of two engines of the appropriate size. Dremio distributes queries evenly across engines using round-robin distribution. |
| Leading | Concurrency or volume growth | Deploy one or more additional engines of the appropriate size. |
| Leading | Growth in query volume or complexity | Provision 25% buffer capacity within each engine. |
| Lagging | Query execution errors (for example, out-of-memory or resource errors) | Deploy one or more additional engines of the appropriate size and rebalance workloads using Workload Management (WLM) rules. |
| Lagging | SLA breaches on pending, planning, or execution planning stages | Vertically scale the main coordinator. |
If WLM memory limits have not yet been configured, set these before adding engines. For guidance, see Initial Workload Management Settings.
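As a rough worked example of the 25% buffer guidance above, the following sketch computes how many engines a workload needs. The peak concurrency and per-engine query capacity here are hypothetical illustration values, not Dremio-prescribed figures; use your own measured concurrency and engine sizing.

```python
import math

def engines_needed(peak_concurrent_queries: int,
                   queries_per_engine: int,
                   buffer: float = 0.25) -> int:
    """Engines required so that peak load plus a 25% buffer fits capacity."""
    required = peak_concurrent_queries * (1 + buffer)
    return math.ceil(required / queries_per_engine)

# A hypothetical peak of 90 concurrent queries on engines that each
# comfortably handle 40: 90 * 1.25 = 112.5, so 3 engines are needed.
print(engines_needed(90, 40))
```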
Monitor Scaling Triggers
The following tables define recommended thresholds and corrective actions for query execution errors and query stage performance.
Query Execution Errors
Use outcomeReason in queries.json to identify resource-related query failures.
| Error | Recommended Threshold | Action |
|---|---|---|
| OUT_OF_MEMORY | 1% of queries running out of direct memory | Add engine and move workload |
| RESOURCE_ERROR | 1% of queries running out of heap memory | Add engine and move workload |
| Execution Setup Exception | 1% of queries exhibiting node disconnects | Add engine and move workload |
| ChannelClosedException (fabric server) | 1% of queries exhibiting node disconnects | Add engine and move workload |
| CONNECTION ERROR: Exceeded timeout | 1% of queries exhibiting node disconnects | Add engine and move workload |
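The thresholds above can be checked by aggregating queries.json records. A minimal sketch, assuming queries.json is newline-delimited JSON and that outcomeReason carries the error text; the exact field layout may vary by Dremio version, so verify against your own log output.

```python
import json
from collections import Counter

# Substrings matched against outcomeReason; patterns are illustrative.
ERROR_PATTERNS = [
    "OUT_OF_MEMORY",
    "RESOURCE_ERROR",
    "Execution Setup Exception",
    "ChannelClosedException",
    "CONNECTION ERROR: Exceeded timeout",
]

def error_rates(lines):
    """Return the per-pattern failure rate across all queries."""
    total = 0
    counts = Counter()
    for line in lines:
        query = json.loads(line)
        total += 1
        reason = query.get("outcomeReason") or ""
        for pattern in ERROR_PATTERNS:
            if pattern in reason:
                counts[pattern] += 1
    return {p: counts[p] / total for p in ERROR_PATTERNS} if total else {}

# Flag any pattern breaching the 1% threshold:
# with open("queries.json") as f:
#     for pattern, rate in error_rates(f).items():
#         if rate > 0.01:
#             print(f"{pattern}: {rate:.1%} -- add engine and move workload")
```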
Query Stage Performance SLAs
Use queries.json via an external monitoring tool or the query profile to monitor query stage durations. p90 (the 90th percentile) means that 90% of queries complete within the measured duration.
| Stage | Field in queries.json | Recommended Threshold | Action |
|---|---|---|---|
| 0: Total Duration | finish - start | p90 — align with business need | See stages 1–8 |
| 1: Pending | pendingTime | p90 never above 2 seconds | Vertically scale the main coordinator |
| 2: Metadata Retrieval | metadataRetrievalTime | p90 never above 5 seconds | Switch to Iceberg table format if using raw Parquet |
| 3: Planning | planningTime | p90 never above 2 seconds | Vertically scale the main coordinator |
| 4: Engine Start | engineStartTime | N/A | N/A |
| 5: Queued | queuedTime | p90 never above 2 seconds | Add engine and move workload |
| 6: Execution Planning | executionPlanningTime | p90 never above 2 seconds | Vertically scale the main coordinator |
| 7: Starting | startingTime | p90 never above 2 seconds | Add engine and move workload |
| 8: Running | runningTime | p90 — align with business need | Add engine and move workload |
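The stage SLAs above can be verified with a small p90 computation over queries.json records. This sketch uses the field names from the table and assumes newline-delimited JSON with durations in milliseconds; confirm the field names and units against the queries.json output of your Dremio version.

```python
import json
import math

# p90 thresholds in milliseconds for the stages with fixed SLAs.
SLA_MS = {
    "pendingTime": 2000,
    "metadataRetrievalTime": 5000,
    "planningTime": 2000,
    "queuedTime": 2000,
    "executionPlanningTime": 2000,
    "startingTime": 2000,
}

def p90(values):
    """90th percentile using the nearest-rank method."""
    ordered = sorted(values)
    index = max(0, math.ceil(0.9 * len(ordered)) - 1)
    return ordered[index]

def sla_breaches(lines):
    """Return the stages whose observed p90 exceeds its threshold."""
    samples = {field: [] for field in SLA_MS}
    for line in lines:
        query = json.loads(line)
        for field in SLA_MS:
            if field in query:
                samples[field].append(query[field])
    return {f: p90(v) for f, v in samples.items() if v and p90(v) > SLA_MS[f]}
```

Each breached stage maps to the corrective action in the table: vertically scale the main coordinator for pending, planning, and execution planning; add an engine and move workload for queued and starting.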