Dremio Cloud uses autoscaling to manage query workloads dynamically. It starts and stops engine replicas as needed and monitors engine replica health to provide seamless query execution.
The following table describes the engine parameters and their role in autoscaling.
|Parameter|Description|
|---|---|
|Size|Designates the number and type of AWS instances that make up the engine.|
|Max Concurrency|Maximum number of jobs that can run concurrently on an engine replica.|
|Last Replica Auto-Stop (secs)|Time to wait (in seconds) before deleting the last replica if the engine is not in use. Does not apply when the minimum number of engine replicas is 1 or higher. The default value is 3600 seconds.|
|Enqueued Time Limit|If no resources are available, a query waits in the engine's queue for up to this many seconds. If the limit is exceeded, the query is canceled and you receive a "timeout during slot reservation" error. The default value is 300 seconds.|
|Drain Time Limit|Time (in seconds) that an engine replica continues to run after the engine is resized, disabled, or deleted. When the limit is reached, the replica is terminated and any queries still running on it fail. If no queries are running on a replica, it is terminated without waiting for the drain time limit. The default value is 3600 seconds.|
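As an illustration only, the parameters above can be modeled as a small Python data class. The class name, the field names, the example size label, and the `min_replicas`/`max_replicas` fields are hypothetical and do not correspond to a Dremio Cloud API; only the default values are taken from the table. The two helper methods encode the documented rules that Last Replica Auto-Stop applies only when the minimum replica count is 0, and that a queued query is canceled once it has waited longer than the enqueued time limit.

```python
from dataclasses import dataclass


@dataclass
class EngineSettings:
    """Hypothetical model of the engine parameters; not a Dremio Cloud API."""

    size: str                                # hypothetical size label
    max_concurrency: int                     # max concurrent jobs per replica
    min_replicas: int                        # lower bound for autoscaling
    max_replicas: int                        # upper bound for autoscaling
    last_replica_auto_stop_secs: int = 3600  # documented default
    enqueued_time_limit_secs: int = 300      # documented default
    drain_time_limit_secs: int = 3600        # documented default

    def auto_stop_applies(self) -> bool:
        # Last Replica Auto-Stop is not valid when min replicas is 1 or higher.
        return self.min_replicas == 0

    def enqueue_timed_out(self, waited_secs: float) -> bool:
        # A queued query is canceled once its wait exceeds the limit.
        return waited_secs > self.enqueued_time_limit_secs


settings = EngineSettings(size="small", max_concurrency=10,
                          min_replicas=0, max_replicas=3)
print(settings.auto_stop_applies())     # True
print(settings.enqueue_timed_out(301))  # True
```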
How Autoscaling Works
Each engine has a size parameter, which also defines its number of nodes, as well as minimum and maximum replica settings.
When a query is submitted to an engine, the Dremio Cloud control plane assigns it to one of the engine's replicas, and that replica stays assigned to the query until the query finishes executing. Replicas are created dynamically based on the query workload: the control plane observes the workload and the currently active engine replicas to decide whether to scale replicas up or down. Dremio Cloud never scales an engine above its configured maximum number of replicas or below its configured minimum.
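The exact scaling policy of the control plane is not described here, so the following is only a sketch under stated assumptions. `desired_replicas` assumes a hypothetical rule of "enough replicas for every active query to get a concurrency slot", clamped to the configured minimum and maximum. `assign_query` and `release_query` are hypothetical helpers that place a query on the first replica with a free slot, start a new replica if the engine is below its maximum, and otherwise leave the query to wait in the queue.

```python
import math
from typing import List, Optional


def desired_replicas(active_queries: int, max_concurrency: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Hypothetical rule: enough replicas to give every active query a
    concurrency slot, clamped to the engine's configured bounds."""
    needed = math.ceil(active_queries / max_concurrency)
    return max(min_replicas, min(needed, max_replicas))


def assign_query(replica_loads: List[int], max_concurrency: int,
                 max_replicas: int) -> Optional[int]:
    """Assign a new query to a replica and return that replica's index.

    replica_loads[i] is the number of queries running on replica i.
    Returns None when every replica is full and the engine is already at
    max_replicas; the query would then wait in the engine's queue.
    """
    for i, load in enumerate(replica_loads):
        if load < max_concurrency:
            replica_loads[i] += 1
            return i
    if len(replica_loads) < max_replicas:
        replica_loads.append(1)  # scale up: start a new replica for this query
        return len(replica_loads) - 1
    return None


def release_query(replica_loads: List[int], replica: int) -> None:
    """Free the query's slot once it finishes executing."""
    replica_loads[replica] -= 1


print(desired_replicas(25, max_concurrency=10, min_replicas=0, max_replicas=3))  # 3
print(desired_replicas(0, max_concurrency=10, min_replicas=1, max_replicas=3))   # 1
```

Clamping with `max(min_replicas, min(needed, max_replicas))` mirrors the documented guarantee that replicas never go above the maximum or below the minimum, whatever rule is used to compute `needed`.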
The following diagram provides a high-level overview of the autoscaling components.
Monitoring Executor Health
The Dremio Cloud control plane monitors engine health and manages unhealthy replicas to provide a seamless query execution experience. Executors send periodic heartbeats to the control plane, which uses them to determine whether each executor is alive. If the control plane stops receiving heartbeats from an executor, it marks that executor as unhealthy and replaces it with a healthy one.
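Heartbeat-based liveness checks of this kind can be sketched as follows. This is a minimal illustration, not Dremio Cloud's implementation: the `HeartbeatMonitor` class, its method names, and the timeout value are hypothetical, and timestamps are passed in explicitly rather than read from a clock so the behavior is easy to follow.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Hypothetical liveness tracker: an executor is unhealthy if no heartbeat
    has arrived within timeout_secs (the timeout value is an assumption)."""

    def __init__(self, timeout_secs: float) -> None:
        self.timeout_secs = timeout_secs
        self.last_seen: Dict[str, float] = {}  # executor id -> last heartbeat time

    def record_heartbeat(self, executor_id: str, now: float) -> None:
        self.last_seen[executor_id] = now

    def unhealthy(self, now: float) -> List[str]:
        # Executors whose most recent heartbeat is older than the timeout.
        return sorted(eid for eid, t in self.last_seen.items()
                      if now - t > self.timeout_secs)

    def replace(self, old_id: str, new_id: str, now: float) -> None:
        # Stop tracking the unhealthy executor and start tracking its replacement.
        self.last_seen.pop(old_id, None)
        self.last_seen[new_id] = now


monitor = HeartbeatMonitor(timeout_secs=30)
monitor.record_heartbeat("exec-1", now=0)
monitor.record_heartbeat("exec-2", now=0)
monitor.record_heartbeat("exec-1", now=40)  # exec-2 has gone silent
for eid in monitor.unhealthy(now=45):
    monitor.replace(eid, eid + "-replacement", now=45)
print(sorted(monitor.last_seen))  # ['exec-1', 'exec-2-replacement']
```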