Version: current [25.x]

Azure AKS

Learn more about the deployment architecture, requirements, and recommendations for installing Dremio in a Kubernetes cluster on Azure Kubernetes Service (AKS).

Architecture

Azure AKS Diagram

High Availability

High availability depends on the Kubernetes infrastructure. If any Kubernetes pod goes down for any reason, Kubernetes brings up a new pod to replace the one that is out of commission.

  • The Dremio master-coordinator and secondary-coordinator pods are each deployed as a StatefulSet. The master-coordinator pod has an associated persistent volume; secondary-coordinator pods do not. If the master-coordinator pod goes down, it recovers with the associated persistent volume and Dremio metadata preserved.

  • The Dremio executor pods are a StatefulSet with an associated persistent volume. If an executor pod goes down, it recovers with the associated persistent volume and data preserved.

Requirements

Kubernetes Version

To ensure compatibility and support, Dremio requires that you regularly update the Kubernetes version of your AKS cluster so that it stays on an officially supported version. For details on supported versions, refer to the Microsoft documentation on Kubernetes versions.
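The list of supported AKS versions changes over time, so a periodic check can be scripted. The following Python sketch assumes that you copy the currently supported minor versions from the Microsoft documentation into a list. The version strings it uses are placeholders and are not an actual support matrix. The cluster's current version can be read with `az aks show --resource-group <group> --name <cluster> --query kubernetesVersion -o tsv`.

```python
# Sketch: check whether an AKS cluster's Kubernetes version is in a list of
# supported minor versions that you maintain by hand from the Microsoft docs.
# The example versions below are hypothetical placeholders.

def minor_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from strings such as "1.29", "1.29.4" or "v1.29.4"."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def is_supported(cluster_version: str, supported_minors: list[str]) -> bool:
    """True if the cluster's major.minor appears in the supported list."""
    supported = {minor_version(v) for v in supported_minors}
    return minor_version(cluster_version) in supported


if __name__ == "__main__":
    supported = ["1.28", "1.29", "1.30"]  # placeholder list, not authoritative
    print(is_supported("1.29.4", supported))  # True
    print(is_supported("1.25.6", supported))  # False
```

Only major.minor is compared because AKS support windows are defined per minor version. Patch-level upgrades inside a supported minor version are still worth applying.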

Helm Charts

Dremio requires that you regularly update to the latest version of the official Dremio Helm charts, which are available in the dremio-cloud-tools repository on GitHub. When using the official Dremio Helm charts, it is strongly recommended to commit the Helm configuration file (i.e., the values.yaml file) along with the rest of the Helm chart to a version control system such as Git.

Dremio Docker Image

Dremio requires using the official Dremio Docker image. Any modifications to this image must be preapproved by Dremio before use, and Dremio does not support the inclusion or execution of other applications within the Dremio image. Separate applications must be run in their own containers to avoid potential interference with the Dremio application.

Distributed Storage

Dremio requires Azure Data Lake Storage Gen2 (ADLS) to be configured as distributed storage on AKS.

ZooKeeper Pods

Three ZooKeeper pods are required, and they can run on the default AKS system pool. Each ZooKeeper pod should have 1 CPU and 2 GB of memory and requires a node of its own: for resiliency against individual node failures, the pods are expected to run on separate nodes.
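The placement guidance above can be checked mechanically. The following Python sketch tests a list of ZooKeeper pods against the three rules: exactly three pods, no two pods on the same node, and at least 1 CPU and 2 GB per pod. The `ZkPod` structure and the sample data are hypothetical. In practice you would populate them from `kubectl get pods -o wide` or the Kubernetes API.

```python
# Sketch: validate ZooKeeper pod placement and sizing against the guidance
# (3 pods, one per node, >= 1 CPU and >= 2 GB each).
# ZkPod and the sample values are illustrative, not part of any Dremio tooling.
from dataclasses import dataclass


@dataclass
class ZkPod:
    name: str
    node: str
    cpus: float
    memory_gb: float


def check_zookeeper(pods: list[ZkPod]) -> list[str]:
    """Return a list of problems; an empty list means the layout looks fine."""
    problems: list[str] = []
    if len(pods) != 3:
        problems.append(f"expected 3 ZooKeeper pods, found {len(pods)}")
    nodes = [p.node for p in pods]
    if len(set(nodes)) != len(nodes):
        problems.append("two or more ZooKeeper pods share a node")
    for p in pods:
        if p.cpus < 1 or p.memory_gb < 2:
            problems.append(f"{p.name} has less than 1 CPU / 2 GB")
    return problems


if __name__ == "__main__":
    pods = [ZkPod("zk-0", "node-a", 1, 2),
            ZkPod("zk-1", "node-b", 1, 2),
            ZkPod("zk-2", "node-c", 1, 2)]
    print(check_zookeeper(pods))  # []
```

In the pod specification itself, separate placement is usually enforced with a pod anti-affinity rule keyed on `kubernetes.io/hostname`. Check the chart's values.yaml to confirm whether such a rule is already set.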

Disk Storage Class

Dremio requires the AKS storage class “managed-premium” (i.e., Azure Managed Disks) for the following volumes:

  • Coordinator volume: 512 GB
  • Executor volume #1 (results & spilling): 256 GB
  • Executor volume #2 (Columnar Cloud Cache): 256 GB
  • ZooKeeper volume: 16 GB

Azure Files-based storage, such as NFS, is explicitly not recommended for the coordinator volume because its I/O performance is insufficient for the KV store.
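The volume sizes listed above can be turned into a total managed-disk budget. The following Python sketch assumes one master-coordinator volume, because secondary coordinators have no persistent volume. It also assumes two volumes per executor and three ZooKeeper volumes. If executors use local disks, as described in the note below, their share drops out of the total.

```python
# Sketch: total "managed-premium" capacity implied by the per-volume sizes.
# Assumes one master-coordinator volume, two volumes per executor and
# three ZooKeeper pods; set executors_on_managed_disks=False for local disks.
COORDINATOR_GB = 512
EXECUTOR_RESULTS_GB = 256  # volume #1: results & spilling
EXECUTOR_C3_GB = 256       # volume #2: Columnar Cloud Cache
ZOOKEEPER_GB = 16


def managed_disk_total_gb(executors: int, zookeepers: int = 3,
                          executors_on_managed_disks: bool = True) -> int:
    total = COORDINATOR_GB + zookeepers * ZOOKEEPER_GB
    if executors_on_managed_disks:
        total += executors * (EXECUTOR_RESULTS_GB + EXECUTOR_C3_GB)
    return total


if __name__ == "__main__":
    print(managed_disk_total_gb(4))  # 2608 (512 + 4 * 512 + 3 * 16)
```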

note

For executors, if the underlying node provides at least 500 GB of local SSD/NVMe storage (e.g., Azure Standard_D32ds_v5 VMs with 1200 GB), it is recommended to use these local disks instead of Azure Managed Disks. Since executor data is cached or consists of result sets, these disks do not need to be reattachable in case of a node failure. Local NVMe storage provides significantly faster I/O compared to network-attached "managed-premium" disks.

Recommendations

Container Registry

Required Docker images (e.g., Dremio Enterprise Edition, ZooKeeper, and Nessie) should be pushed to a private container registry such as Azure Container Registry, because Dremio does not provide any service-level agreements (SLAs) for the Quay.io and docker.io repositories.
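After the images are mirrored, the image references in values.yaml have to point at the private registry. The following Python sketch rewrites a public image reference into one under a private registry. It uses the Docker naming convention that the first path component is a registry only if it contains `.` or `:` or is `localhost`. The registry name `myregistry.azurecr.io` and the example images are hypothetical.

```python
# Sketch: map public image references onto a private registry such as ACR.
# "myregistry.azurecr.io" and the sample images are placeholders.

def split_registry(image: str) -> tuple[str, str]:
    """Split an image reference into (registry, repository[:tag])."""
    first, sep, rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    # No explicit registry: the image lives on Docker Hub; single-name
    # images such as "zookeeper:3.8" are under the "library/" namespace.
    return "docker.io", (image if sep else f"library/{image}")


def to_private_registry(image: str, registry: str) -> str:
    _, repository = split_registry(image)
    return f"{registry}/{repository}"


if __name__ == "__main__":
    for img in ["quay.io/example/tool:1.0", "zookeeper:3.8"]:
        print(to_private_registry(img, "myregistry.azurecr.io"))
```

The sketch only renames references. The images must still be copied into the registry, for example with `az acr import`.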

Node Sizes

When determining node sizes, the amount of CPU and memory available for Dremio will always be less than the total node capacity, because Kubernetes, the system pods, and the operating system also consume resources. Typically, 2 CPUs and 10-20 GB of memory are subtracted from the node's theoretical maximum when allocating resources to the Dremio pods.
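The overhead rule can be expressed as a small calculation. The following Python sketch subtracts 2 CPUs and a memory overhead from the 10-20 GB range from the node capacity. It reproduces the coordinator (32 CPUs / 64 GB, about 30 CPUs / 54 GB) and executor (32 CPUs / 128 GB, about 30 CPUs / 108 GB) figures in this section. The exact memory overhead to use for a given node is an assumption that you should verify against the node's allocatable resources.

```python
# Sketch: estimate the CPU and memory left for a Dremio pod after
# Kubernetes, system pods and the OS take their share.
# reserved_memory_gb should be chosen from the typical 10-20 GB range.
from typing import NamedTuple


class PodResources(NamedTuple):
    cpus: int
    memory_gb: int


def dremio_pod_resources(node_cpus: int, node_memory_gb: int,
                         reserved_cpus: int = 2,
                         reserved_memory_gb: int = 10) -> PodResources:
    return PodResources(node_cpus - reserved_cpus,
                        node_memory_gb - reserved_memory_gb)


if __name__ == "__main__":
    # Coordinator node: 32 CPUs, 64 GB, 10 GB overhead
    print(dremio_pod_resources(32, 64, reserved_memory_gb=10))
    # Executor node: 32 CPUs, 128 GB, 20 GB overhead
    print(dremio_pod_resources(32, 128, reserved_memory_gb=20))
```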

For Azure, the Dlsv5 VM series (no local storage, but Premium disk support) is recommended for coordinators, and the Ddsv5 series (local storage and Premium disk support) is recommended for executors. Both belong to the same family as Ddv5, which is used in Dremio Cloud.

Coordinators

For coordinators, Azure D32pls_v5 VMs (or equivalent) are recommended, with at least 32 CPUs and 64 GB of memory, a CPU-to-memory ratio of 1:2. In Helm charts, this configuration results in approximately 30 CPUs and 54 GB of memory allocated to the pod.

For more information on JVM Garbage Collection (GC) parameters, see G1GC settings for the Dremio JVM.

If workload demands exceed the limits of a single coordinator, additional scale-out coordinators should be added. For more information on when and how to scale coordinators, see this PDF guide on Evaluating Coordinator Scaling.

note

To enable flexible scaling, it is recommended to set the AKS coordinator node pool to autoscale.

Executors

For executors, Azure Standard_D32ds_v5 VMs are recommended to achieve a node size of 32 CPUs and 128 GB of memory, maintaining a CPU-to-memory ratio of 1:4. In Helm charts, this results in approximately 30 CPUs and 108 GB of memory allocated to the pod.

note

To allow elastic resource scaling, the executor node pool in AKS must be set to autoscale.

Load Balancing

Although the provided Helm charts include a basic load balancer service configuration, it is strongly recommended to operate the load balancer independently (e.g., via the NGINX Helm charts or a load balancer provided by the AKS platform).

Network

The network bandwidth should be at least 10 Gbps.

For container networking within the AKS cluster, it is strongly recommended to select Azure CNI during cluster creation, because this setting cannot be changed later. Kubenet is not recommended.

Setting Up an AKS Cluster

To set up a Kubernetes cluster on AKS, use the Azure portal or the Azure CLI. The following steps are for the Azure portal:

  1. Sign in to the Azure portal.

  2. Create an AKS cluster (see Quickstart: Deploy an Azure Kubernetes Service (AKS) cluster using the Azure portal for more information).

    • Create a node pool with a VM size that has 16 CPUs and 128 GB of memory (E16s_v3 is recommended).

    • The number of nodes in the AKS cluster must equal the number of Dremio executors plus one (1) for the Dremio master-coordinator.

  3. Connect to the cluster (see Quickstart: Deploy an Azure Kubernetes Service (AKS) cluster using the Azure portal for more information).

  4. Install Helm (see Install existing Apps with Helm in Azure Kubernetes Service (AKS) for more information).

    note

    If you plan to access the cluster from a local shell, install Helm locally.

  5. Deploy Dremio on AKS following the steps in the dremio-cloud-tools repository on GitHub for Installing Dremio on Kubernetes.
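If you use the Azure CLI instead of the portal, the cluster creation and connection steps map onto `az aks create` and `az aks get-credentials`. The following Python sketch only builds and prints the commands and does not run them. It applies two rules from this page: one node per executor plus one for the master-coordinator, and Azure CNI (`--network-plugin azure`) selected at creation time. The resource group, cluster name, and VM size are placeholders. Check every flag against `az aks create --help` for your CLI version before you run the commands.

```python
# Sketch: build (not execute) Azure CLI commands for an AKS cluster.
# Resource group, cluster name and VM size are placeholders.
import shlex


def aks_create_command(resource_group: str, cluster: str, executors: int,
                       vm_size: str = "Standard_E16s_v3") -> list[str]:
    node_count = executors + 1  # one node per executor + master-coordinator
    return [
        "az", "aks", "create",
        "--resource-group", resource_group,
        "--name", cluster,
        "--node-count", str(node_count),
        "--node-vm-size", vm_size,
        "--network-plugin", "azure",  # Azure CNI, chosen at creation time
        "--generate-ssh-keys",
    ]


def get_credentials_command(resource_group: str, cluster: str) -> list[str]:
    # Merges the cluster credentials into kubeconfig for kubectl and helm.
    return ["az", "aks", "get-credentials",
            "--resource-group", resource_group, "--name", cluster]


if __name__ == "__main__":
    print(shlex.join(aks_create_command("dremio-rg", "dremio-aks", executors=3)))
    print(shlex.join(get_credentials_command("dremio-rg", "dremio-aks")))
```

Returning argument lists instead of shell strings keeps the commands safe to pass to `subprocess.run` later without shell quoting issues.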

For More Information