Administer Dremio on Kubernetes
This section includes topics about administering Dremio on supported Kubernetes environments, including monitoring logs, scaling pods, changing configurations, and performing basic administrative tasks such as backing up, restoring, cleaning, and upgrading Dremio.
Monitoring Logs and Usage
Monitoring the cluster's resource usage (e.g., heap and direct memory, CPU, and disk I/O) is crucial to maintaining long-term stability as the system scales. For this reason, Dremio recommends setting up a monitoring stack, such as Prometheus and Grafana. For a detailed setup tutorial and an overview of which metrics to track, see Dremio Monitoring in Kubernetes. For more information, see this PDF guide on the Dremio Enterprise Edition (Software) Shared Responsibility Model.
Managing Workloads
Most workloads can be handled with a Large (8 executors) or X-Large (12 executors) engine, each with 32 CPUs per executor. Larger engine sizes may be required for certain workloads. Over-parallelization of queries can cause performance degradation. Thus, packing workloads of all shapes or sizes onto a few very large engines is ill-advised. Workloads should be divided into high-cost and low-cost queries, and dedicated queues should be configured for tasks such as reflections, metadata refresh, and table optimization jobs. These can then be divided between right-sized engines. For more information, see Dremio's Well-Architected Framework.
Changing Your Configuration
If you need to update your configuration after installation, edit the configuration files and then run a helm upgrade command, for example:
helm upgrade <chart-release-name> oci://quay.io/dremio/dremio-helm -f <your-local-path>/values-overrides.yaml --version <helm-chart-version>
The upgrade process pushes your changes to all pods in your Kubernetes cluster and restarts the pods.
For example, to permanently change the resources of your coordinator pod:
- Edit the values-overrides.yaml file and change the resources specified for the coordinator. In this example, memory is 32Gi and cpu is 8.
  coordinator:
    resources:
      limits:
        memory: 32Gi
      requests:
        cpu: 8
        memory: 32Gi
- Run the upgrade command, replacing the template values:
  helm upgrade <chart-release-name> oci://quay.io/dremio/dremio-helm -f <your-local-path>/values-overrides.yaml --version <helm-chart-version>
Note: If the command takes longer than a few minutes to finish, check the status of the pods with the kubectl get pods command. If the pods are pending scheduling due to limited memory or CPU, adjust the values you specified for the properties in the values-overrides.yaml file or add more resources to your Kubernetes cluster.
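The pod check described in the note can be scripted. The following is a minimal sketch rather than part of Dremio's tooling: the function name pending_pods is hypothetical, and it assumes the default kubectl get pods column layout, in which STATUS is the third column.

```shell
# Hypothetical helper: print the names of pods whose STATUS column is Pending.
# Reads `kubectl get pods` output on stdin and assumes the default layout
# NAME READY STATUS RESTARTS AGE (STATUS is column 3). The header row is skipped.
pending_pods() {
  awk 'NR > 1 && $3 == "Pending" { print $1 }'
}

# Usage (the namespace is a placeholder):
#   kubectl get pods --namespace <dremio-install-namespace> | pending_pods
```

For each pod it lists, kubectl describe pod <pod-name> shows the scheduling events that explain why the pod is still pending.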
Using Support Keys
Use support keys only when instructed by Dremio Support. If misused, they can alter the application's behavior and lead to unexpected failures.
Using the Dremio Admin CLI on Kubernetes
The Dremio Admin CLI is the mechanism for backing up, restoring, adding internal users, and similar tasks. For more information on the various commands, see the CLI reference. To run the CLI commands, you need access to either the dremio-master-0 or dremio-admin pod. This requires the kubectl command line tool and access to the Kubernetes cluster and namespace where Dremio is deployed.
Some CLI commands, like Backup, require Dremio to be online. This means Dremio must be deployed normally per Deploying Dremio to Kubernetes. Dremio is considered online when, on inspecting its pods, dremio-master-0 is present and RUNNING.
Some CLI commands like Clean require Dremio to be offline. To use them, Dremio must be deployed and running in admin mode. If not, you must redeploy Dremio in admin mode. The requirements section for each command will note whether Dremio should be online or offline. If it is not mentioned, then the command will work in either case.
To redeploy Dremio in admin mode, you must run a helm upgrade command where the DremioAdmin flag is set to true. Here is a templated example command:
helm upgrade <chart-release-name> oci://quay.io/dremio/dremio-helm -f <your-local-path>/values-overrides.yaml --version <helm-chart-version> --set DremioAdmin=true
This command shuts down the Coordinators and Executors and starts the dremio-admin pod in their place. Crucially, this pod mounts the dremio-master-0 volume, allowing operations on the KV store it contains.
To get command line access to dremio-master-0, dremio-admin, or any other pod, use the kubectl exec command. Here is an example using the -it option for an interactive session and the -- bash option to start a bash session:
kubectl exec -it <pod-name> -- bash
Once you've entered the pod, you can run typical shell commands to explore the file system and execute commands. For more information, see kubectl exec.
The dremio-admin utility is within the /opt/dremio/bin directory of both the master and admin pods and can be used to execute the various Dremio Admin CLI commands.
To exit Dremio admin mode and restart the normal service, redeploy Dremio again using the helm upgrade command above, changing only the flag to DremioAdmin=false.
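The admin-mode round trip described in this section can be wrapped in two small shell functions. This is a sketch under stated assumptions, not an official procedure: the function names set_admin_mode and admin_cli are hypothetical, the three variables hold the same placeholders used in the commands above, and the pod is assumed to be named dremio-admin as described in this section.

```shell
# Placeholders: set these for your deployment before calling the functions.
RELEASE="${RELEASE:-<chart-release-name>}"
VALUES="${VALUES:-<your-local-path>/values-overrides.yaml}"
CHART_VERSION="${CHART_VERSION:-<helm-chart-version>}"

# Hypothetical helper: redeploy with the DremioAdmin flag set to $1 (true or false).
set_admin_mode() {
  helm upgrade "$RELEASE" oci://quay.io/dremio/dremio-helm \
    -f "$VALUES" --version "$CHART_VERSION" --set "DremioAdmin=$1"
}

# Hypothetical helper: run a Dremio Admin CLI command inside the dremio-admin pod.
# All arguments are passed through to /opt/dremio/bin/dremio-admin unchanged.
admin_cli() {
  kubectl exec -it dremio-admin -- /opt/dremio/bin/dremio-admin "$@"
}

# Usage:
#   set_admin_mode true     # Coordinators and Executors stop, dremio-admin starts
#   kubectl wait --for=condition=Ready pod/dremio-admin --timeout=300s
#   admin_cli <command>     # an offline command from the CLI reference
#   set_admin_mode false    # return to normal service
```

Keeping the flag as the only argument makes it harder to change anything else by accident between entering and leaving admin mode.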
Upgrading Dremio
This section assumes you're running Dremio version 26.0 or above.
To upgrade the Dremio platform, update the Helm chart version to the most recent one that references the version of Dremio you want to upgrade to. The Dremio release notes provide the corresponding Helm chart version. Some Dremio Helm chart releases do not upgrade Dremio but update another component. However, every Dremio release is accompanied by a Helm chart release in which the image tags for the various services are updated. Enterprise customers can view a list of all Helm charts and image tags on quay.io.
During the upgrade process, existing pods are terminated and new pods are created with the new images. After all the newly created pods are restarted and running, your Dremio cluster is upgraded.
If you do not know your Helm chart release name, use helm list to list the Helm deployments in a selected namespace.
To upgrade Dremio:
- Ensure that your Dremio is backed up. For more information, see Backup.
- Ensure that no queries are running on the cluster, as any running queries will fail when services start terminating.
- Construct the appropriate helm upgrade command, for example:
  helm upgrade <chart-release-name> oci://quay.io/dremio/dremio-helm -f <your-local-path>/values-overrides.yaml --version <new-desired-version>
- Execute the helm upgrade command.
- Pods will begin restarting with the new images and, once finished, Dremio will be accessible.
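The last two steps above can be sketched as a function that runs the upgrade and then polls until the pods have restarted. This is illustrative only: the names all_pods_running and upgrade_and_wait are hypothetical, the --namespace argument is an addition to the example command, and the check assumes that the namespace contains only long-running Dremio pods and that kubectl get pods prints its default columns. The loop has no timeout, so interrupt it if a pod stays stuck.

```shell
# Hypothetical helper: succeed only if at least one pod row is read from stdin
# and every row has STATUS Running (column 3 of default `kubectl get pods` output).
all_pods_running() {
  awk 'NR > 1 { seen = 1; if ($3 != "Running") bad = 1 }
       END { exit ((seen && !bad) ? 0 : 1) }'
}

# Hypothetical helper: run the upgrade, then poll until all pods are Running.
# Arguments: release name, namespace, values file, target chart version.
upgrade_and_wait() {
  helm upgrade "$1" oci://quay.io/dremio/dremio-helm \
    --namespace "$2" -f "$3" --version "$4" || return 1
  until kubectl get pods --namespace "$2" | all_pods_running; do
    sleep 10   # polling interval is an arbitrary choice
  done
}

# Usage:
#   upgrade_and_wait <chart-release-name> <dremio-install-namespace> \
#     <your-local-path>/values-overrides.yaml <new-desired-version>
```

A Running status does not guarantee that a pod is ready to serve queries, so confirm that Dremio is accessible before resuming workloads.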
Upgrading to Dremio Version 26
This section assumes you're running Dremio version 24 or 25, and are trying to upgrade to version 26.
For Enterprise customers, Dremio version 26 introduced the v3 Helm charts. The former v2 Helm charts, distributed via the dremio-cloud-tool GitHub repository and used for Dremio versions 24 and 25, are not compatible with version 26.
It is possible to upgrade an existing deployment. However, Enterprise customers need to migrate from the v2 Helm charts to the v3 Helm charts before any upgrade can take place. The v3 Helm charts are distributed via our image repository Quay.io.
Customers must move the relevant content from their existing values.yaml (and any other deployment-specific configurations like Identity Provider authentication) into the new values-overrides.yaml configuration file, as detailed in Configuring your Values.
Some configurations can be left behind. For example, the new UI experience has superseded the executor configuration in the charts. For more information, see Managing Engines in Kubernetes.
Skip the next paragraph if you did not use the Executor HPA and node life cycle policy.
If you intend to continue using Classic Engines, disable the no-longer-supported node life cycle policy before upgrading to version 26. To check for this option, look at the executor section in your old Helm chart's values.yaml and see whether node_lifecycle_service_enabled: true is set. If it is set to true, change it to false and redeploy Dremio. If it is not present, that is the same as false. If, despite this, the Executors of a Classic Engine are marked as paused on the node activity panel after the upgrade, you can resolve this with a call to Dremio's Blacklist API; see Allowing all Nodes.
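The check on the old values.yaml can be done with a one-line search. The function below is a rough sketch with a hypothetical name, node_lifecycle_state. It matches the key line by line, so it does not verify that the key sits in the executor section and does not handle quoted values; treat its answer as a hint and confirm by reading the file.

```shell
# Hypothetical helper: print "enabled" if the given values.yaml sets
# node_lifecycle_service_enabled: true on any line, otherwise "disabled".
# A missing key is reported as "disabled", matching the rule that absent means false.
node_lifecycle_state() {
  if grep -Eq '^[[:space:]]*node_lifecycle_service_enabled:[[:space:]]*true' "$1"; then
    echo "enabled"
  else
    echo "disabled"
  fi
}

# Usage (the path is a placeholder for your old v2 chart values):
#   node_lifecycle_state <path-to-old-chart>/values.yaml
```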
Once the new values-overrides.yaml and other deployment configurations are prepared, you can proceed with the upgrade.
For help with this process, please reach out to Dremio Support and your Account Executive. More detailed guides and help from Dremio's professional services team can be provided.
To upgrade Dremio:
- Ensure you have created a new values-overrides.yaml configuration file with relevant values from your existing deployment ported over per Configuring your Values.
- Ensure that your Dremio is backed up. For more information, see Backup.
- Ensure that no queries are running on the cluster, as any running queries will fail when services start terminating.
- Uninstall your existing Dremio deployment:
  helm uninstall <chart-release-name>
  This will delete existing pods and remove other elements of the existing Dremio deployment. Crucially, it will not delete the dremio-master-0 volume, which contains the KV store and Dremio's state.
- Confirm the dremio-master-0 volume still exists in the namespace where you want to reinstall Dremio. This can be confirmed with:
  kubectl get pvc --namespace <dremio-install-namespace>
  Each of your executors should have left behind two volumes, but the master should have left only one.
- You are now ready to install version 26. Follow the instructions in Deploying Dremio to Kubernetes to complete the installation.
Dremio must be deployed to the same location as the previous version to mount the dremio-master-0 volume. It's the content of this volume that is being upgraded.
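Because the reinstall depends on the master volume surviving the uninstall, it can be worth guarding the final step with a scripted check. The following is a sketch, assuming only that the claim for the master volume has dremio-master-0 somewhere in its name and that kubectl get pvc prints the claim name in its first column; the function name master_volume_present is hypothetical.

```shell
# Hypothetical helper: succeed if any claim name read from stdin contains
# dremio-master-0. Expects `kubectl get pvc` output, where NAME is column 1.
# The header row is skipped.
master_volume_present() {
  awk 'NR > 1 && index($1, "dremio-master-0") { found = 1 }
       END { exit (found ? 0 : 1) }'
}

# Usage:
#   kubectl get pvc --namespace <dremio-install-namespace> | master_volume_present \
#     || echo "master volume not found: do not reinstall yet"
```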