Administering Dremio on AKS
This topic discusses administration activities such as monitoring logs; scaling pods; changing configurations; performing basic administrative tasks such as backing up, restoring, and cleaning; and upgrading Dremio.
The example commands listed below assume that the current command line location is within the latest Dremio Helm chart, dremio-cloud-tools/charts/dremio_v2, on the client machine that interacts with Kubernetes.
You must maintain any changes you make to the Helm values or configuration files in the dremio-cloud-tools/charts/dremio_v2 directory in your local copy of dremio-cloud-tools.
Monitoring Logs and Usage
Monitoring the cluster's resource usage (e.g., heap and direct memory, CPU, and disk I/O) is crucial to maintaining long-term stability as the system scales. For this reason, it is highly recommended to set up a monitoring stack, such as Prometheus and Grafana. For a detailed setup tutorial and an overview of which metrics to track, see Dremio Monitoring in Kubernetes. For more information, see this PDF guide on the Dremio Shared Responsibility Model.
For information about monitoring logs, see Logs. You can retrieve logs from the Dremio console or directly from Kubernetes. You can also write logs to a file on disk in addition to stdout. Read Writing Logs to a File for details.
Managing Workloads
Limit engine sizes to a maximum of 10 executor pods (with 32 CPUs and 128 GB of memory) to prevent over-parallelization of queries. Workloads should be split into high-cost and low-cost queries, and dedicated queues should be configured for reflections, metadata refresh, and table optimization jobs. For more information, see Dremio's Well-Architected Framework.
Scaling Up or Down
Scaling up or down refers to increasing or decreasing the number of Dremio pods (executors or scale-out coordinators). All scaling values remain in effect until you run another helm upgrade command.
Scaling up and down the master-coordinator is not supported. Do not update the master-coordinator pod count from the default value, 1, as there must be exactly one master-coordinator in the Dremio cluster to maintain stability and ensure connectivity. Scaling to 0 effectively terminates the Dremio cluster.
Scaling down the number of executor pods, whether temporarily or permanently, may cancel queries if you are not using Dremio Enterprise Edition 24.3 or above with autoscaling.
- Run the helm list command to retrieve the chart release name. In the example below, the chart release name is plundering-alpaca.
  Get chart release name:
      helm list
      NAME REVISION UPDATED STATUS CHART NAMESPACE
      plundering-alpaca 1 Wed Jul 18 09:36:14 2018 DEPLOYED dremio-0.0.5 default
- Run the helm upgrade command with the value you want to change. Replace <chart_release_name> with your chart release name.
  Helm upgrade command example:
      helm upgrade --wait <chart_release_name> . --set <dremio pod=value>
  For example, Dremio executor pods could be scaled up or down with the following command, which changes the pod count to 5:
      helm upgrade --wait <chart_release_name> . --set executor.count=5
Resetting to Defaults
After you scale the number of Dremio pods up or down, the scaling values remain in effect only until you run helm upgrade again. Any subsequent helm upgrade command (whether for scaling, changing your configuration, or upgrading) resets the values to the defaults specified in the values.yaml file. For example, if you scale up the secondary coordinators to 3 and then scale up the executors to 5, the secondary-coordinator count is reset to 0 (the default) after the executors are scaled up to 5.
Scaling all of the Dremio pods down to 0 effectively shuts down the Dremio cluster.
To permanently change your default values, update the values.yaml file. See Changing your Configuration for more information.
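Because each helm upgrade resets earlier --set overrides, one way to keep several scaling values in effect at once is to pass them all in the same command. The sketch below only prints the command so that it can be reviewed before it is run. helm_upgrade_cmd is a hypothetical helper, not part of the chart, and coordinator.count is assumed to be the values.yaml key for secondary coordinators; check the values.yaml file in your copy of the chart.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a helm upgrade command that
# applies every key=value pair given after the release name.
helm_upgrade_cmd() {
  local release="$1"
  shift
  local cmd="helm upgrade --wait $release ."
  local kv
  for kv in "$@"; do
    # Each pair becomes its own --set flag.
    cmd="$cmd --set $kv"
  done
  printf '%s\n' "$cmd"
}

# Example (assumed key names):
#   helm_upgrade_cmd plundering-alpaca coordinator.count=3 executor.count=5
```

To see which overrides are currently applied to a release, helm get values <chart_release_name> lists the user-supplied values.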
Changing Your Configuration
If you need to update your configuration, you can do so after the installation by editing the configuration files and then running the helm upgrade <chart_release_name> . command. The upgrade process pushes your changes to all of the pods in your Kubernetes cluster and restarts the pods.
For example, to permanently change the number of Dremio executor pods:
- Edit the values.yaml file and change the number of executor pods specified for the executor.count property. In this example, executor.count is 5. The other executor defaults remain unchanged.
  Example executor property values:
      executor:
        memory: 16384
        cpu: 4
        count: 5
        volumeSize: 20Gi
- Run the upgrade command. Replace <chart_release_name> with your chart release name:
  Helm upgrade command example:
      helm upgrade --wait <chart_release_name> .
  Note: If the command takes longer than a few minutes to finish, check the status of the pods with the kubectl get pods command. If the pods are pending scheduling due to limited memory or CPU, adjust the values you specified for the properties in the values.yaml file or add more resources to your Kubernetes cluster.
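When many pods are running, it can be easier to filter the kubectl get pods output down to the pods that are still waiting to be scheduled. The following is a minimal sketch, assuming the default column layout of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE); pending_pods is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the names of pods whose STATUS column (third field) is Pending.
pending_pods() {
  awk '$3 == "Pending" { print $1 }'
}

# Example:
#   kubectl get pods --no-headers | pending_pods
```

Running kubectl describe pod <pod_name> on a pod that the helper reports shows the scheduling events, which typically state whether memory or CPU is the limiting resource.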
Using Support Keys
Support keys should only be used when instructed by Dremio Support, as they can alter the application's behavior and lead to unexpected failures if misused.
Backing Up the KV Store
Dremio stores important metadata in a metastore, referred to as the KV store, which is local to the master coordinator node. Regular backups of the KV store are highly recommended. In Dremio 25.1.0 and later, these backups can be automated or scheduled as a cron job. You can test the backup and restore process by performing a full cluster restore every 6 to 12 months.
Dremio Admin Commands
You can run the Dremio administration commands listed in the table below on the Dremio Kubernetes cluster. Dremio must be shut down and offline to run all commands except the Dremio backup command.
Command | Offline/Online | Notes |
---|---|---|
backup | online | /opt/dremio/bin/dremio-admin backup See Backup Dremio for more information. |
clean | offline | /opt/dremio/bin/dremio-admin clean See Metadata Cleanup for more information. |
restore | offline | /opt/dremio/bin/dremio-admin restore See Restore Dremio for more information. |
set-password | offline | /opt/dremio/bin/dremio-admin set-password See Reset Password for more information. |
Backup
Run the backup command on the master-coordinator pod from a bash shell. Dremio must be online to run the backup command.
To run the backup command:
- Connect to the master-coordinator pod using the exec command.
  Connect to master coordinator pod:
      kubectl exec -it dremio-master-0 -- bash
- Run the command from the bash shell. See Backup Dremio for more information.
  Run bash shell command:
      /opt/dremio/bin/dremio-admin backup \
      -u <DREMIO_ADMIN_USER> \
      -p <DREMIO_ADMIN_PASS> \
      -d <BACKUP_PATH>
- Store the backup files in a persistent volume or copy the files from the local pod.
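For the last step, if the backup was written to the pod's local file system, kubectl cp is one way to copy it to the client machine. The sketch below prints the copy command rather than running it. copy_backup_cmd and the example paths are hypothetical, and kubectl cp relies on tar being available in the container image, which is an assumption to verify for your Dremio image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a kubectl cp command that copies
# a backup directory from a pod to a directory on the client machine.
copy_backup_cmd() {
  local pod="$1"        # for example, dremio-master-0
  local remote_dir="$2" # the <BACKUP_PATH> used in the backup command
  local local_dir="$3"  # destination on the client machine
  printf 'kubectl cp %s:%s %s\n' "$pod" "$remote_dir" "$local_dir"
}

# Example (paths are placeholders):
#   copy_backup_cmd dremio-master-0 /tmp/dremio-backup ./dremio-backup
```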
Clean, Restore, and Set-Password
To run the clean, restore, and set-password commands, Dremio must be offline.
To temporarily shut down Dremio, delete the Dremio helm release or enable the DremioAdmin pod.
To run these offline commands, create a Dremio Admin pod with the Dremio image and mount the master-coordinator pod's persistent volume:
- Run the following command to create a Dremio Admin pod to run the dremio-admin commands. Replace <chart_release_name> with your chart release name:
  Create Dremio Admin pod:
      helm upgrade --wait <chart_release_name> . --set DremioAdmin=true
- Run the dremio-admin commands from the bash shell on the Dremio Admin pod. See Advanced Administration for more information about each command. The following commands connect you to the pod and allow you to perform the offline command:
  Connect to pod and run offline command:
      kubectl exec -it dremio-admin -- bash
      bin/dremio-admin <offline command>
- Run helm upgrade again to disable the DremioAdmin pod. Replace <chart_release_name> with your chart release name.
  Delete pod:
      helm upgrade --wait <chart_release_name> . --set DremioAdmin=false
- Restart your Dremio cluster.
Upgrading Dremio
To upgrade Dremio, update the image value in the values.yaml file to the new Dremio version and run the helm upgrade command.
During the upgrade process, existing pods are terminated and new pods are created with the new image. After all of the newly created pods are restarted and running, your Dremio cluster is upgraded.
To upgrade Dremio:
- Ensure that your Dremio+Kubernetes cluster is backed up. See Backup for more information.
- Ensure that no queries are running on the cluster.
- Update the Dremio image tag in your values.yaml file. For example, to change the Dremio CE image:
  Change Dremio CE image:
      image: dremio/dremio-oss
      imageTag: 11.0.0
      ...
  Note: If you are changing the Dremio Enterprise Edition image, you do not need to change the imagePullSecrets property.
- Run the helm list command to retrieve the chart release name. In the example below, the chart release name is plundering-alpaca.
  Get chart release name:
      helm list
      NAME REVISION UPDATED STATUS CHART NAMESPACE
      plundering-alpaca 1 Wed Jul 18 09:36:14 2018 DEPLOYED dremio-0.0.5 default
- Run helm upgrade --wait <chart_release_name> . to upgrade the deployment. Replace <chart_release_name> with your chart release name.
  Note: The pods are restarted automatically after upgrading. If it takes longer than a couple of minutes to restart, check the status of the pods with the kubectl get pods command. If the pods are pending scheduling due to limited memory or CPU, adjust the values you specified for the properties in the values.yaml file (see Changing your Configuration) or add more resources to your Kubernetes cluster.
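Before running helm upgrade, it can help to confirm which image tag values.yaml will deploy. The following is a minimal sketch, assuming imageTag is a top-level key written on a single line, as in the image example in the steps above; image_tag is a hypothetical helper name. If the value is quoted in your file, the quotes are printed as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first top-level "imageTag:"
# line in the given values file.
image_tag() {
  sed -n 's/^imageTag:[[:space:]]*//p' "$1" | head -n 1
}

# Example:
#   image_tag values.yaml
```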