Administering Dremio on AKS
This topic discusses administration activities such as monitoring logs; scaling pods; changing configurations; performing basic administrative tasks such as backing up, restoring, and cleaning; and upgrading Dremio.
The example commands below assume that the current working directory is the latest Dremio Helm chart directory, dremio-cloud-tools/charts/dremio_v2, on the client machine that interacts with Kubernetes.
You must maintain any changes you make to the Helm values or configuration files in the dremio-cloud-tools/charts/dremio_v2 directory in your local copy of dremio-cloud-tools.
Monitoring Logs and Usage
Monitoring the cluster's resource usage (such as heap and direct memory, CPU, and disk I/O) is crucial to maintaining long-term stability as the system scales. For this reason, it is highly recommended to set up a monitoring stack, such as Prometheus and Grafana. For a detailed setup tutorial and an overview of which metrics to track, see Dremio Monitoring in Kubernetes. For more information, see this PDF guide on the Dremio Enterprise Edition (Software) Shared Responsibility Model.
For information about monitoring logs, see Logs. You can retrieve logs from the Dremio console or directly from Kubernetes. You can also write logs to a file on disk in addition to stdout; see Writing Logs to a File for details.
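As a minimal sketch of retrieving logs directly from Kubernetes, the helper below wraps the kubectl logs command. The function name dremio_logs and the default of 100 lines are assumptions made for illustration, and so is any pod name you pass to it; run kubectl get pods to find the pod names in your cluster.

```shell
# Hypothetical helper (not part of the Dremio Helm chart): print the most
# recent log lines written to stdout by a single Dremio pod.
# Usage: dremio_logs <pod_name> [line_count]
dremio_logs() {
  pod="$1"
  lines="${2:-100}"   # assumed default: last 100 lines
  if [ -z "$pod" ]; then
    echo "usage: dremio_logs <pod_name> [line_count]" >&2
    return 1
  fi
  kubectl logs "$pod" --tail="$lines"
}
```

For example, dremio_logs <pod_name> 50 prints the last 50 log lines of the named pod.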
Managing Workloads
Limit engine sizes to a maximum of 10 executor pods (with 32 CPUs and 128 GB of memory) to prevent over-parallelization of queries. Workloads should be split into high-cost and low-cost queries, and dedicated queues should be configured for reflections, metadata refresh, and table optimization jobs. For more information, see Dremio's Well-Architected Framework.
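The engine size limit above could be enforced in a deployment script with a small guard such as the following sketch. The function name engine_size_ok and the variable MAX_EXECUTORS_PER_ENGINE are hypothetical; only the limit of 10 executor pods comes from the recommendation above.

```shell
# Hypothetical guard: succeed only if the requested executor count is within
# the recommended maximum of 10 executor pods per engine.
MAX_EXECUTORS_PER_ENGINE=10

engine_size_ok() {
  count="$1"
  # Reject empty or non-numeric input before the numeric comparison.
  case "$count" in
    ''|*[!0-9]*)
      echo "executor count must be a non-negative integer" >&2
      return 2
      ;;
  esac
  if [ "$count" -gt "$MAX_EXECUTORS_PER_ENGINE" ]; then
    echo "engine size $count exceeds the recommended maximum of $MAX_EXECUTORS_PER_ENGINE" >&2
    return 1
  fi
  return 0
}
```

A script can call engine_size_ok with the planned executor count and stop before changing the cluster if the check fails.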
Scaling Up or Down
Scaling up or down refers to increasing or decreasing the number of Dremio pods (executors or scale-out coordinators). All scaling values remain in effect until you run another helm upgrade command.
Scaling the master-coordinator up or down is not supported. Do not change the master-coordinator pod count from the default value, 1, as there must be exactly one master-coordinator in the Dremio cluster to maintain stability and ensure connectivity. Scaling to 0 effectively terminates the Dremio cluster.
Scaling down the number of executor pods, whether temporarily or permanently, may cancel queries if you are not using Dremio Enterprise Edition 24.3 or above with autoscaling.
- Run the helm list command to retrieve the chart release name. In the example below, the chart release name is plundering-alpaca.

  Get chart release name
    helm list
    NAME               REVISION  UPDATED                   STATUS    CHART         NAMESPACE
    plundering-alpaca  1         Wed Jul 18 09:36:14 2018  DEPLOYED  dremio-0.0.5  default

- Run the helm upgrade --wait <chart_release_name> . --set <dremio pod=value> command. Replace <chart_release_name> with your chart release name. For example, Dremio executor pods could be scaled up or down with the following command, which changes the pod count to 5:

  Helm upgrade command example
    helm upgrade --wait <chart_release_name> . --set executor.count=5
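The scaling step above can be wrapped in a small helper, sketched here under a few assumptions. The function name scale_executors and its input checks are hypothetical and are not part of the Helm chart; the helm upgrade invocation itself is the one shown above.

```shell
# Hypothetical helper: scale the Dremio executor pods of a chart release.
# Run it from the dremio-cloud-tools/charts/dremio_v2 directory, because the
# chart path passed to helm is "." (the current directory).
# Usage: scale_executors <chart_release_name> <executor_count>
scale_executors() {
  release="$1"
  count="$2"
  if [ -z "$release" ]; then
    echo "usage: scale_executors <chart_release_name> <executor_count>" >&2
    return 1
  fi
  # Reject empty or non-numeric pod counts before calling helm.
  case "$count" in
    ''|*[!0-9]*)
      echo "executor count must be a non-negative integer" >&2
      return 1
      ;;
  esac
  helm upgrade --wait "$release" . --set "executor.count=$count"
}
```

With the release name from the helm list example, scale_executors plundering-alpaca 5 runs the same upgrade command as above with the pod count set to 5.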