Version: current [25.x]

Pillar 3 - Cost Optimization

While getting the best possible performance from Dremio is important, it is equally important to control the cost of operating the Dremio platform.

Principles

Minimize Running Executor Nodes

While Dremio can scale to many hundreds of nodes, any given cluster should only have as many nodes as it needs to satisfy the current load and meet Service Level Objectives.

Dynamically Scale Executor Nodes Up and Down

When configuring Dremio engines, administrators can use horizontal pod autoscaling (HPA) to expand and contract executor capacity automatically based on load.
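
To make the scaling behavior concrete, the following is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up. The function name, the default bounds, and the example metric values are assumptions chosen for illustration. They are not Dremio settings.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the Kubernetes HPA replica calculation.

    All default values here are illustrative assumptions.
    """
    ratio = current_metric / target_metric
    # HPA skips scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Four executors at 90% average CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

The lower and upper bounds matter for cost: the minimum sets what you pay for when the cluster is idle, and the maximum caps what a burst of load can cost.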

Eliminate Unnecessary Data Processing

Building reflections and metadata unnecessarily can detract from the overall performance of your system, and it adds load and, therefore, cost to operating it.

Best Practices

Size Engines to Minimum Nodes Required

To avoid unnecessary cost, consider setting up a script, external to Dremio, that reduces the number of active nodes in your engines to the bare minimum (one, or possibly zero) during periods when you know the cluster will see little or no use, such as weeknights and weekends. An equivalent script can scale the executors in your engines back to operational capacity shortly before the cluster returns to normal daily use.
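
One possible shape for such a script is sketched below. It assumes a Kubernetes deployment in which the executors of an engine run as a StatefulSet that can be resized with `kubectl scale`. The StatefulSet name `dremio-executor`, the namespace `dremio`, the business hours, and the executor counts are all hypothetical placeholders to replace with your own values. The sketch only prints the command it would run, so it can be checked before being wired into a scheduler such as cron.

```python
from datetime import datetime
from typing import List

# Hypothetical values: adjust to your own deployment and usage pattern.
OPERATIONAL_EXECUTORS = 4
MINIMUM_EXECUTORS = 0
BUSINESS_START_HOUR = 7   # scale up shortly before users arrive
BUSINESS_END_HOUR = 19    # scale down after the working day


def target_executor_count(now: datetime) -> int:
    """Return how many executors should be running at the given time."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    in_hours = BUSINESS_START_HOUR <= now.hour < BUSINESS_END_HOUR
    if is_weekday and in_hours:
        return OPERATIONAL_EXECUTORS
    return MINIMUM_EXECUTORS


def scale_command(statefulset: str, namespace: str, replicas: int) -> List[str]:
    """Build the kubectl command that resizes the executor StatefulSet."""
    return ["kubectl", "scale", "statefulset", statefulset,
            "--namespace", namespace, f"--replicas={replicas}"]


if __name__ == "__main__":
    replicas = target_executor_count(datetime.now())
    # Assumed names; printed rather than executed so the result can be reviewed.
    print(" ".join(scale_command("dremio-executor", "dremio", replicas)))
```

Keeping the schedule logic separate from the command makes it easy to test the schedule without touching a cluster. Passing the resulting list to `subprocess.run` would apply the change.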

Remove Unused Reflections

Analyzing Dremio’s query history, joined with data from system tables such as sys.reflections, sys.project.reflections, and sys.materializations, can show how often each reflection in Dremio is being used. For reflections that are not being used, further analysis can determine whether they are still being refreshed, how many times they have been refreshed in the reporting period, and how many hours of cluster execution time those refreshes have consumed.

Identifying and removing unused reflections is good practice because it can reduce clutter in the reflection configuration. More importantly, it can free up hours of cluster execution cycles that can be used for more critical workloads.
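
The analysis described above can be sketched as a small ranking step. The example assumes you have already exported per-reflection figures for the reporting period from the query history and system tables. The `ReflectionStats` record and its field names are hypothetical and do not correspond to actual column names in Dremio's system tables.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ReflectionStats:
    """Hypothetical per-reflection summary for one reporting period."""
    reflection_id: str
    times_used: int        # queries that used the reflection
    refresh_count: int     # refresh jobs run for the reflection
    refresh_hours: float   # execution time consumed by those refreshes


def unused_reflections(stats: Iterable[ReflectionStats]) -> List[ReflectionStats]:
    """Return reflections that were never used, costliest refreshes first."""
    unused = [s for s in stats if s.times_used == 0]
    return sorted(unused, key=lambda s: s.refresh_hours, reverse=True)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        ReflectionStats("r-sales-agg", times_used=120, refresh_count=30, refresh_hours=6.0),
        ReflectionStats("r-old-raw", times_used=0, refresh_count=30, refresh_hours=12.5),
        ReflectionStats("r-test", times_used=0, refresh_count=4, refresh_hours=0.5),
    ]
    for s in unused_reflections(sample):
        print(s.reflection_id, s.refresh_count, s.refresh_hours)
```

Sorting by refresh hours puts the reflections whose removal would free the most execution time at the top of the list.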

Optimize Metadata Refresh Frequency

Tip: See Optimize Metadata Refresh Frequency to understand metadata in Dremio, why it is important, and best practices for setting and adjusting the frequency of metadata refresh for datasets.