Dremio implements hash aggregation spilling. When memory limits are reached, Dremio spills data to disk as needed so that the query can complete within the memory envelope under which Dremio operates.
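The general idea of spilling during a hash aggregation can be illustrated with a toy sketch. This is not Dremio's implementation: the function name `spilling_sum`, the group-count budget `max_groups_in_memory`, the partition count, and the use of pickled temporary files are all assumptions made for illustration. It computes the equivalent of a GROUP BY with SUM, writes the largest in-memory partition to disk whenever the budget is exceeded, and merges the spilled chunks at the end.

```python
import pickle
import tempfile
from collections import defaultdict


def spilling_sum(rows, max_groups_in_memory, num_partitions=4):
    """Toy GROUP BY key, SUM(value) that spills partitions to disk.

    Illustrative only: the budget is counted in groups, not bytes, and the
    final merge assumes a single partition fits in memory.
    Returns (totals_by_key, number_of_spills).
    """
    partitions = [defaultdict(int) for _ in range(num_partitions)]
    spill_files = [None] * num_partitions
    spill_count = 0

    def spill(p):
        # Append the partition's partial sums to its spill file, then free it.
        nonlocal spill_count
        if spill_files[p] is None:
            spill_files[p] = tempfile.TemporaryFile()
        pickle.dump(dict(partitions[p]), spill_files[p])
        partitions[p].clear()
        spill_count += 1

    for key, value in rows:
        p = hash(key) % num_partitions
        partitions[p][key] += value
        # Over budget: spill the partition currently holding the most groups.
        if sum(len(part) for part in partitions) > max_groups_in_memory:
            victim = max(range(num_partitions), key=lambda i: len(partitions[i]))
            spill(victim)

    # Merge each partition's in-memory state with its spilled chunks.
    result = {}
    for p in range(num_partitions):
        merged = defaultdict(int, partitions[p])
        spill_file = spill_files[p]
        if spill_file is not None:
            spill_file.seek(0)
            while True:
                try:
                    chunk = pickle.load(spill_file)
                except EOFError:
                    break
                for k, v in chunk.items():
                    merged[k] += v
            spill_file.close()
        # Keys are disjoint across partitions, so a plain update is safe.
        result.update(merged)
    return result, spill_count
```

With a small budget the function spills repeatedly but returns the same totals as it would with a large budget, which is the property that spilling is meant to preserve.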
This feature is useful for the following:
A bare minimum amount of memory is needed by the hash aggregation operator to even start processing the first batch of data. As long as the memory given to hash aggregation is equal to or more than the minimum, Dremio can complete the query.
If the minimum memory is unavailable when the query is submitted, the job fails immediately, before it starts running, with an “Error: Failed to preallocate memory for single batch in partitions” message. This error message is shown in the exception stack details.
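The fail-fast behavior described above can be sketched as an up-front check. The sizing rule used here (one batch per partition), the default partition count and batch size, and the names `PreallocationError`, `minimum_reservation`, and `start_hash_aggregation` are hypothetical; how Dremio actually computes the minimum is not described in this section. Only the error text is taken from the message quoted above.

```python
class PreallocationError(MemoryError):
    """Raised when the operator cannot reserve its minimum memory (hypothetical)."""


def minimum_reservation(num_partitions, batch_bytes):
    # Assumed rule for illustration: one batch must fit in every partition.
    return num_partitions * batch_bytes


def start_hash_aggregation(memory_limit_bytes, num_partitions=8, batch_bytes=1 << 20):
    """Check the minimum before processing any data.

    Returns the memory left over after preallocation, or raises
    PreallocationError if the limit is below the assumed minimum.
    """
    needed = minimum_reservation(num_partitions, batch_bytes)
    if memory_limit_bytes < needed:
        # Fail before the first batch is processed.
        raise PreallocationError(
            "Failed to preallocate memory for single batch in partitions"
        )
    return memory_limit_bytes - needed
```

Under these assumed defaults, a limit of exactly 8 MiB succeeds with nothing left over, and any smaller limit fails before the first batch is processed.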
Based on the amount of system memory that you have provisioned, Dremio automatically determines the memory envelope for memory-intensive operators that can spill.
As of Dremio 3.1.3, hash aggregation spilling is enabled by default. In previous Dremio releases, hash aggregation spilling was disabled by default.
To fine-tune hash aggregation spilling performance, contact Dremio Support at firstname.lastname@example.org.
To disable, navigate to Admin > Cluster > Support > Support Keys, enter the exec.operator.aggregate.vectorize.use_spilling_operator key, click Show, and turn the setting off.
To re-enable, click Reset for the exec.operator.aggregate.vectorize.use_spilling_operator key.
Hash aggregation queries complete successfully regardless of the amount of data being sent to the operator, as long as:
For hash aggregation spilling information, go to Jobs > Details or Jobs > Profile > HASH_AGGREGATE_OPERATOR > Operator Metrics.