Optimizing Data Reflections
Based on query patterns and properties of the underlying dataset, Data Reflections can be further optimized by specifying partitioning, sorting, and distribution. These settings can be configured when defining accelerations in the UI or by using SQL commands.
These optimizations can be used with both Raw and Aggregation Reflections.
Data Reflections can be partitioned on one or more columns. When partitioning is specified, Dremio creates multiple files based on the partitioning configuration.
Low-cardinality fields are ideal for partitioning (e.g. Day-Month-Year). Ideally, the overall cardinality should be less than 10,000 values; a smaller number of partitions is preferred.
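One way to check a candidate set of partition columns against this guideline is to count the distinct combinations of their values on a sample of the data. The following is a minimal Python sketch of that check, not a Dremio feature; the function name, column names, and sample rows are hypothetical.

```python
from datetime import date

# Guideline from the text above: keep overall cardinality under 10,000.
PARTITION_LIMIT = 10_000


def partition_count(rows, partition_columns):
    """Count distinct combinations of values across the candidate columns."""
    return len({tuple(row[c] for c in partition_columns) for row in rows})


# Hypothetical sample rows; in practice this would be a sample of the dataset.
rows = [
    {"order_id": 1, "order_date": date(2023, 1, 1), "region": "EU"},
    {"order_id": 2, "order_date": date(2023, 1, 1), "region": "US"},
    {"order_id": 3, "order_date": date(2023, 1, 2), "region": "EU"},
    {"order_id": 4, "order_date": date(2023, 1, 2), "region": "EU"},
]

by_date_region = partition_count(rows, ["order_date", "region"])
by_order_id = partition_count(rows, ["order_id"])

print(by_date_region)  # 3
print(by_order_id)     # 4
```

A unique identifier such as order_id yields one partition per row, so its count grows with the dataset, while the date and region combination stays small. Counts taken from a sample are a lower bound on the true cardinality.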
Dremio optimizes performance by pruning partitions when a query has a filter on a partitioned column.
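The idea behind partition pruning can be illustrated with a toy model in Python. This is a conceptual sketch only and does not reflect Dremio's internal implementation; the helper functions and the sample data are hypothetical.

```python
from collections import defaultdict


def write_partitions(rows, column):
    """Group rows into one bucket per distinct value of the partition column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


def query_equals(partitions, value):
    """Return rows whose partition column equals `value`.

    Also returns how many partitions were read. Because the filter is on
    the partition column, every other partition is skipped without a scan.
    """
    if value not in partitions:
        return [], 0
    return partitions[value], 1


# Hypothetical sample data partitioned on the "region" column.
rows = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 20},
    {"region": "EU", "amount": 30},
    {"region": "APAC", "amount": 40},
]

partitions = write_partitions(rows, "region")
matches, partitions_read = query_equals(partitions, "EU")

print(len(partitions))                # 3
print(partitions_read)                # 1
print([r["amount"] for r in matches])  # [10, 30]
```

Only one of the three partitions is read for the filter on region. A filter on amount, which is not a partition column, would get no such benefit in this model and would have to read every partition.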
Data Reflections can be locally sorted on one or more columns. Sorting ensures that the records are sorted within each node and partition (if any).
Sorting is especially useful for range queries and filters. When sorting is enabled, Dremio skips over large blocks of records during query execution, based on filters on the sorted columns.
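Why sorting makes block skipping effective can be shown with a small Python model. The sketch below assumes that each block of records carries minimum and maximum values for the sorted column and that a scan consults them before reading the block; this is an illustrative assumption, not a description of Dremio's actual file layout or execution engine, and all names are hypothetical.

```python
def build_blocks(values, block_size):
    """Sort the values and split them into blocks with min/max statistics."""
    ordered = sorted(values)
    blocks = []
    for start in range(0, len(ordered), block_size):
        chunk = ordered[start:start + block_size]
        # Because the data is sorted, the first and last items are min and max.
        blocks.append({"min": chunk[0], "max": chunk[-1], "rows": chunk})
    return blocks


def range_scan(blocks, low, high):
    """Return values in [low, high] and the number of blocks actually read."""
    result, blocks_read = [], 0
    for block in blocks:
        # Skip any block whose value range cannot overlap the filter range.
        if block["max"] < low or block["min"] > high:
            continue
        blocks_read += 1
        result.extend(v for v in block["rows"] if low <= v <= high)
    return result, blocks_read


# Hypothetical column values, stored in three blocks of four records each.
values = [42, 7, 19, 88, 3, 56, 71, 25, 64, 10, 95, 33]
blocks = build_blocks(values, block_size=4)
hits, blocks_read = range_scan(blocks, 20, 40)

print(len(blocks))   # 3
print(blocks_read)   # 1
print(hits)          # [25, 33]
```

With sorted data each block covers a narrow, non-overlapping value range, so the range filter touches one block out of three. If the same values were stored unsorted, most blocks would span nearly the full value range and few could be skipped.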
Dremio recommends sorting on a single field only.