Optimizing Data Reflections
You can create reflections that are partitioned, sorted, or both. Base your selection of partition and sort columns on the patterns of queries that are run on the underlying tables and views. You can find these queries on the Jobs page.
You can also base your selection of partition and sort columns on the properties of the tables and views that your reflections are based on.
No matter how you select partition and sort columns, you might need to test iteratively to get the best results.
You can specify partition columns and sort columns in the Advanced view of the reflections editor or in SQL commands for creating reflections.
Partitioning and sorting are supported for both raw and aggregation reflections.
When you create reflections, ignore the Distribution option in the reflections editor and the DISTRIBUTE BY option in the SQL commands. These options have no effect.
Data Reflections can be partitioned on one or more columns. When partition columns are specified, Dremio creates multiple files based on the partitioning configuration.
Low-cardinality fields are ideal for partitioning (for example, a day-month-year date field). Ideally, the overall cardinality should be less than 10,000 values; a smaller number of partitions is preferred.
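One way to check a candidate partition column against the 10,000-value guideline is to count its distinct values on a sample of the data. The following is a minimal Python sketch, not a Dremio API: the function name, the sample rows, and the column names (order_date, region, order_id) are all hypothetical, and the sketch assumes that "overall cardinality" means the number of distinct value combinations across all chosen partition columns.

```python
# Hypothetical helper: count distinct value combinations for candidate
# partition columns. This is an illustration, not part of Dremio.
MAX_PARTITIONS = 10_000  # guideline from the text above


def partition_cardinality(rows, columns):
    """Return the number of distinct value combinations across `columns`."""
    return len({tuple(row[col] for col in columns) for row in rows})


def is_reasonable_partitioning(rows, columns, limit=MAX_PARTITIONS):
    """True if the combined cardinality stays under the guideline."""
    return partition_cardinality(rows, columns) < limit


# Made-up sample rows standing in for a table or view.
sample_rows = [
    {"order_date": "2024-01-01", "region": "EU", "order_id": 1},
    {"order_date": "2024-01-01", "region": "US", "order_id": 2},
    {"order_date": "2024-01-02", "region": "EU", "order_id": 3},
    {"order_date": "2024-01-02", "region": "EU", "order_id": 4},
]

date_only = partition_cardinality(sample_rows, ["order_date"])
date_and_region = partition_cardinality(sample_rows, ["order_date", "region"])
by_order_id = partition_cardinality(sample_rows, ["order_id"])
```

In this sample, a unique identifier such as order_id produces one partition per row, which is the pattern to avoid on a large table.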
Dremio optimizes performance by pruning partitions when a query has a filter on a partitioned column.
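The effect of partition pruning can be illustrated with a small Python simulation. This is a conceptual sketch only and does not reflect how Dremio is implemented internally; the function names, the sample rows, and the column name region are hypothetical.

```python
from collections import defaultdict


def write_partitions(rows, column):
    """Group rows by the partition column, one group per distinct value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


def scan_with_pruning(partitions, wanted_value):
    """Read only the partition whose key matches an equality filter.

    Returns the matching rows and the number of rows actually read.
    """
    rows_read = 0
    matches = []
    for key, part_rows in partitions.items():
        if key != wanted_value:
            continue  # pruned: this partition is never read
        rows_read += len(part_rows)
        matches.extend(part_rows)
    return matches, rows_read


# Made-up sample data partitioned on a hypothetical "region" column.
rows = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 20},
    {"region": "EU", "amount": 30},
    {"region": "APAC", "amount": 40},
]
partitions = write_partitions(rows, "region")
eu_rows, rows_read = scan_with_pruning(partitions, "EU")
```

Here a filter on region reads two of the four rows; the other partitions are skipped without being opened.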
Data Reflections can be locally sorted on one or more columns. Sorting ensures that the records are sorted within each node and partition (if any).
Sorting is especially useful for queries with range predicates and filters. When a reflection is sorted, Dremio can skip over large blocks of records during query execution based on filters on the sorted columns.
Dremio recommends sorting on a single field only.
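To see why sorting helps range filters, consider the following Python sketch. It assumes that each block of records carries minimum and maximum values for the sorted column, which is a common technique in columnar file formats; it is a conceptual illustration and not a description of Dremio's actual implementation. All names and the block size are hypothetical.

```python
def build_sorted_blocks(values, block_size):
    """Sort values and split them into blocks with min/max statistics."""
    ordered = sorted(values)
    blocks = []
    for start in range(0, len(ordered), block_size):
        chunk = ordered[start:start + block_size]
        blocks.append({"min": chunk[0], "max": chunk[-1], "rows": chunk})
    return blocks


def range_scan(blocks, low, high):
    """Return values in [low, high] and the number of blocks actually read."""
    matches = []
    blocks_read = 0
    for block in blocks:
        # Skip a block when its min/max range cannot overlap the filter.
        if block["max"] < low or block["min"] > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block["rows"] if low <= v <= high)
    return matches, blocks_read


# 100 made-up values in 10 blocks of 10 records each.
blocks = build_sorted_blocks(range(100), block_size=10)
matches, blocks_read = range_scan(blocks, low=25, high=34)
```

Because the data is sorted, each block covers a narrow, non-overlapping range, so this filter reads 2 of the 10 blocks. On unsorted data the min/max ranges of most blocks would overlap the filter and few blocks could be skipped.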