On this page

    Creating Aggregation Reflections

    Aggregation reflections are summarized representations of data. Most BI tools generate aggregation and GROUP BY queries. Aggregation reflections optimize these kinds of query patterns.

    When you create aggregation reflections, keep in mind these best practices:

    • Use dimensions that have relatively low cardinality in a dataset. The higher the cardinality of a dimension, the less benefit an aggregation reflection has on query performance. Lower cardinality aggregation reflections require less time to scan.

    • For a single dataset, create one aggregation reflection for each important subset of dimensions in your queries, rather than one aggregation reflection that includes all dimensions. Multiple small aggregation reflections (versus one large one) are good for isolated pockets of query patterns on the same dataset that do not overlap. If your query patterns overlap, use fewer larger aggregation reflections.

      There are two cautions that accompany this advice, however:

      • Be careful of creating aggregation reflections that have too few dimensions for your queries.

        If a query uses more dimensions than are included in an aggregation reflection, the reflection cannot satisfy the query and the query optimizer does not run the query against it.

      • Be careful of creating more aggregation reflections than are necessary to satisfy queries against a dataset.

        The more reflections you create, the more time the query optimizer requires to plan the execution of queries. Therefore, creating more aggregation reflections than you need can slow down query performance, even if your aggregation reflections are low-cardinality.

    • If you want to include a computed measure, first create a virtual dataset with the computed column to use as a measure, and then create the aggregation reflection on the virtual dataset.

    Dremio recommends that you also follow the best practices listed in “Best Practices for Creating Raw and Aggregation Reflections” when you create customized aggregation reflections.

    Creating Default Aggregation Reflections

    You can use the Basic view of the reflections editor to create one aggregation reflection that includes fields, from the anchor or supported anchor dataset, that are recommended for use as dimensions or measures. You can add or remove dimensions and measures, too.

    Restrictions

    • You can create only one aggregation reflection in the Basic view. If you want to create multiple aggregations reflections at a time, use the Advanced view.
    • You cannot select fields for sorting or horizontally partitioning.
    • The name of the reflection is restricted to “Aggregation Reflection”.

    Procedure

    To create an aggregation reflection in the Basic view of the reflections editor:

    1. Open the reflections editor.
      See “Locations of the Reflections Editor” to find out where you can open the editor from.

      In the Aggregations Reflections section of the editor, fields to use as dimensions and measures are recommended for you.

    2. In the Aggregation Reflection section of the editor, modify or accept the recommendation for dimension fields and measure fields.

    3. To make the reflection available to the query optimizer after you create it, click the toggle switch on the left side of the Aggregation Reflections bar.

    4. Click Save.

    Dremio creates and enables the aggregation reflection. For details, see “Results”.

    For tips on what to do now after your aggregation reflection is created and enabled, see “What to Do Next”.

    Creating Customized Aggregation Reflections

    You can use the Advanced view of the reflections editor to create one or more aggregation reflections that select which fields in the anchor or supporting anchor dataset to use as dimensions and measures. For each field that you use as a measure, you can use one or more of these SQL functions: APPROX_DISTINCT_COUNT, COUNT, MAX, and MIN. You can also choose sort fields and fields for partitioning horizontally.

    Procedure

    To create an aggregation reflection in the Advanced view of the reflections editor:

    1. Open the reflections editor.
      See “Locations of the Reflections Editor” to find out where you can open the editor from.
    2. Click the Advanced View button in the top-right corner of the editor.
    3. Click Aggregation Reflections. The Aggregation Reflections section is displayed, and one table for refining the aggregation reflection that appeared in the Basic view is ready.
    4. (Optional) Click in the name to rename the reflection.
      The purpose of the name is to help you understand, when you read job reports, which reflections the query optimizer considered and chose when planning queries.
    5. In the columns of the table, follow these steps, which you don’t have to do in any particular order:

      Note:
      Ignore the Distribution column. Selecting fields in it has no effect on the reflection.

    • Click in the Dimensions column to include or exclude fields to use as dimensions.

    • Click in the Measures field to include or exclude fields to use as measures. You can use one or more of these SQL functions for each measure: APPROX_DISTINCT_COUNT, COUNT, MAX, and MIN.

      The full list of SQL aggregation functions that Dremio supports is not supported in the reflections editor. If you want to create a reflection that aggregates data by using the sQL function AVG, CORR, HLL, SUM, VAR_POP, or VAR_SAMP, you must create a virtual dataset that uses the function, and then create a raw reflection from that virtual dataset.

    • Click in the Sort column to select fields on which to sort the data in the reflection. For guidance in selecting a field on which to sort, see the section “Sort Reflections on High-Cardinality Fields” in “Best Practices for Creating Raw and Aggregation Reflections”.

    • Click in the Partition column to select fields on which to horizontally partition the rows in the reflection. For guidance in selecting fields on which to partition, see the section “Horizontally Partition Reflections that Have Many Rows”.

    1. (Optional) Optimize the number of files used to store the reflection. You can optimize for fast refreshes or for fast read performance by queries. Follow these steps:

      a. Click the gear icon in the table in which you are defining the reflection.

      b. In the field Reflection execution strategy, select either of these options:

      • Options
        • Select Minimize Time Needed To Refresh if you need the reflection to be created as fast as possible. This option can result in the data for the reflection being stored in many small files. This is the default option.
        • Select Minimize Number Of Files when you want to improve read performance of queries against the reflection. With this option, there tend to be fewer seeks performed for a given query.
    2. (Optional) Have Dremio convert data from your reflection’s Parquet files to the Apache Arrow format when copying that data to executor nodes.

      Normally, Dremio copies data as-is from the Parquet files as-is to caches on executor nodes, which are nodes that carry out the query plans devised by the query optimizer.

      Enabling this option can improve query performance even more. However, data in the Apache Arrow format requires more space on the executor nodes than data in the default format.

      You can use this option if your distributed data storage supports Dremio’s Columnar Cloud Cache:

      • Amazon Simple Cloud Storage (S3)
      • S3-compatible object storage
      • HDFS
      • Microsoft Azure Data Lake Storage
      • Microsoft Azure Storage

      Follow these steps:

      a. Click the gear icon in the table in which you are defining the reflection.

      b. Click the Arrow caching toggle switch to turn the feature on.

    3. Click Save when you are finished.

    Dremio creates and enables the aggregation reflection. For details, see “Results”.

    For tips on what to do now after your aggregation reflection is created and enabled, see “What to Do Next”.

    Results

    If you used the reflections editor in the Dataset Settings window or the Acceleration window, the window is closed.

    By default, Dremio runs two jobs to create the aggregation reflection:

    • The first returns the result set for creating the reflection, running a REFRESH REFLECTION statement.
    • The second creates the metadata that the query optimizer can use to find out the definition and structure of the reflection, running a LOAD MATERIALIZATION METADATA statement.

    If the support key dremio.iceberg.enabled is turned on, then Dremio runs only the first job. When Dremio creates a reflection as an Apache Iceberg table, the metadata for the reflection is generated at the same time.

    This screenshot shows two jobs that Dremio ran to create a reflection named “Super-duper reflection”:

    The first pin shows the two jobs, and the second pin show the name of the reflection.

    What to Do Next

    After you create an aggregation reflection that is enabled, test whether the query optimizer is making queries use it. See “Testing Reflections” for the steps.

    When you are sure that the reflection is being used, set the refresh type for all reflections on the underlying physical dataset and set the schedule according to which they are refreshed. See “Refreshing Reflections”.