    Creating Raw Reflections

    You can use the reflections editor to create two types of raw reflection:

    • A default raw reflection that includes all of the columns of the anchor dataset, but does not sort or horizontally partition on any columns

    • A raw reflection that includes all or a subset of the columns of the anchor dataset, and that does one or both of the following things:

      • Sorts on one or more columns
      • Horizontally partitions the data according to the values in one or more columns
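
    To make these two options concrete, the following Python sketch mimics, on a small in-memory list, what it means to horizontally partition rows on one column and sort them on another. It is a conceptual illustration only: the sample rows, the column names, and the function name partition_and_sort are invented for this example and say nothing about how Dremio physically stores reflection data.

```python
from collections import defaultdict


def partition_and_sort(rows, partition_col, sort_col):
    """Group rows by the value of partition_col, then sort each group by sort_col.

    Hypothetical helper for illustration; not part of any Dremio API.
    """
    partitions = defaultdict(list)
    for row in rows:
        # Rows that share a value in partition_col land in the same partition.
        partitions[row[partition_col]].append(row)
    # Within each partition, order the rows by the sort column.
    return {
        key: sorted(group, key=lambda r: r[sort_col])
        for key, group in partitions.items()
    }


# Made-up sample data standing in for the rows of an anchor dataset.
rows = [
    {"region": "EU", "order_id": 3},
    {"region": "US", "order_id": 2},
    {"region": "EU", "order_id": 1},
]
result = partition_and_sort(rows, "region", "order_id")
```

    In this sketch, a query that filters on region needs to look at only one group, and a filter on order_id can take advantage of the ordering inside that group.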

    Prerequisites

    • If you want to accelerate queries on unoptimized data or data in slow storage, create a virtual dataset that is itself created from a physical dataset in a non-columnar format or on slow-scan storage. You can then create your raw reflection from that virtual dataset.
    • If you want to accelerate “needle-in-a-haystack” queries, create a virtual dataset that includes a predicate to include only the rows that you want to scan. You can then create your raw reflection from that virtual dataset.
    • If you want to accelerate queries that perform expensive transformations, create a virtual dataset that performs those transformations. You can then create your raw reflection from that virtual dataset.
    • If you want to accelerate queries that perform joins, create a virtual dataset that performs the joins. You can then create your raw reflection from that virtual dataset.

    Creating Default Raw Reflections

    In the Basic view of the reflections editor, you can create a raw reflection that includes all of the fields that are in a physical dataset or virtual dataset. While this raw reflection is enabled, Dremio never runs user queries against the underlying dataset.

    Restrictions of the Basic View

    • You cannot select fields to sort or create horizontal partitions on.
    • The name of the reflection that you create is restricted to “Raw Reflection”.
    • You can create only one raw reflection. If you want to create multiple raw reflections at a time, use the Advanced view.

    Procedure

    To create a raw reflection in the Basic view of the reflections editor:

    1. Open the reflections editor.
      See “Locations of the Reflections Editor” to find out where you can open the editor from.

    2. Click the toggle switch on the left side of the Raw Reflections bar.

    3. Click Save.

    Dremio creates and enables the raw reflection. For details, see “Results”.

    For tips on what to do after your raw reflection is created and enabled, see “What to Do Next”.

    Creating Customized Raw Reflections

    In the Advanced view of the reflections editor, you can create one or more raw reflections that include all or a selection of the fields that are in the anchor or supported anchor dataset. You can also choose sort fields and fields for partitioning horizontally.

    Dremio recommends that you follow the best practices listed in “Best Practices for Creating Raw and Aggregation Reflections” when you create customized raw reflections.

    If you make any of the following changes to a raw reflection when you are using the Advanced view, you cannot switch to the Basic view:

    • Deselect one or more fields in the Display column. By default, all of the fields are selected.
    • Select one or more fields in the Sort, Partition, or Distribute column.
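
    The rule above can be expressed as a small check. The following Python sketch is a hypothetical model of the editor's state, not Dremio code: the class RawReflectionDraft, its attributes, and the function can_switch_to_basic are names assumed for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RawReflectionDraft:
    """Hypothetical model of the selections made in the Advanced view."""
    dataset_fields: frozenset          # every field of the anchor dataset
    display: set                       # fields selected in the Display column
    sort: set = field(default_factory=set)
    partition: set = field(default_factory=set)
    distribute: set = field(default_factory=set)


def can_switch_to_basic(draft):
    """Return True only if all fields are displayed and no other column is used."""
    all_displayed = draft.display == set(draft.dataset_fields)
    nothing_selected = not (draft.sort or draft.partition or draft.distribute)
    return all_displayed and nothing_selected


fields = frozenset({"id", "region", "amount"})
default_draft = RawReflectionDraft(fields, display=set(fields))
sorted_draft = RawReflectionDraft(fields, display=set(fields), sort={"id"})
trimmed_draft = RawReflectionDraft(fields, display={"id", "region"})
```

    Under these assumptions, only default_draft could be taken back to the Basic view; sorted_draft selects a sort field and trimmed_draft deselects a display field.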

    Procedure

    To create a raw reflection in the Advanced view of the reflections editor:

    1. Open the reflections editor.
      See “Locations of the Reflections Editor” to find out where you can open the editor from.

    2. If the Advanced view is not already displayed, click the Advanced View button in the top-right corner of the editor.

    3. Click the toggle switch in the table labeled Raw Reflection to enable the raw reflection.
      However, queries do not start using the reflection until you finish editing it and click Save in a later step.

    4. (Optional) Click in the label to rename the reflection.
      The purpose of the name is to help you understand, when you read job reports, which reflections the query optimizer considered and chose when planning queries.

    5. In the columns of the table, follow these steps, which you don’t have to do in any particular order:

      Note:
      Ignore the Distribution column. Selecting fields in it has no effect on the reflection.

      • Click in the Display column to include fields in or exclude them from your reflection.
      • Click in the Sort column to select fields on which to sort the data in the reflection. For guidance in selecting a field on which to sort, see the section “Sort Reflections on High-Cardinality Fields” in “Best Practices for Creating Raw and Aggregation Reflections”.
      • Click in the Partition column to select fields on which to horizontally partition the rows in the reflection. For guidance in selecting fields on which to partition, see the section “Horizontally Partition Reflections that Have Many Rows”.

    6. (Optional) Optimize the number of files used to store the reflection. You can optimize for fast refreshes or for fast read performance by queries. Follow these steps:

      a. Click the gear icon in the table in which you are defining the reflection.

      b. In the field Reflection execution strategy, select either of these options:

        • Select Minimize Time Needed To Refresh if you need the reflection to be created as fast as possible. This option can result in the data for the reflection being stored in many small files. This is the default option.
        • Select Minimize Number Of Files when you want to improve read performance of queries against the reflection. With this option, there tend to be fewer seeks performed for a given query.

    7. (Optional) Have Dremio convert data from your reflection’s Parquet files to the Apache Arrow format when copying that data to executor nodes.

      Normally, Dremio copies data as-is from the Parquet files to caches on executor nodes, which are nodes that carry out the query plans devised by the query optimizer.

      Enabling this option can improve query performance even more. However, data in the Apache Arrow format requires more space on the executor nodes than data in the default format.

      You can use this option if your distributed data storage supports Dremio’s Columnar Cloud Cache:

      • Amazon Simple Storage Service (S3)
      • S3-compatible object storage
      • HDFS
      • Microsoft Azure Data Lake Storage
      • Microsoft Azure Storage

      Follow these steps:

      a. Click the gear icon in the table in which you are defining the reflection.

      b. Click the Arrow caching toggle switch to turn the feature on.

    8. Click Save when you are finished.

    Dremio creates and enables the raw reflection. For details, see “Results”.

    For tips on what to do after your raw reflection is created and enabled, see “What to Do Next”.

    Results

    If you used the reflections editor in the Dataset Settings window or the Acceleration window, the window is closed.

    By default, Dremio runs two jobs to create the raw reflection:

    • The first returns the result set for creating the reflection, running a REFRESH REFLECTION statement.
    • The second creates the metadata that the query optimizer can use to find out the definition and structure of the reflection, running a LOAD MATERIALIZATION METADATA statement.

    If the support key dremio.iceberg.enabled is turned on, then Dremio runs only the first job. When Dremio creates a reflection as an Apache Iceberg table, the metadata for the reflection is generated at the same time.
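
    As a quick summary of the behavior described above, the following Python sketch lists the statements you would expect to see in the jobs list, depending on the support key. The function name reflection_creation_jobs and its parameter are hypothetical; the sketch only restates the rule in this section and does not call Dremio.

```python
def reflection_creation_jobs(iceberg_enabled):
    """Return the statements run to create a raw reflection, in order.

    iceberg_enabled stands for the state of the support key
    dremio.iceberg.enabled (an assumption of this sketch).
    """
    jobs = ["REFRESH REFLECTION"]
    if not iceberg_enabled:
        # Without Iceberg, a second job loads the reflection's metadata.
        jobs.append("LOAD MATERIALIZATION METADATA")
    return jobs


default_jobs = reflection_creation_jobs(iceberg_enabled=False)
iceberg_jobs = reflection_creation_jobs(iceberg_enabled=True)
```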

    This screenshot shows two jobs that Dremio ran to create a reflection named “Super-duper reflection”:

    The first pin shows the two jobs, and the second pin shows the name of the reflection.

    What to Do Next

    After you create a raw reflection that is enabled, test whether the query optimizer is making queries use it. See “Testing Reflections” for the steps.

    When you are sure that the reflection is being used, set the refresh type for all reflections on the underlying physical dataset and set the schedule according to which they are refreshed. See “Refreshing Reflections”.