    Refreshing Reflections

    The data in a reflection can become stale and need to be refreshed. The refresh of a reflection causes two updates:

    • The data stored in the Apache Iceberg table for the reflection is updated.
    • The metadata that stores details about the reflection is updated.

    Both of these updates are implied in the term “reflection refresh”.

    Types of reflection refresh

    There are two ways that the data for reflections can be refreshed:

    • Full refresh

      Dremio drops the table in which the data for a reflection is stored, creates a new table, and loads that table. This type of refresh is required when a reflection is sorted, partitioned, or both on one or more fields. It can also be used when a reflection is neither sorted nor partitioned.

    • Incremental refresh

      Dremio appends data to the existing data for a reflection. Incremental refreshes are faster than full refreshes for large reflections, and are appropriate for reflections that are neither sorted nor partitioned. If a physical dataset was promoted from a file, rather than from a folder, reflections derived from it cannot be refreshed incrementally.
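
    The rules above can be sketched as a small decision function. This is an illustration only, not Dremio code: the ReflectionInfo class, its field names, and the allowed_refresh_types function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class ReflectionInfo:
    """Hypothetical description of a reflection (not a Dremio API object)."""
    is_sorted: bool
    is_partitioned: bool
    promoted_from_file: bool  # True if the underlying PDS was promoted from a single file


def allowed_refresh_types(info: ReflectionInfo) -> list:
    """Return the refresh types that the rules in this section permit."""
    # Sorted or partitioned reflections require a full refresh, and reflections
    # on a PDS promoted from a file cannot be refreshed incrementally.
    if info.is_sorted or info.is_partitioned or info.promoted_from_file:
        return ["full"]
    # Otherwise either type can be used.
    return ["full", "incremental"]


print(allowed_refresh_types(ReflectionInfo(False, False, False)))
print(allowed_refresh_types(ReflectionInfo(True, False, False)))
```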

    Best practice: Time reflection refreshes to occur after metadata refreshes of physical datasets

    Time your reflection refreshes to occur only after the metadata for their underlying physical datasets is refreshed. Otherwise, reflection refreshes do not include data from any files that were added to a physical dataset since its last metadata refresh.

    For example, suppose a data source that is promoted to a physical dataset (PDS) consists of 10,000 files, and that the metadata refresh for the PDS is set to happen every three hours. Subsequently, reflections are created from virtual datasets on that PDS, and the refresh of reflections on the PDS is set to occur every hour.

    Now, one thousand files are added to the PDS. Before the next metadata refresh, the reflections are refreshed twice, yet the refreshes do not add data from those one thousand files. Only on the third refresh of the reflections does data from those files get added to the reflections.
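
    The timing in this example can be checked with a short calculation. The sketch below is not part of Dremio; the function name and its parameters are hypothetical. It assumes that both schedules start at minute 0 and that, when a metadata refresh and a reflection refresh fall on the same minute, the metadata refresh completes first.

```python
def first_refresh_with_new_files(added_at_min, metadata_every_min, reflection_every_min):
    """Return a pair: the minute of the first reflection refresh that includes
    the new files, and the number of reflection refreshes that ran without them.

    Assumptions of this sketch: both schedules start at minute 0, and a
    metadata refresh finishes before a reflection refresh due at the same minute.
    """
    # First metadata refresh strictly after the files were added.
    next_metadata = (added_at_min // metadata_every_min + 1) * metadata_every_min
    # First reflection refresh at or after that metadata refresh (ceiling division).
    first_seeing = -(-next_metadata // reflection_every_min) * reflection_every_min
    # Reflection refreshes that ran after the files arrived but before they were visible.
    missed = first_seeing // reflection_every_min - 1 - added_at_min // reflection_every_min
    return first_seeing, missed


# Files added 10 minutes in; metadata refresh every 3 hours; reflection refresh hourly.
result = first_refresh_with_new_files(10, 180, 60)
print(result)  # (180, 2)
```

    With these inputs, two reflection refreshes run without the new files, and the third one, at the three-hour mark, is the first to include them.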

    Setting the Refresh Policy for Reflections

    In the settings for a data source, you specify the schedule for refreshes of all reflections that are on the physical datasets in that data source. The default schedule is Never refresh.

    In the settings for a physical dataset, you specify the type of refresh to use for all reflections that are ultimately derived from the physical dataset, and you can specify a schedule for reflection refreshes that overrides the schedule specified in the settings for the physical dataset’s data source. The default refresh type is Full refresh, and the default schedule is the schedule set at the source of the physical dataset.
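
    The way the two levels of settings combine can be sketched as follows. The SourcePolicy and DatasetPolicy classes, their fields, and the string values are hypothetical names used only to illustrate the defaults and the override rule described above; they do not correspond to a Dremio API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourcePolicy:
    """Hypothetical model of the reflection refresh settings on a data source."""
    refresh_schedule: str = "never"  # default schedule: Never refresh


@dataclass
class DatasetPolicy:
    """Hypothetical model of the reflection refresh settings on a physical dataset."""
    refresh_type: str = "full"              # default refresh type: Full refresh
    refresh_schedule: Optional[str] = None  # None means: use the source's schedule


def effective_policy(source: SourcePolicy, dataset: DatasetPolicy):
    """Return the (refresh type, schedule) pair that applies to the dataset's reflections."""
    if dataset.refresh_schedule is not None:
        schedule = dataset.refresh_schedule  # dataset-level schedule overrides the source
    else:
        schedule = source.refresh_schedule   # otherwise inherit the source's schedule
    return dataset.refresh_type, schedule


print(effective_policy(SourcePolicy(), DatasetPolicy()))
print(effective_policy(SourcePolicy("every 3 hours"), DatasetPolicy("incremental", "every 1 hour")))
```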

    Procedures

    To set the refresh schedule on a data source:

    1. Right-click a data lake or external source.
    2. Select Edit Details.
    3. In the sidebar of the Edit Source window, select Reflection Refresh.
    4. When you are done making your selections, click Save. Your changes go into effect immediately.

    To set the refresh type and schedule on a physical dataset:

    1. Locate a physical dataset.
    2. Click the gear icon to its right.
    3. In the sidebar of the Dataset Settings window, click Reflection Refresh.
    4. When you are done making your selections, click Save. Your changes go into effect immediately.

    Viewing the Refresh History for Reflections

    You can find out whether a refresh job for a reflection has run, and how many times refresh jobs for a reflection have been run.

    Procedure

    1. Go to the space that lists the dataset from which the reflection was created.
    2. Hover over the row for the dataset.
    3. In the Actions field, click the gear.
    4. In the sidebar of the Dataset Settings window, select Reflections.
    5. Click History in the heading for the reflection.

    Result

    The Jobs page opens with the ID of the reflection in the search box and lists only the jobs related to that ID.

    When a reflection is created, or refreshed (and the refresh type set at the underlying physical datasets is “Full refresh”), Dremio runs two jobs by default:

    • The first returns the result set for creating the reflection, running a REFRESH REFLECTION statement.
    • The second creates the metadata that the query optimizer can use to find out the definition and structure of the reflection, running a LOAD MATERIALIZATION METADATA statement.

    If the support key dremio.iceberg.enabled is turned on, then Dremio runs only the first job. When Dremio creates a reflection as an Apache Iceberg table, the metadata for the reflection is generated at the same time. When Dremio fully refreshes a reflection created as an Iceberg table, again the metadata for the reflection is generated at the same time.

    If the refresh type set at the underlying physical datasets is “Incremental”, Dremio runs only the first job, whether or not the support key dremio.iceberg.enabled is turned on.
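
    The cases described in this section can be summarized in a small function. This is a sketch for reading the Jobs page, not Dremio code; the function name and its parameters are hypothetical, and the returned strings are the statement names mentioned above.

```python
def refresh_jobs(refresh_type: str, iceberg_enabled: bool) -> list:
    """Return the statements run when a reflection is created or refreshed.

    refresh_type is "full" or "incremental" (hypothetical labels for the
    refresh type set on the underlying physical datasets); iceberg_enabled
    mirrors the state of the support key dremio.iceberg.enabled.
    """
    jobs = ["REFRESH REFLECTION"]  # always runs
    # The separate metadata job runs only for full refreshes when the
    # reflection is not stored as an Iceberg table.
    if refresh_type == "full" and not iceberg_enabled:
        jobs.append("LOAD MATERIALIZATION METADATA")
    return jobs


for refresh_type in ("full", "incremental"):
    for iceberg_enabled in (False, True):
        print(refresh_type, iceberg_enabled, refresh_jobs(refresh_type, iceberg_enabled))
```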

    Setting the Maximum Number of Attempts for Failures to Refresh Reflections

    You can specify how many times a job should retry refreshing a reflection after the first attempt fails. Doing so can help keep reflection-refresh jobs moving at an acceptable rate through an engine queue, so that the data in the corresponding reflections does not become too stale.

    Procedure

    1. Open the Reflections page: hover over the gear in the sidebar, select Project Settings, and then select Reflections in the sidebar of the page that opens.
    2. On the Reflections page, click the gear in the top-right corner and select Acceleration Settings.
    3. In the Maximum attempts for reflection job failures field, specify the number of retries to allow.
    4. Click Save. The change goes into effect immediately.

    Routing Refresh Jobs to Particular Engines

    You can use an SQL command to route jobs for refreshing reflections directly to specified engines. See Engine Routing in the SQL reference.