    Refreshing Data Reflections

    Refresh Policy: Refresh Interval and Expiration

    The system periodically updates the reflections in the Reflection Store to keep Data Reflections fresh. An administrator can specify a Refresh Policy for any table or data source, which sets the refresh interval and expiration of its reflections. All reflections based on that table or source are refreshed accordingly. Refresh Policy options set on a table override the values set on its source.

    Dremio will refresh Data Reflections at the provided refresh interval and serve them until the provided expiration.
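
    The interaction between the two settings can be sketched as follows. This is a minimal illustration, not Dremio's implementation: the `RefreshPolicy` class and both functions are hypothetical names, the override is modeled for the whole policy rather than per option, and the expiration is assumed to be measured from the last successful refresh.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class RefreshPolicy:
    """Hypothetical model of a Refresh Policy; not a Dremio API."""
    refresh_interval: timedelta
    expiration: timedelta


def effective_policy(source_policy: RefreshPolicy,
                     table_policy: Optional[RefreshPolicy]) -> RefreshPolicy:
    # A policy set on the table overrides the policy set on its source.
    return table_policy if table_policy is not None else source_policy


def reflection_state(last_refresh: datetime, now: datetime,
                     policy: RefreshPolicy) -> str:
    # Assumption: both durations are measured from the last refresh.
    age = now - last_refresh
    if age >= policy.expiration:
        return "expired"       # no longer served to queries
    if age >= policy.refresh_interval:
        return "refresh_due"   # still served, but a refresh should run
    return "fresh"
```

    With a 30-minute interval and a 2-hour expiration, a reflection refreshed 45 minutes ago would be reported as `refresh_due` in this model, and one refreshed 2 hours ago as `expired`.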

    note:

    Manual Refresh: Disabling and then re-enabling reflections for a table or view in the Dremio UI causes those reflections to refresh. You can also refresh all of the reflections that depend on a given table.

    Full and Incremental Refresh

    Dremio’s default behavior is to perform a full update of the Data Reflection on each refresh.

    However, for reflections that are based on large tables or views to which data is only appended, it is better to enable incremental updates.

    warning:

    Use incremental refreshes only for reflections that are based on tables and views to which data is only appended. If records can be updated or deleted in a table or view, use full refreshes for the reflections that are based on that table or view.

    There are two ways in which Dremio can identify new records:

    • For directory datasets in file-based data sources like S3 and HDFS: Dremio can automatically identify new files in the directory that were added after the prior refresh.
    • For all other datasets (such as datasets in relational or NoSQL databases) and for Iceberg tables: An administrator specifies a monotonically increasing field, such as an auto-incrementing key, that must be of type BigInt, Int, Timestamp, Date, Varchar, Float, Double, or Decimal. This allows Dremio to find and fetch the records that have been created since the last time the reflection was incrementally refreshed.
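
    The second approach can be illustrated with a small sketch. It is not Dremio's implementation: it uses an in-memory SQLite table as a stand-in for the source dataset, a plain Python list as a stand-in for the reflection, and a hypothetical `incremental_refresh` function that remembers the highest value of the monotonically increasing field seen so far.

```python
import sqlite3


def incremental_refresh(conn, table, field, reflection, last_seen):
    """Append rows newer than `last_seen` and return the new high-water mark."""
    # `table` and `field` are trusted identifiers in this sketch;
    # only the comparison value is passed as a bound parameter.
    query = f"SELECT * FROM {table} WHERE {field} > ? ORDER BY {field}"
    new_rows = conn.execute(query, (last_seen,)).fetchall()
    reflection.extend(new_rows)
    return new_rows[-1][field] if new_rows else last_seen


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)",
                 [(1, 9.5), (2, 20.0)])

reflection = []  # stand-in for the materialized reflection
mark = incremental_refresh(conn, "orders", "id", reflection, last_seen=0)

# A record appended after the first refresh is picked up by the next one.
conn.execute("INSERT INTO orders (id, amount) VALUES (3, 4.25)")
mark = incremental_refresh(conn, "orders", "id", reflection, last_seen=mark)
```

    A row that is updated or deleted in place keeps a field value at or below the stored mark, so a lookup of this kind never sees the change. This is why incremental refresh is suitable only for append-only data.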

    note:

    In these two cases, Dremio always uses full refreshes, rather than incremental refreshes:

    • A reflection is created on a table that was promoted from a file, rather than from a folder, or is created on a view that is based on such a table.

    • A reflection is created from a view that uses nested group-bys, joins, unions, or window functions.
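
    The rules in this note can be summarized as a small decision function. This is only a sketch of the documented behavior under simplifying assumptions: the `ReflectionBasis` class, its field names, and `refresh_mode` are hypothetical and do not correspond to any Dremio API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReflectionBasis:
    """Hypothetical description of what a reflection is built on."""
    promoted_from_file: bool     # table promoted from a file, not a folder
    uses_complex_view: bool      # nested group-bys, joins, unions, or window functions
    incremental_requested: bool  # administrator selected Incremental Update


def refresh_mode(basis: ReflectionBasis) -> str:
    # In these two cases a full refresh is always used.
    if basis.promoted_from_file or basis.uses_complex_view:
        return "full"
    # Otherwise an incremental refresh is used only when it was requested;
    # a full refresh remains the default.
    return "incremental" if basis.incremental_requested else "full"
```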

    To specify incremental refresh for your dataset:

    1. Go to your source’s promoted folder.
    2. Click on the settings icon for the promoted folder.
    3. Select Reflection Refresh.
    4. Select Incremental Update.

    Routing Refresh Jobs to Particular Queues

    You can use an SQL command to route jobs for refreshing reflections directly to specified queues. See Queue Routing in the SQL reference.

    Changes to Anchor and Upstream Datasets

    Changes to the definitions of anchor or upstream datasets (parents, parents of parents, and so on) require administrators to re-create the affected reflections, including reflections on downstream datasets, to ensure that they are up to date.

    Dremio guarantees data correctness without any modifications. However, if affected reflections are not re-created when dataset definitions change, queries may not be able to use those reflections.

    Updating a reflection definition causes a full refresh of that reflection.