Refreshing Data Reflections
Refresh Policy: Refresh Interval and Expiration
The system periodically updates the reflections in the Reflection Store to keep Data Reflections fresh. An administrator can specify the desired Refresh Policy for any physical dataset or data source, which determines the refresh interval and expiration of reflections. All reflections based on a physical dataset or source are refreshed accordingly. Refresh Policy options set on a physical dataset override the values set on its source.
Dremio will refresh Data Reflections at the provided refresh interval and serve them until the provided expiration.
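The override and timing rules above can be sketched in a few lines of Python. This is only an illustration, not Dremio's implementation: `RefreshPolicy`, `effective_policy`, `is_refresh_due`, and `is_servable` are hypothetical names, the dataset policy is assumed to replace the source policy as a whole, and both the interval and the expiration are assumed to be measured from the last successful refresh.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class RefreshPolicy:
    """Hypothetical container for the two Refresh Policy settings."""
    refresh_interval: timedelta
    expiration: timedelta


def effective_policy(source_policy: RefreshPolicy,
                     dataset_policy: Optional[RefreshPolicy]) -> RefreshPolicy:
    # A policy set on the physical dataset overrides the source-level policy.
    return dataset_policy if dataset_policy is not None else source_policy


def is_refresh_due(last_refresh: datetime, now: datetime,
                   policy: RefreshPolicy) -> bool:
    # Assumption: the interval is counted from the last successful refresh.
    return now - last_refresh >= policy.refresh_interval


def is_servable(last_refresh: datetime, now: datetime,
                policy: RefreshPolicy) -> bool:
    # Assumption: a reflection is served until its expiration has elapsed.
    return now - last_refresh < policy.expiration
```

For example, with a source policy of one hour and a dataset policy of thirty minutes, `effective_policy` returns the dataset policy, so a reflection on that dataset would be due for refresh thirty minutes after its last refresh.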
[info] Manual Refresh
Disabling and then re-enabling reflections for a dataset in the Dremio UI causes those reflections to refresh. In addition, all reflections that depend on a given physical dataset can be refreshed manually.
Full and Incremental Refresh
Dremio's default behavior is to perform a full update of the Data Reflection on each refresh. For larger datasets, however, it is better to enable incremental updates, which add only the new records. There are two ways in which the system can identify new records:
- Directory datasets in file-based data sources like S3 and HDFS: The system can automatically identify new files in the directory.
- All other datasets (physical and virtual): An administrator specifies a monotonically increasing field of type BigInt, such as an auto-incrementing key. This allows the system to fetch the records that have been created since the last time the reflection was updated. Incremental updating is not available for datasets without any BigInt fields.
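The two ways of identifying new records can be illustrated with a minimal Python sketch. It does not use Dremio's code or APIs: `new_files` and `new_records` are hypothetical helpers, records are modeled as plain dictionaries, and the "last seen" state is assumed to be stored by the caller between refreshes.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

Record = Dict[str, Any]


def new_files(current_files: Iterable[str], seen_files: Set[str]) -> List[str]:
    # Directory datasets: anything not seen at the previous refresh is new.
    return sorted(set(current_files) - seen_files)


def new_records(records: Iterable[Record], key_field: str,
                last_seen: int) -> Tuple[List[Record], int]:
    # Other datasets: the key only ever increases, so every record with a
    # key above the stored high-water mark was created after the last refresh.
    fresh = [r for r in records if r[key_field] > last_seen]
    high_water = max((r[key_field] for r in fresh), default=last_seen)
    return fresh, high_water
```

The second helper also shows why the field must be monotonically increasing: a record inserted with a key at or below the stored high-water mark would never be picked up.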
To set incremental refresh for your dataset:
- Go to your source's promoted folder.
- Click on the settings icon for the promoted folder.
- Select Reflection Refresh.
- Select Incremental Update.
- Only append-only datasets are supported in Incremental Update mode. Updates to or deletions of underlying files lead to incorrect results. Dremio recommends using Full Refresh in this case.
- Reflections on virtual datasets that include joins cannot be incrementally updated. Dremio falls back to using full refresh for these datasets.
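The restrictions on incremental updates can be summarized as a small decision function. The sketch below is an assumption-laden illustration in Python, not Dremio's logic: `DatasetInfo` and `refresh_mode` are hypothetical, and the `append_only` flag stands for knowledge the administrator has about the data, since nothing in this section says the system detects it automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical description of a dataset, for illustration only."""
    append_only: bool       # known to the administrator, not auto-detected
    is_virtual: bool
    has_joins: bool
    is_directory: bool      # directory dataset in a file-based source
    has_bigint_field: bool


def refresh_mode(ds: DatasetInfo, incremental_requested: bool) -> str:
    if not incremental_requested:
        return "full"
    if not ds.append_only:
        # Updates or deletions would produce incorrect incremental results.
        return "full"
    if ds.is_virtual and ds.has_joins:
        # Reflections on virtual datasets with joins fall back to full refresh.
        return "full"
    if not ds.is_directory and not ds.has_bigint_field:
        # Without a BigInt field there is nothing to track new records by.
        return "full"
    return "incremental"
```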
Changes to Anchor and Upstream Datasets
Changes in the definitions of anchor or upstream datasets (that is, parents, parents of parents, and so on) require administrators to re-create the affected reflections, including reflections on downstream datasets, to ensure that they are up to date.
Dremio guarantees data correctness without any modifications. However, until the affected reflections are re-created, queries may no longer use them because of the changes in dataset definitions.
Updating a reflection definition will cause a full refresh of that reflection.
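To see which reflections are affected when a dataset definition changes, it can help to walk the dataset dependency graph downstream from the changed dataset. The following Python sketch assumes a hypothetical `children` mapping from each dataset to the datasets defined directly on top of it; the dataset names in the usage note are made up, and nothing here reflects how Dremio tracks dependencies internally.

```python
from collections import deque
from typing import Dict, List, Set


def affected_datasets(children: Dict[str, List[str]], changed: str) -> Set[str]:
    # Breadth-first walk from the changed dataset to everything downstream.
    affected = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected
```

With `children = {"orders": ["orders_clean"], "orders_clean": ["daily_sales"]}`, a change to `orders` marks `orders`, `orders_clean`, and `daily_sales`; reflections anchored on any of these would be candidates for re-creation.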