The system periodically updates the reflections in the Reflection Store to keep Data Reflections fresh. An administrator can specify the desired Refresh Policy for any physical dataset or data source – determining the refresh interval and expiration of reflections. All reflections based on a physical dataset or source will be refreshed accordingly. Refresh Policy options for a physical dataset will override the value for the source.
Dremio will refresh Data Reflections at the provided refresh interval and serve them until the provided expiration.
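The override rule and the two policy values can be pictured with a minimal sketch. It is illustrative only and is not Dremio's implementation: `RefreshPolicy`, `effective_policy`, and `next_refresh_and_expiry` are hypothetical names, and it assumes that both the refresh interval and the expiration are measured from the last refresh.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class RefreshPolicy:
    """Hypothetical policy record; the field names are assumptions."""
    refresh_interval: timedelta
    expiration: timedelta


def effective_policy(source_policy: RefreshPolicy,
                     dataset_policy: Optional[RefreshPolicy]) -> RefreshPolicy:
    """A dataset-level policy, when present, overrides the source-level one."""
    return dataset_policy if dataset_policy is not None else source_policy


def next_refresh_and_expiry(last_refresh: datetime,
                            policy: RefreshPolicy) -> Tuple[datetime, datetime]:
    """Return (time the next refresh is due, time the reflection stops being served).

    Assumption: both durations are counted from the last refresh.
    """
    return (last_refresh + policy.refresh_interval,
            last_refresh + policy.expiration)


# Example: the source refreshes hourly, but one dataset asks for 15 minutes.
source = RefreshPolicy(timedelta(hours=1), timedelta(hours=3))
dataset = RefreshPolicy(timedelta(minutes=15), timedelta(hours=1))
policy = effective_policy(source, dataset)
due, expires = next_refresh_and_expiry(datetime(2024, 1, 1, 12, 0), policy)
```

A dataset without its own policy falls back to the source policy, which is why `dataset_policy` is optional in the sketch.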
Disabling and then re-enabling reflections for a dataset in the Dremio UI causes those reflections to refresh. In addition, all reflections that depend on a given physical dataset can be refreshed.
Dremio’s default behavior is to perform a full update of the Data Reflection on each update. However, for larger datasets it is better to enable incremental updates. There are two ways in which the system can identify new records:
- As of Dremio 3.2, incremental refresh is supported for datasets with fields of the BigInt, Int, Timestamp, Date, Varchar, Float, Double, and Decimal data types.
- In releases prior to Dremio 3.2, incremental refresh is supported for datasets with BigInt columns only.
To specify incremental refresh for your dataset:
- Only append-only datasets are supported for Incremental Update Mode. Updates and deletions of underlying files lead to incorrect results. Dremio recommends using Full Refresh in this case.
- Reflections on virtual datasets that include joins cannot be incrementally updated. Dremio falls back to using full refresh for these datasets.
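The append-only restriction is easier to see in a small model of an incremental update. The sketch below is a simplification and not Dremio's actual mechanism. It assumes new records are identified by a field whose value only grows, and that the system remembers the largest value it has already loaded. The function names and the in-memory row lists are hypothetical.

```python
# Field types listed above as eligible for incremental refresh.
SUPPORTED_TYPES_3_2 = {"BigInt", "Int", "Timestamp", "Date",
                       "Varchar", "Float", "Double", "Decimal"}
SUPPORTED_TYPES_PRE_3_2 = {"BigInt"}


def can_refresh_incrementally(field_type, version):
    """Check a field type against the list for a (major, minor) version tuple."""
    supported = (SUPPORTED_TYPES_3_2 if version >= (3, 2)
                 else SUPPORTED_TYPES_PRE_3_2)
    return field_type in supported


def incremental_refresh(reflection_rows, source_rows, field, high_water_mark):
    """Append only rows whose `field` exceeds the remembered maximum.

    Returns the new maximum. Rows at or below the old maximum are never
    re-read, so changes to them are not reflected.
    """
    new_rows = [row for row in source_rows
                if high_water_mark is None or row[field] > high_water_mark]
    reflection_rows.extend(new_rows)
    if new_rows:
        high_water_mark = max(row[field] for row in new_rows)
    return high_water_mark


reflection = []
source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
mark = incremental_refresh(reflection, source, "id", None)

# One row is appended and an existing row is modified in the source.
source.append({"id": 3, "v": "c"})
source[0] = {"id": 1, "v": "changed"}
mark = incremental_refresh(reflection, source, "id", mark)
```

After the second call the reflection contains the appended row with `id` 3 but still holds the old value for `id` 1. This is the kind of stale result the append-only requirement guards against, and why a full refresh is the safer choice when rows are updated or deleted.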
This feature is only available when using instances of Dremio v18.0+.
Metadata refreshes for reflections now take place in near-real-time when completing a reflection job.
To activate this functionality, use the dremio.execution.support_unlimited_splits flags. Enabling flags is done from the Support Settings page.
Enabling these support keys turns on new functionality in Dremio that may cause unexpected behavior with your existing datasets. We recommend trying this functionality in a test environment first.
This improvement to metadata refreshes does not support PDFS as a storage method.
We also recommend enabling Near-Real-Time Metadata Refreshes, as this removes the limit on the number of splits and makes it easier to use reflections on larger datasets where metadata refreshes may be slower.
Changes in definitions of anchor and/or upstream (i.e. parents, parents of parents) datasets require administrators to re-create affected reflections (including reflections on downstream datasets) to ensure that they are up-to-date.
Dremio guarantees data correctness without any modifications. However, if affected reflections are not re-created when dataset definitions change, queries may not be able to use those reflections.
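One way to work out which reflections are affected by a definition change is to walk the dataset dependency graph downstream from the changed dataset. The sketch below is a generic breadth-first traversal and not a Dremio API. The `children` map (dataset to the datasets defined on top of it), the `reflections_by_anchor` map, and all dataset and reflection names are hypothetical inputs an administrator would have to assemble.

```python
from collections import deque


def affected_datasets(children, changed):
    """Return the changed dataset plus every dataset downstream of it."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def reflections_to_recreate(children, reflections_by_anchor, changed):
    """List reflections anchored on the changed dataset or anything downstream."""
    affected = affected_datasets(children, changed)
    return sorted(name
                  for dataset in affected
                  for name in reflections_by_anchor.get(dataset, ()))


# Hypothetical lineage: orders -> orders_clean -> daily_sales
children = {
    "orders": ["orders_clean"],
    "orders_clean": ["daily_sales"],
    "customers": ["customer_view"],
}
reflections_by_anchor = {
    "orders_clean": ["raw_orders_clean"],
    "daily_sales": ["agg_daily_sales"],
    "customer_view": ["raw_customer_view"],
}
to_recreate = reflections_to_recreate(children, reflections_by_anchor, "orders")
```

In this example a change to `orders` flags the reflections on `orders_clean` and `daily_sales`, while the reflection on `customer_view` is left alone because it does not depend on `orders`.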
Updating a reflection definition causes a full refresh of that reflection.