Source Metadata Caching
Dremio provides several options for caching data source metadata, each configurable per source.
The following options control metadata caching:
- Dataset Discovery
- Dataset Details
The Dataset Discovery option determines the refresh interval for top-level source object names, such as the names of databases, tables, and indexes. The default is one hour. This refresh is a lightweight operation. The Dataset Discovery option is not available for file-system sources such as HDFS, MapR-FS, or NAS.
Dataset Details is the metadata that Dremio needs for query planning, such as information on fields, types, shards, statistics, and locality.
There are three fetch modes:
- Only Queried Datasets: Dremio updates details for previously queried objects in a source. This mode increases query performance because less work needs to be done at query time for these datasets.
- All Datasets: Dremio updates details for all datasets in a source. This mode increases query performance because less work needs to be done at query time.
- As Needed: Dremio updates details for a dataset at query time. This mode minimizes metadata queries against a source that is not being used, but might lead to longer planning times.
Dremio expires the metadata it knows about datasets after the provided Expire after value.
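The difference between the three fetch modes and the expiry setting can be illustrated with a small model. This is only a sketch of the behavior described above, not Dremio code or a Dremio API: `FetchMode`, `DatasetInfo`, `datasets_to_refresh`, and `is_expired` are hypothetical names, and expressing the expiry in hours is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class FetchMode(Enum):
    # Hypothetical labels mirroring the three fetch modes described above.
    ONLY_QUERIED = "Only Queried Datasets"
    ALL = "All Datasets"
    AS_NEEDED = "As Needed"


@dataclass
class DatasetInfo:
    # Hypothetical record of what is known about one dataset in a source.
    name: str
    previously_queried: bool
    metadata_age_hours: float


def datasets_to_refresh(mode, datasets):
    """Return the names of datasets whose details a refresh would update."""
    if mode is FetchMode.ALL:
        return [d.name for d in datasets]
    if mode is FetchMode.ONLY_QUERIED:
        return [d.name for d in datasets if d.previously_queried]
    # AS_NEEDED: nothing is updated ahead of time; details are
    # fetched when a query touches the dataset.
    return []


def is_expired(dataset, expire_after_hours):
    """Metadata counts as expired once it is older than the expiry value."""
    return dataset.metadata_age_hours > expire_after_hours


catalog = [
    DatasetInfo("sales", previously_queried=True, metadata_age_hours=2.0),
    DatasetInfo("archive", previously_queried=False, metadata_age_hours=5.0),
]
refreshed_when_only_queried = datasets_to_refresh(FetchMode.ONLY_QUERIED, catalog)
refreshed_when_all = datasets_to_refresh(FetchMode.ALL, catalog)
refreshed_when_as_needed = datasets_to_refresh(FetchMode.AS_NEEDED, catalog)
```

In this model, only `sales` is refreshed ahead of time under Only Queried Datasets, both datasets are refreshed under All Datasets, and none are refreshed under As Needed, which is why that mode shifts the work to query planning.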
- Datasets are limited to a maximum width of 800 columns (as of Dremio version 3.1.3). Datasets that already exceed the limit are not queryable after their metadata is refreshed.
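Because a dataset that is wider than the limit stops being queryable after a metadata refresh, it can help to check column counts before adding a source. The following is a minimal sketch of such a check, assuming you can obtain the column names yourself; `MAX_COLUMNS` and `exceeds_width_limit` are hypothetical names and not part of Dremio.

```python
# Width limit stated above for Dremio version 3.1.3.
MAX_COLUMNS = 800


def exceeds_width_limit(column_names, limit=MAX_COLUMNS):
    """Return True if a dataset has more columns than the limit allows."""
    return len(column_names) > limit


at_limit = [f"col_{i}" for i in range(800)]
over_limit = [f"col_{i}" for i in range(801)]
at_limit_flagged = exceeds_width_limit(at_limit)
over_limit_flagged = exceeds_width_limit(over_limit)
```

A dataset with exactly 800 columns is within the limit in this sketch; one with 801 columns is flagged.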