Caching Source Metadata
This topic describes how to configure the cache settings for data source metadata. Various caching options are available for individual data sources.
To configure the caching settings for data source metadata:
- Open the settings for the data source. Data source configuration settings can be set either when adding the data source or after the data source has been added.
- Navigate to Metadata > Metadata Refresh.
- Modify the settings for the following:
- Dataset Discovery
- Dataset Details
For more information about metadata settings for specific data sources, see the documentation for each data source. See HDFS for a list of data sources.
Metadata Refresh Settings
This section describes the configurable caching settings.
The Dataset Discovery option determines the refresh interval for top-level source object names, such as the names of databases, tables, and indexes. The default is one hour. This refresh is a lightweight operation. The Dataset Discovery option is not available for file-system sources such as HDFS, MapR-FS, or NAS.
Dataset Details is the metadata that Dremio needs for query planning, such as information on fields, types, shards, statistics, and locality.
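As a rough illustration of the rule above, the following Python sketch resolves the effective Dataset Discovery interval for a source. It is not Dremio code: the function name `discovery_interval`, the source-type strings, and the `FILE_SYSTEM_SOURCES` set are hypothetical, and the set contains only the three examples named in the text.

```python
from datetime import timedelta
from typing import Optional

# Only the examples named in the text; the real list of file-system
# sources may be longer (assumption).
FILE_SYSTEM_SOURCES = {"HDFS", "MapR-FS", "NAS"}

# Default refresh interval for top-level object names, per the text.
DEFAULT_DISCOVERY_INTERVAL = timedelta(hours=1)


def discovery_interval(
    source_type: str, configured: Optional[timedelta] = None
) -> Optional[timedelta]:
    """Return the Dataset Discovery interval, or None if the option
    does not apply to this (hypothetical) source type."""
    if source_type in FILE_SYSTEM_SOURCES:
        # The option is not available for file-system sources.
        return None
    if configured is not None:
        return configured
    return DEFAULT_DISCOVERY_INTERVAL
```

Returning `None` for file-system sources keeps "not available" distinct from any real interval value.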
The following fetch modes are available:
- Only Queried Datasets: Dremio updates details for previously queried objects in a source. This mode increases query performance as less work needs to be done at query time for these datasets.
- All Datasets: (Deprecated as of 3.3) Dremio updates details for all datasets in a source. This mode increases query performance as less work needs to be done at query time.
- As Needed: (Not Available as of 3.3) Dremio updates details for a dataset at query time. This mode minimizes metadata queries on a source when not used, but might lead to longer planning times.
Dremio expires the metadata it knows about datasets after the provided Expire after value.
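The interaction between the Only Queried Datasets mode and the Expire after value can be pictured with a small model. This is a sketch under stated assumptions, not Dremio's implementation: the `CachedDataset` class, both function names, and the `refresh_interval` parameter are hypothetical, and treating metadata as expired exactly at the boundary is an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CachedDataset:
    """Hypothetical record of cached details for one dataset."""
    name: str
    last_refreshed: datetime
    previously_queried: bool


def needs_refresh(ds: CachedDataset, refresh_interval: timedelta,
                  now: datetime) -> bool:
    """Only Queried Datasets mode: refresh details only for datasets
    that were queried before and whose details are older than the
    (assumed) refresh interval."""
    return ds.previously_queried and now - ds.last_refreshed >= refresh_interval


def is_expired(ds: CachedDataset, expire_after: timedelta,
               now: datetime) -> bool:
    """Metadata is discarded once it is older than the Expire after value."""
    return now - ds.last_refreshed >= expire_after
```

In this model, a dataset that has never been queried is skipped by the background refresh, so its details simply age until they expire.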
- The Dataset Discovery option is not available for file-system sources such as HDFS, MapR-FS or NAS.
- Datasets are limited to a maximum width of 800 columns (as of Dremio version 3.1.3). Datasets that already exceed the limit are not queryable after their metadata is refreshed.
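Before refreshing metadata, it may help to check which datasets exceed the column limit. The helper below is a minimal sketch: the function name and the idea of passing in a list of column names are assumptions, and how the column names are obtained from a source is left out.

```python
from typing import Sequence

# Maximum dataset width stated in the text (as of Dremio 3.1.3).
MAX_COLUMNS = 800


def exceeds_width_limit(column_names: Sequence[str],
                        limit: int = MAX_COLUMNS) -> bool:
    """Return True if the dataset has more columns than the limit allows.

    A dataset with exactly `limit` columns is still within the limit.
    """
    return len(column_names) > limit
```

For example, `exceeds_width_limit(["c"] * 801)` evaluates to `True`, while a dataset with exactly 800 columns passes the check.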