Source Metadata Caching

Dremio provides several options for caching data source metadata. Each option can be configured per source.

Dataset Discovery

The Dataset Discovery option determines the refresh interval for top-level source object names, such as the names of databases, tables, and indexes. The default is one hour. This refresh is a lightweight operation. The Dataset Discovery option is not available for file-system sources such as HDFS, MapR-FS, or NAS.

Dataset Details

Dataset Details are the metadata Dremio needs for query planning, such as fields, types, shards, statistics, and locality information.

There are three fetch modes:

  • Only Queried Datasets - Dremio updates details only for datasets in the source that have been queried before. This mode improves query performance for those datasets because less work is needed at query time.
  • All Datasets - Dremio updates details for all datasets in a source. This mode improves query performance because less work is needed at query time.
  • As Needed - Dremio updates details for a dataset at query time. This mode minimizes metadata queries against a source that is not in use, but might lead to longer planning times.
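The difference between the three modes can be sketched as the set of datasets whose details are refreshed ahead of query time. This is an illustrative model only, not Dremio code: the FetchMode enum, the datasets_to_refresh function, and the sample dataset names are all hypothetical.

```python
from enum import Enum


class FetchMode(Enum):
    # Hypothetical names mirroring the three fetch modes listed above.
    ONLY_QUERIED = "Only Queried Datasets"
    ALL = "All Datasets"
    AS_NEEDED = "As Needed"


def datasets_to_refresh(mode, all_datasets, queried_datasets):
    """Return the datasets whose details are refreshed ahead of query time."""
    if mode is FetchMode.ALL:
        # Every dataset in the source is kept up to date.
        return set(all_datasets)
    if mode is FetchMode.ONLY_QUERIED:
        # Only datasets that exist in the source and were queried before.
        return set(all_datasets) & set(queried_datasets)
    # AS_NEEDED: nothing is refreshed ahead of time; details are
    # fetched when a query actually touches the dataset.
    return set()


catalog = {"orders", "customers", "events"}  # hypothetical source contents
queried = {"orders"}                         # hypothetical query history

for mode in FetchMode:
    print(mode.value, "->", sorted(datasets_to_refresh(mode, catalog, queried)))
```

In this model, the As Needed mode issues no background metadata queries, which is why it keeps load on an idle source low at the cost of doing the work during planning.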

Dremio expires the metadata it holds about a dataset once the period set in the Expire after value has elapsed.
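As a rough illustration of the expiry rule, the check below treats metadata as expired once the time since its last refresh exceeds the configured period. The function name, its parameters, and the strict comparison are assumptions made for this sketch; they do not describe Dremio's internal behavior.

```python
from datetime import datetime, timedelta


def is_expired(last_refreshed, expire_after, now):
    """Hypothetical check: has cached metadata outlived its expiry period?"""
    # Assumption: metadata expires strictly after the period has elapsed.
    return now - last_refreshed > expire_after


last_refreshed = datetime(2024, 1, 1, 12, 0)  # illustrative timestamp
expire_after = timedelta(hours=3)             # illustrative "Expire after" value

print(is_expired(last_refreshed, expire_after, datetime(2024, 1, 1, 14, 0)))
print(is_expired(last_refreshed, expire_after, datetime(2024, 1, 1, 16, 0)))
```

Under this model, a dataset whose details have expired has to be fetched again before a query can be planned against it.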

