Source Metadata Caching

Dremio provides several options for caching data source metadata. These options are configured individually for each source.

Description

Caching of data source metadata has the following options:

  • Dataset Discovery
  • Dataset Details

Dataset Discovery

The Dataset Discovery option determines the refresh interval for top-level source object names, such as the names of databases, tables, and indexes. The default is one hour. This refresh is a lightweight operation. The Dataset Discovery option is not available for file-system sources such as HDFS, MapR-FS, or NAS.

Dataset Details

Dataset Details is the metadata Dremio needs for query planning, such as information on fields, types, shards, statistics, and locality.

There are three fetch modes:

  • Only Queried Datasets - Dremio updates details for previously queried objects in a source. This mode increases query performance as less work needs to be done at query time for these datasets.
  • All Datasets - Dremio updates details for all datasets in a source. This mode increases query performance as less work needs to be done at query time.
  • As Needed - Dremio updates details for a dataset at query time. This mode minimizes metadata queries on a source that is not in use, but might lead to longer planning times.
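The difference between the three modes can be illustrated with a small sketch. It is a conceptual model only, not Dremio code: the names FetchMode and datasets_to_refresh are hypothetical, and the sketch assumes that a scheduled refresh updates a set of datasets ahead of query time.

```python
from enum import Enum


class FetchMode(Enum):
    # Hypothetical names mirroring the three fetch modes listed above.
    ONLY_QUERIED = "Only Queried Datasets"
    ALL = "All Datasets"
    AS_NEEDED = "As Needed"


def datasets_to_refresh(mode, all_datasets, queried_datasets):
    """Return the datasets whose details would be updated ahead of query time."""
    if mode is FetchMode.ALL:
        # Every dataset in the source is refreshed.
        return set(all_datasets)
    if mode is FetchMode.ONLY_QUERIED:
        # Only datasets in the source that have been queried before.
        return set(all_datasets) & set(queried_datasets)
    # AS_NEEDED: nothing is refreshed ahead of time; details are
    # fetched when a query touches the dataset.
    return set()


if __name__ == "__main__":
    source = ["orders", "customers", "events"]
    queried = ["orders"]
    for mode in FetchMode:
        print(mode.value, sorted(datasets_to_refresh(mode, source, queried)))
```

In this model, the first two modes move work out of query planning, while As Needed leaves the source untouched until a query arrives.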

Dremio expires the metadata it holds for a dataset once the configured Expire after interval has elapsed.
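As a rough illustration of the expiry rule, the check can be sketched as a comparison between the age of the cached metadata and the Expire after interval. The function is_expired is hypothetical and not part of Dremio. Treating metadata as expired exactly at the boundary is also an assumption.

```python
from datetime import datetime, timedelta


def is_expired(last_refreshed, expire_after, now):
    """Return True when cached metadata is at least `expire_after` old.

    Assumption: metadata counts as expired once its age reaches the
    interval (>=); the source text does not specify boundary behavior.
    """
    return now - last_refreshed >= expire_after


if __name__ == "__main__":
    refreshed = datetime(2024, 1, 1, 0, 0)
    interval = timedelta(hours=3)  # illustrative value, not a documented default
    print(is_expired(refreshed, interval, datetime(2024, 1, 1, 2, 0)))
    print(is_expired(refreshed, interval, datetime(2024, 1, 1, 4, 0)))
```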

Limitations

  • The Dataset Discovery option is not available for file-system sources such as HDFS, MapR-FS or NAS.
  • Datasets are limited to a maximum width of 800 columns (as of Dremio version 3.1.3). Datasets that already exceed the limit are not queryable after their metadata is refreshed.
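Before refreshing metadata, it may help to identify datasets that would exceed the column limit. The following is a minimal sketch of such a check. The helper exceeds_width_limit and the constant MAX_COLUMNS are hypothetical, and the column names are assumed to have been collected by some other means.

```python
MAX_COLUMNS = 800  # limit stated above for Dremio version 3.1.3


def exceeds_width_limit(column_names, max_columns=MAX_COLUMNS):
    """Return True when a dataset has more columns than the limit allows."""
    return len(column_names) > max_columns


if __name__ == "__main__":
    # Illustrative dataset with generated column names.
    wide_table = [f"col_{i}" for i in range(801)]
    if exceeds_width_limit(wide_table):
        print("Dataset is wider than the limit and may stop being queryable after a refresh.")
```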
