Version: 24.3.x

Metadata Storage

Dremio stores metadata about users, spaces, and datasets. By default, Dremio stores this metadata at ${DREMIO_HOME}/data. Dremio administrators can customize the location of this directory with the paths.local property of the dremio.conf configuration file.

Note: Dremio requires that deployments configured for High Availability use network-attached storage (NAS) as the metadata store.

I/O Performance

The Dremio metadata store services two workload types:

| Workload Type | Performance Consideration |
| --- | --- |
| Requests from user queries and refreshes of data reflections | Performance is affected by the number of concurrent queries. |
| Metadata refreshes, where Dremio collects and records information about source datasets | Performance is affected by the number of tables to which Dremio connects, as well as the frequency of refreshes. The requirements of your query workload and metadata refresh policies may require greater throughput. |

Requests from User Queries and Refreshes of Data Reflections

The performance requirement for user queries and reflection refreshes scales linearly with the number of concurrent queries per second.

| Queries/Sec | Required Baseline Throughput |
| --- | --- |
| 50 | 60 MB/s |
| 100 | 120 MB/s |
| 200 | 240 MB/s |
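
As a rough illustration of the linear relationship, every row in the table above works out to the same ratio of 1.2 MB/s per query per second. The sketch below applies that ratio to other query rates. The function name and the assumption that the ratio holds outside the tabulated range (50 to 200 queries/sec) are ours, not Dremio's, so treat the result as a starting estimate rather than a sizing guarantee.

```python
# Ratio derived from the table rows: 60/50 = 120/100 = 240/200 = 1.2.
# Assumption: the same ratio applies outside the tabulated range.
MB_PER_SEC_PER_QUERY = 1.2


def required_query_throughput(queries_per_sec: float) -> float:
    """Estimate baseline metadata-store throughput in MB/s.

    Hypothetical helper for illustration; not part of any Dremio API.
    """
    if queries_per_sec < 0:
        raise ValueError("queries_per_sec must be non-negative")
    return queries_per_sec * MB_PER_SEC_PER_QUERY


if __name__ == "__main__":
    for rate in (50, 100, 200):
        print(f"{rate} queries/sec: {required_query_throughput(rate):.0f} MB/s")
```

Query rates between the tabulated rows can be passed in the same way, for example `required_query_throughput(150)`.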

Metadata Refreshes

The performance requirement for metadata refreshes scales linearly with the number of datasets and the average number of columns and splits per dataset. However, the performance requirement scales inversely with the refresh interval.

| Number of Datasets | Average Columns, Splits per Dataset | Required Baseline Throughput |
| --- | --- | --- |
| 1000 | 20 columns per dataset, 1000 splits per dataset, refresh interval: 30 min | 1 MB/s |
| 2000 | 20 columns per dataset, 1000 splits per dataset, refresh interval: 30 min | 2 MB/s |
| 2000 | 20 columns per dataset, 1000 splits per dataset, refresh interval: 10 min | 3 MB/s |
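
The two 30-minute rows above suggest about 1 MB/s per 1000 datasets for the stated profile (20 columns and 1000 splits per dataset). The sketch below scales only the dataset count under that fixed profile. It deliberately does not model the refresh interval or the column and split counts, because the table gives a single 10-minute data point and no variation in the profile, which is not enough to pin down those relationships. The function name and the constant are illustrative assumptions.

```python
# Derived from the 30-minute rows: 1000 datasets -> 1 MB/s, 2000 -> 2 MB/s.
# Assumption: valid only for 20 columns, 1000 splits per dataset, and a
# 30-minute refresh interval; other profiles are not modeled here.
MB_PER_SEC_PER_1000_DATASETS = 1.0


def required_refresh_throughput(num_datasets: int) -> float:
    """Estimate metadata-refresh throughput in MB/s at a 30-minute interval.

    Hypothetical helper for illustration; not part of any Dremio API.
    """
    if num_datasets < 0:
        raise ValueError("num_datasets must be non-negative")
    return num_datasets / 1000 * MB_PER_SEC_PER_1000_DATASETS


if __name__ == "__main__":
    for count in (1000, 2000):
        print(f"{count} datasets: {required_refresh_throughput(count):.0f} MB/s")
```

For shorter refresh intervals, expect a higher requirement than this estimate, as the 10-minute row in the table shows.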

For More Information