Metadata Storage
Dremio stores metadata about users, spaces, and datasets. By default, Dremio stores this metadata at ${DREMIO_HOME}/data. Dremio administrators can customize the location of this directory with the paths.local property of the dremio.conf configuration file.
Dremio requires that deployments configured for High Availability use network-attached storage (NAS) as the metadata store.
I/O Performance
The Dremio metadata store services two workload types:
Workload Type | Performance Consideration |
---|---|
Requests from user queries and refreshes of data reflections | Performance is affected by the number of concurrent queries |
Metadata refreshes, where Dremio collects and records information about source datasets | Performance is affected by the number of tables to which Dremio connects, as well as the frequency of refreshes. Depending on your query workload and metadata refresh policies, you may need greater throughput. |
Requests from User Queries and Refreshes of Data Reflections
The performance requirement for user queries and reflection refreshes scales linearly with the query rate (queries per second). Each row in the following table works out to 1.2 MB/s of baseline throughput per query per second.
Queries/Sec | Required Baseline Throughput |
---|---|
50 | 60 MB/s |
100 | 120 MB/s |
200 | 240 MB/s |
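For query rates that fall between or beyond the rows above, the linear relationship can be sketched as a small calculation. This is a minimal sketch, not a Dremio tool: the function name `estimate_query_throughput_mb_s` and the constant `MB_PER_SEC_PER_QUERY` are hypothetical, and the 1.2 MB/s ratio is inferred from the table rows rather than stated as a documented constant. Treat results outside the 50 to 200 queries/sec range as extrapolation.

```python
# Ratio implied by the table rows (60 MB/s at 50 queries/sec).
# Assumption: this is inferred from the table, not a documented constant.
MB_PER_SEC_PER_QUERY = 60 / 50


def estimate_query_throughput_mb_s(queries_per_sec: float) -> float:
    """Estimate baseline metadata-store throughput (MB/s) for a query rate."""
    if queries_per_sec < 0:
        raise ValueError("queries_per_sec must be non-negative")
    return queries_per_sec * MB_PER_SEC_PER_QUERY


if __name__ == "__main__":
    # Reproduce the documented rows, then one interpolated rate.
    for rate in (50, 100, 150, 200):
        estimate = estimate_query_throughput_mb_s(rate)
        print(f"{rate} queries/sec -> {estimate:.0f} MB/s")
```

The estimate covers only the query and reflection workload; throughput needed for metadata refreshes comes on top of it.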
Metadata Refreshes
The performance requirement for metadata refreshes scales linearly with the number of datasets and with the average number of columns and splits per dataset. It moves inversely with the refresh interval: the shorter the interval, the more throughput is required.
Number of Datasets | Average Columns per Dataset | Average Splits per Dataset | Refresh Interval | Required Baseline Throughput |
---|---|---|---|---|
1000 | 20 | 1000 | 30 min | 1 MB/s |
2000 | 20 | 1000 | 30 min | 2 MB/s |
2000 | 20 | 1000 | 10 min | 3 MB/s |