    Metadata Storage

    Dremio stores metadata about users, spaces, and datasets. By default, Dremio stores this metadata at ${DREMIO_HOME}/data. Dremio administrators can customize the location of this directory with the paths.local property of the dremio.conf configuration file.

    Note: Dremio requires that deployments configured for High Availability use network-attached storage (NAS) as the metadata store.

    I/O Performance

    The Dremio metadata store services two workload types:

    | Workload Type | Performance Consideration |
    |---|---|
    | Requests from user queries and refreshes of data reflections | Performance is affected by the number of concurrent queries. |
    | Metadata refreshes, where Dremio collects and records information about source datasets | Performance is affected by the number of physical datasets to which Dremio connects, as well as the frequency of refreshes. The requirements of your query workload and metadata refresh policies may require greater throughput. |

    Requests from User Queries and Refreshes of Data Reflections

    The performance requirement for user queries and reflection refreshes scales linearly with the number of concurrent queries per second.

    | Queries/Sec | Required Baseline Throughput |
    |---|---|
    | 50 | 60 MB/s |
    | 100 | 120 MB/s |
    | 200 | 240 MB/s |
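
    The three rows above all work out to the same ratio, 1.2 MB/s of baseline throughput per query per second. The following is a minimal sketch of that linear relationship in Python, useful for estimating values between the listed rows. The function name and the ratio constants are illustrative assumptions derived from the table, not part of Dremio or its configuration, and extrapolating beyond the listed range is not confirmed by the source.

```python
# Reference point taken from the table: 50 queries/sec needs 60 MB/s.
# The resulting ratio (1.2 MB/s per query/sec) is inferred from the table,
# not a documented Dremio constant.
REFERENCE_QUERIES_PER_SEC = 50
REFERENCE_THROUGHPUT_MB_S = 60


def baseline_throughput_mb_s(queries_per_sec: float) -> float:
    """Estimate required baseline throughput in MB/s (hypothetical helper)."""
    if queries_per_sec < 0:
        raise ValueError("queries_per_sec must be non-negative")
    # Linear scaling: throughput grows in proportion to the query rate.
    return queries_per_sec * REFERENCE_THROUGHPUT_MB_S / REFERENCE_QUERIES_PER_SEC


if __name__ == "__main__":
    for qps in (50, 100, 200):
        print(f"{qps} queries/sec -> {baseline_throughput_mb_s(qps):.0f} MB/s")
```

    Treat the result as a floor rather than a target, since the table describes baseline throughput only.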

    Metadata Refreshes

    The performance requirement for metadata refreshes scales linearly with the number of datasets and the average number of columns and splits per dataset. However, the performance requirement scales inversely with the refresh interval.

    | Number of Datasets | Columns per Dataset | Splits per Dataset | Refresh Interval | Required Baseline Throughput |
    |---|---|---|---|---|
    | 1000 | 20 | 1000 | 30 min | 1 MB/s |
    | 2000 | 20 | 1000 | 30 min | 2 MB/s |
    | 2000 | 20 | 1000 | 10 min | 3 MB/s |

    For More Information