    Metadata Storage

    Dremio stores metadata about users, spaces, and datasets. By default, Dremio stores this metadata at ${DREMIO_HOME}/data. Dremio administrators can customize the location of this directory with the paths.local property of the dremio.conf configuration file.

    Note: Dremio requires that deployments configured for High Availability use network-attached storage (NAS) as the metadata store.

    I/O Performance

    The Dremio metadata store services two workload types:

    | Workload Type | Performance Consideration |
    |---|---|
    | Requests from user queries and refreshes of data reflections | Performance is affected by the number of concurrent queries. |
    | Metadata refreshes, where Dremio collects and records information about source datasets | Performance is affected by the number of physical datasets to which Dremio connects, as well as the frequency of refreshes. The requirements of your query workload and metadata refresh policies may require greater throughput. |

    Requests from User Queries and Refreshes of Data Reflections

    The performance requirement for user queries and reflection refreshes scales linearly with the number of concurrent queries per second.

    | Queries/Sec | Required Baseline Throughput |
    |---|---|
    | 50 | 60 MB/s |
    | 100 | 120 MB/s |
    | 200 | 240 MB/s |
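
    The three data points in the table above are consistent with a straight line through the origin: about 1.2 MB/s of baseline throughput per query per second. The following is a minimal sketch of that estimate, not an official Dremio sizing formula. The constant and the function name `required_throughput_mb_s` are illustrative assumptions derived only from the table.

```python
# Assumption: the published points (50 -> 60, 100 -> 120, 200 -> 240 MB/s)
# lie on a line through the origin. The slope is derived from the table
# and is not an official Dremio constant.
MB_PER_SEC_PER_QUERY = 60 / 50  # 1.2 MB/s per query/sec


def required_throughput_mb_s(queries_per_sec: float) -> float:
    """Estimate baseline metadata-store throughput (MB/s) for a query rate."""
    if queries_per_sec < 0:
        raise ValueError("queries_per_sec must be non-negative")
    return queries_per_sec * MB_PER_SEC_PER_QUERY


if __name__ == "__main__":
    # Reproduce the rows of the table as a sanity check.
    for qps in (50, 100, 200):
        print(f"{qps} queries/sec -> {required_throughput_mb_s(qps):.0f} MB/s")
```

    Treat values between or beyond the table rows as rough estimates. The source states only that the requirement scales linearly, so measure actual storage throughput before you commit to a configuration.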

    Metadata Refreshes

    The performance requirement for metadata refreshes scales linearly with the number of datasets and the average number of columns and splits per dataset. However, the performance requirement scales inversely with the refresh interval.

    | Number of Datasets | Average Columns and Splits per Dataset, Refresh Interval | Required Baseline Throughput |
    |---|---|---|
    | 1000 | 20 columns per dataset, 1000 splits per dataset, refresh interval: 30 min | 1 MB/s |
    | 2000 | 20 columns per dataset, 1000 splits per dataset, refresh interval: 30 min | 2 MB/s |
    | 2000 | 20 columns per dataset, 1000 splits per dataset, refresh interval: 10 min | 3 MB/s |

    For More Information