
    Dataset

    Represents a dataset in Dremio. All datasets returned by the REST API have an entityType of dataset.

    Dataset Parameters

    The JSON representation of a dataset looks like this:

    Dataset object
    {
      "entityType": "dataset" [immutable after creation],
      "id": String [immutable, generated by Dremio],
      "path": [String] [immutable after creation],
      "tag": String [immutable, generated by Dremio],
      "type": String ["PHYSICAL_DATASET", "VIRTUAL_DATASET"] [immutable],
      "owner": Object [immutable, generated by Dremio],
      "fields": [DatasetField] [immutable],
      "createdAt": String (RFC3339 date) [immutable, generated by Dremio],
      "accelerationRefreshPolicy": DatasetAccelerationRefreshPolicy [optional, only for physical datasets in a source],
      "sql": String [optional, required for virtual datasets],
      "sqlContext": [String] [optional, only for virtual datasets],
      "format": DatasetFormat [optional, required for promoted datasets],
      "approximateStatisticsAllowed": Boolean [optional, introduced in Dremio 2.1.0]
    }
    
    Name Type Description
    id String Dataset ID. Generated by Dremio, immutable.
    path [String] Dataset path. Immutable after creation.
    tag String Identifies the current version of the dataset; it changes each time the dataset is modified. Generated by Dremio, immutable.
    type String The dataset type: either PHYSICAL_DATASET or VIRTUAL_DATASET. Immutable after creation.
    owner Object Information about the dataset’s owner, including the owner’s UUID and the type of owner (USER or ROLE). The owner object is omitted if the dataset is owned by Dremio’s system user, or if the owner cannot be found because their user account was deleted in Dremio or in the external identity provider.
    fields [DatasetField] The dataset fields representing the schema of the dataset. Immutable.
    createdAt String RFC3339 date (example: 2017-10-27T21:08:22.858Z) representing the creation datetime. Immutable.
    accelerationRefreshPolicy DatasetAccelerationRefreshPolicy Represents the acceleration refresh policy for the dataset. Applies only to physical datasets that exist in a source.
    sql String The SQL query that defines the dataset. Applies only to virtual datasets, for which it is required.
    sqlContext [String] The context for the SQL query. Optional; applies only to virtual datasets.
    format DatasetFormat The dataset format configuration. Applies only to promoted physical datasets, for which it is required.
    approximateStatisticsAllowed Boolean When set to true, COUNT DISTINCT queries return approximate results. Introduced in Dremio 2.1.0.
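As a minimal sketch, the constraints in the table above can be checked on the client before a dataset object is sent. The helper names `check_dataset` and `created_at`, the error messages, and the `MySpace`/`my_view` path are hypothetical; only the rules listed in the table are encoded (sql required for virtual datasets, sql and sqlContext only for virtual datasets, accelerationRefreshPolicy only for physical datasets). The `createdAt` parser swaps the trailing `Z` for an explicit offset because `datetime.fromisoformat` accepts `Z` only from Python 3.11.

```python
from datetime import datetime

# The two dataset types documented in the table above.
VALID_TYPES = {"PHYSICAL_DATASET", "VIRTUAL_DATASET"}


def check_dataset(ds: dict) -> list:
    """Return the documented rules a dataset object breaks (empty list if none).

    Hypothetical client-side helper, not part of the Dremio API.
    """
    problems = []
    ds_type = ds.get("type")
    if ds.get("entityType") != "dataset":
        problems.append("entityType must be 'dataset'")
    if ds_type not in VALID_TYPES:
        problems.append("type must be PHYSICAL_DATASET or VIRTUAL_DATASET")
    if ds_type == "VIRTUAL_DATASET":
        # sql is required for virtual datasets.
        if not ds.get("sql"):
            problems.append("sql is required for virtual datasets")
        # The refresh policy applies only to physical datasets in a source.
        if "accelerationRefreshPolicy" in ds:
            problems.append("accelerationRefreshPolicy applies only to physical datasets")
    else:
        # sql and sqlContext are meaningful only for virtual datasets.
        for key in ("sql", "sqlContext"):
            if key in ds:
                problems.append(f"{key} applies only to virtual datasets")
    return problems


def created_at(ds: dict) -> datetime:
    """Parse the RFC3339 createdAt value into a timezone-aware datetime."""
    # fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so spell out the UTC offset for older interpreters.
    return datetime.fromisoformat(ds["createdAt"].replace("Z", "+00:00"))


# Hypothetical virtual dataset; "MySpace" and "my_view" are placeholders.
view = {
    "entityType": "dataset",
    "type": "VIRTUAL_DATASET",
    "path": ["MySpace", "my_view"],
    "sql": "SELECT 1",
}
print(check_dataset(view))  # []
print(created_at({"createdAt": "2017-10-27T21:08:22.858Z"}).isoformat())
# 2017-10-27T21:08:22.858000+00:00
```

The check runs locally and sends nothing; Dremio performs its own validation when the object reaches the API.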

    Fields Parameter

    Represents a dataset field’s schema in Dremio.

    The JSON representation of a field looks like this:

    Dataset fields example
    {
      "name": String - the field name,
      "type": {
        "name": String ["STRUCT", "LIST", "UNION", "INTEGER", "BIGINT", "FLOAT", "DOUBLE", "VARCHAR", "VARBINARY", "BOOLEAN", "DECIMAL", "TIME", "DATE", "TIMESTAMP", "INTERVAL DAY TO SECOND", "INTERVAL YEAR TO MONTH"],
        "subSchema": [DatasetField] [optional],
        "precision": Number [optional],
        "scale": Number [optional]
      }
    }
    

    For complex types (LIST, STRUCT, UNION), subSchema contains a list of DatasetField objects that describe the type's components. For example, a UNION field's subSchema lists every primitive type that has been detected for that field.

    For the DECIMAL type, precision and scale are also provided.
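Because subSchema holds further DatasetField objects, a schema can be walked recursively. The sketch below flattens a field list into `path: TYPE` lines, descending into subSchema and appending precision and scale for DECIMAL. The helper `describe_fields` and the sample schema (field names `id`, `price`, `address`, `city`, `zip`) are hypothetical; they follow the shape of the fields example above.

```python
def describe_fields(fields, prefix=""):
    """Flatten a list of DatasetField objects into "path: TYPE" strings.

    Hypothetical helper: recurses into subSchema for complex types and
    appends (precision, scale) for DECIMAL fields.
    """
    lines = []
    for field in fields:
        path = prefix + field["name"]
        ftype = field["type"]
        label = ftype["name"]
        if label == "DECIMAL":
            label += f"({ftype.get('precision')}, {ftype.get('scale')})"
        lines.append(f"{path}: {label}")
        # LIST, STRUCT and UNION describe their components in subSchema;
        # a missing or null subSchema is treated as empty.
        lines.extend(describe_fields(ftype.get("subSchema") or [], prefix=path + "."))
    return lines


# Hypothetical schema in the shape of the fields example above.
schema = [
    {"name": "id", "type": {"name": "BIGINT"}},
    {"name": "price", "type": {"name": "DECIMAL", "precision": 10, "scale": 2}},
    {"name": "address", "type": {"name": "STRUCT", "subSchema": [
        {"name": "city", "type": {"name": "VARCHAR"}},
        {"name": "zip", "type": {"name": "VARCHAR"}},
    ]}},
]
for line in describe_fields(schema):
    print(line)
# id: BIGINT
# price: DECIMAL(10, 2)
# address: STRUCT
# address.city: VARCHAR
# address.zip: VARCHAR
```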

    AccelerationRefreshPolicy Parameter

    Represents the acceleration refresh policy for a dataset.

    Dataset accelerationRefreshPolicy example
    {
      "refreshPeriodMs": Number,
      "gracePeriodMs": Number,
      "method": String ["FULL", "INCREMENTAL"],
      "refreshField": String [optional],
      "accelerationNeverExpire": Boolean,
      "accelerationNeverRefresh": Boolean
    }
    
    Name Type Description
    refreshPeriodMs Number How often (in milliseconds) to refresh all reflections on the dataset.
    gracePeriodMs Number The maximum age (in milliseconds) that data in a reflection can reach and still be used to accelerate queries.
    method String The refresh method: FULL or INCREMENTAL. Incremental refreshes work only in certain cases; see the Dremio documentation on reflections.
    refreshField String Optional. For certain datasets, the field to use for incremental refreshes; can be set when method is INCREMENTAL.
    accelerationNeverExpire Boolean Controls whether reflections on the dataset can expire.
    accelerationNeverRefresh Boolean Controls whether reflections on the dataset are refreshed on a regular schedule.
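Both periods are given in milliseconds, so one hour is 60 × 60 × 1000 = 3,600,000 ms and three hours is 10,800,000 ms. A minimal sketch that builds the policy object from hours, assuming (from the table above) that refreshField is only meaningful with the INCREMENTAL method; the helper `refresh_policy` and the `updated_at` column are hypothetical, and the hour values are illustrative rather than Dremio defaults.

```python
HOUR_MS = 60 * 60 * 1000  # one hour in milliseconds: 3,600,000


def refresh_policy(refresh_hours, grace_hours, method="FULL", refresh_field=None,
                   never_expire=False, never_refresh=False):
    """Build an accelerationRefreshPolicy object from periods given in hours.

    Hypothetical helper; the API itself expects milliseconds.
    """
    if method not in ("FULL", "INCREMENTAL"):
        raise ValueError("method must be FULL or INCREMENTAL")
    if refresh_field is not None and method != "INCREMENTAL":
        raise ValueError("refreshField can only be set when method is INCREMENTAL")
    policy = {
        "refreshPeriodMs": round(refresh_hours * HOUR_MS),
        "gracePeriodMs": round(grace_hours * HOUR_MS),
        "method": method,
        "accelerationNeverExpire": never_expire,
        "accelerationNeverRefresh": never_refresh,
    }
    if refresh_field is not None:
        # refreshField is optional, so it is included only when given.
        policy["refreshField"] = refresh_field
    return policy


# Refresh every hour; serve reflection data up to 3 hours old.
print(refresh_policy(1, 3))
# Incremental refresh keyed on a hypothetical "updated_at" column.
print(refresh_policy(6, 12, method="INCREMENTAL", refresh_field="updated_at"))
```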

    Format Parameter

    Files and folders can be promoted to physical datasets by applying a format. When you apply a format to a folder, all files in that folder should conform to the selected format type.

    Text (delimited)

    Applies to text files with delimiters (CSV, TSV, etc.).

    Text type example
    {
      "type": "Text",
      "fieldDelimiter": String,
      "lineDelimiter": String,
      "quote": String,
      "comment": String,
      "escape": String,
      "skipFirstLine": Boolean,
      "extractHeader": Boolean,
      "trimHeader": Boolean,
      "autoGenerateColumnNames": Boolean
    }
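A sketch of a filled-in Text format object, serialized as JSON. The helper `delimited_text_format` is hypothetical, and the quote, escape, comment and line-delimiter values are illustrative choices rather than documented defaults. Tying autoGenerateColumnNames to the absence of a header row is also an assumption; check it against your Dremio version.

```python
import json


def delimited_text_format(field_delimiter=",", has_header=True):
    """Build a Text format object for a delimited file.

    Hypothetical helper; quote, escape, comment and line-delimiter
    values are illustrative, not documented defaults.
    """
    return {
        "type": "Text",
        "fieldDelimiter": field_delimiter,
        "lineDelimiter": "\n",
        "quote": '"',
        "comment": "#",
        "escape": '"',
        "skipFirstLine": False,
        # Take column names from the first line when the file has a header row.
        "extractHeader": has_header,
        "trimHeader": True,
        # Assumption: generated column names are wanted only without a header row.
        "autoGenerateColumnNames": not has_header,
    }


# A tab-separated file without a header row, serialized as a request body.
print(json.dumps(delimited_text_format("\t", has_header=False), indent=2))
```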
    

    JSON

    JSON type example
    {
      "type": "JSON"
    }
    

    Parquet

    Parquet type example
    {
      "type": "Parquet"
    }
    

    Excel

    Excel type example
    {
      "type": "Excel",
      "sheetName": String,
      "extractHeader": Boolean,
      "hasMergedCells": Boolean
    }
    

    XLS

    XLS type example
    {
      "type": "XLS",
      "sheetName": String,
      "extractHeader": Boolean,
      "hasMergedCells": Boolean
    }
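The Excel and XLS objects share the same fields and differ only in type. A minimal sketch that picks the type from the file extension, assuming that Excel applies to .xlsx workbooks and XLS to legacy .xls files; that mapping, the helper `spreadsheet_format`, and the file and sheet names are assumptions for illustration.

```python
from pathlib import PurePosixPath


def spreadsheet_format(filename, sheet_name, extract_header=True, merged_cells=False):
    """Build an Excel or XLS format object for a spreadsheet file.

    Hypothetical helper. Assumption: "Excel" targets .xlsx workbooks and
    "XLS" targets legacy .xls files.
    """
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix == ".xlsx":
        fmt_type = "Excel"
    elif suffix == ".xls":
        fmt_type = "XLS"
    else:
        raise ValueError(f"not a spreadsheet file: {filename}")
    return {
        "type": fmt_type,
        "sheetName": sheet_name,
        "extractHeader": extract_header,
        "hasMergedCells": merged_cells,
    }


print(spreadsheet_format("sales.xlsx", "Q1"))
print(spreadsheet_format("LEGACY.XLS", "Sheet1", merged_cells=True))
```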
    

    Delta Lake

    Delta Lake type example
    {
      "type": "Delta"
    }
    

    Iceberg

    Iceberg type example
    {
      "type": "Iceberg"
    }
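A promoted dataset combines the dataset fields from Dataset Parameters with one of the format objects above. The sketch below only builds the request body and checks that the format type is one of the types listed in this section; the helper `promotion_payload` and the `my_source`/`logs` path are hypothetical.

```python
# Format types listed in this section.
KNOWN_FORMAT_TYPES = {"Text", "JSON", "Parquet", "Excel", "XLS", "Delta", "Iceberg"}


def promotion_payload(path, fmt):
    """Build a dataset object that promotes a file or folder with a format.

    Hypothetical helper; it builds the body only and sends nothing.
    """
    if fmt.get("type") not in KNOWN_FORMAT_TYPES:
        raise ValueError(f"unknown format type: {fmt.get('type')!r}")
    return {
        "entityType": "dataset",
        "path": list(path),
        "type": "PHYSICAL_DATASET",
        # format is required for promoted datasets.
        "format": fmt,
    }


# Promote a hypothetical folder of Parquet files.
print(promotion_payload(["my_source", "logs"], {"type": "Parquet"}))
```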