Data Types
Dremio supports the following SQL data types: numeric, string and binary, boolean, date and time, and semi-structured data.
Numeric Types
INT
A 4-byte signed integer. The supported range is from -2,147,483,648 to 2,147,483,647.
BIGINT
An 8-byte signed integer. The supported range is from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
FLOAT
A 4-byte single-precision floating-point number. A FLOAT provides six decimal digits of precision.
DOUBLE
An 8-byte double-precision floating-point number. A DOUBLE provides 15 decimal digits of precision.
DECIMAL
A DECIMAL type has precision (p) and scale (s), written as DECIMAL(p,s). Precision is the total number of digits; scale is the number of digits to the right of the decimal point. For example, 987.65 is a DECIMAL(5,2) value.
When you perform arithmetic on DECIMAL values with different precision or scale, the result will have sufficient precision and scale to hold the result—unless it overflows.
- Numeric literals cannot exceed BIGINT range. The SQL parser cannot process numeric literals larger than 9,223,372,036,854,775,807. To work with larger values, use a string literal:
CAST('9223372036854775808' AS DECIMAL(38,0)). - Overflow produces zero, not an error. When decimal arithmetic or casting results in a value that exceeds the target precision, Dremio returns 0 rather than raising an error. For example,
CAST('9999999999999999999' AS DECIMAL(38,2)) * CAST('9999999999999999999' AS DECIMAL(38,2))returns 0.00. This can cause silent data loss. - Casting to lower precision rounds the value. Dremio uses standard half-up rounding. For example,
CAST(123.56 AS DECIMAL(4,0))returns 124, whileCAST(123.44 AS DECIMAL(4,0))returns 123. If the rounded value still exceeds the target precision, the result is 0.
String and Binary Types
VARCHAR
VARCHAR stores variable-length text. Only UTF-8 encoding is supported. For functions that work with VARCHAR values, see String Functions.
VARCHAR ExampleSELECT 'Hello, world!'
-- Hello, world!
VARBINARY
VARBINARY stores variable-length binary data. By default, the maximum length is 32,000 bytes. Binary values are displayed in base64 encoding. For functions that work with VARBINARY values, see Binary Functions.
VARBINARY ExampleSELECT CAST('help' AS VARBINARY)
-- aGVscA==
UUID
A universally unique identifier stored as a 16-byte binary value. The string representation consists of 32 hexadecimal digits (0-9, a-f) displayed in five groups separated by hyphens, in the form 8-4-4-4-12. UUID generation and comparison follow RFC 4122 and RFC 9562. For functions that work with UUID values, see UUID Type Functions.
UUID is supported only for Apache Iceberg tables. Dremio compares UUID values using unsigned byte-wise comparisons in big-endian byte order. This is equivalent to lexicographic comparison of the string representations.
To generate a UUID, use the GENERATE_UUID function.
CREATE TABLE users (
user_id UUID,
name VARCHAR
);
SELECT GENERATE_UUID();
-- e740c3e0-54d2-4a01-b4f5-ad5fe5690e1c
SELECT UUID '619fbd6f-5114-40c7-a83f-1f0e114adae2'
Usage Notes
- Casting an invalid string format or incorrectly sized binary to UUID throws an error.
- Per the Iceberg specification, UUID columns cannot be changed to other data types, and columns of other types cannot be changed to UUID using
ALTER TABLE. To migrate an existing VARCHAR column to UUID, add a new UUID column, populate it withCAST(varchar_col AS UUID), then drop the original column. - UUID is supported as a primitive type and within complex types (ARRAY, STRUCT, MAP, VARIANT).
- The UUID type does not enforce uniqueness; the same value can be inserted multiple times.
- UUID values are case-insensitive.
Logical Types
BOOLEAN
The supported values for BOOLEAN are TRUE, FALSE, and NULL. For functions that work with BOOLEAN values, see Boolean Functions.
Date and Time Types
For functions that work with date and time values, see Date/Time Functions.
DATE
A date value that enables you to calculate and store consistent information about the date of events and transactions. When using a string literal for the date, YYYY-MM-DD is the only supported format. To use a different format, use the TO_DATE function.
DATE '2000-01-01'
TIME
Represents a time of day without a date component. TIME supports up to millisecond precision (3 fractional digits).
TIME ExamplesTIME '17:30:50'
TIME '17:30:50.235'
TIMESTAMP
-
Represents an absolute point in time with millisecond precision without a time zone. Timestamps are truncated to the nearest millisecond.
-
Examples:
- TIMESTAMP ‘2000-01-01 01:30:50’
- TIMESTAMP ‘2000-01-01 17:30:50’
- TIMESTAMP ‘2000-01-01 17:30:50.9’
- TIMESTAMP ‘2000-01-01 17:30:50.12’
- TIMESTAMP ‘2000-01-01 17:30:50.123’
INTERVAL
Intervals are used to represent a measure of time. Dremio supports two types of intervals: year-month, which stores the year and month (YYYY-MM); and day-time (DD HH:MM:SS), which stores the days, hours, minutes, and seconds.
| Form | Example |
|---|---|
| YEAR | INTERVAL '1' YEAR |
| MONTH | INTERVAL '3' MONTH |
| YEAR TO MONTH | INTERVAL '1-5' YEAR TO MONTH |
| DAY | INTERVAL '5' DAY |
| MINUTE | INTERVAL '5' MINUTE |
| SECOND | INTERVAL '5' SECOND |
| DAY TO HOUR | INTERVAL '4 01' DAY TO HOUR |
| DAY TO MINUTE | INTERVAL '4 01:01' DAY TO MINUTE |
| DAY TO SECOND | INTERVAL '1 2:34:56.789' DAY TO SECOND |
Semi-Structured Types
Dremio supports three semi-structured types: VARIANT, STRUCT, and MAP. For new Iceberg v3 tables, VARIANT is the recommended approach for semi-structured data because it supports flexible, schema-on-read access without requiring a fixed schema at table creation.
VARIANT
VARIANT is a self-describing type that can hold:
- Primitives – numeric types, VARCHAR, VARBINARY, UUID, BOOLEAN, DATE, TIME, TIMESTAMP, null
- Arrays – ordered lists of values
- Objects – nested key-value pairs, created via
PARSE_JSONorTO_VARIANT(struct)
Unlike STRUCT, which has a fixed schema defined at table creation, VARIANT allows different rows to have different shapes. This makes VARIANT ideal for semi-structured data where the schema varies, evolves frequently, or is unknown at write time.
VARIANT is supported only on Apache Iceberg v3 tables. See Iceberg v3 for details.
For functions that work with VARIANT values, see Variant Type Functions.
Creating a table with a VARIANT columnCREATE TABLE events (
event_id INT,
event_time TIMESTAMP,
payload VARIANT
);
INSERT INTO events (event_id, event_time, payload)
VALUES (1, TIMESTAMP '2025-01-15 10:30:00', PARSE_JSON('{"user": "Alice", "action": "login"}'));
SELECT
event_id,
VARIANT_GET(payload, '$.user' AS VARCHAR) AS user_name,
VARIANT_GET(payload, '$.action' AS VARCHAR) AS action
FROM events;
When you SELECT a VARIANT, the output displays the underlying value, not the VARIANT wrapper. Use TYPEOF to confirm the actual type.
Storage and Performance
VARIANT values are always stored using a compact binary encoding, not as raw JSON strings. The types preserved in that encoding depend on how the data was loaded:
TO_VARIANTpreserves the original SQL types. Extracted values retain their original type without conversion.PARSE_JSONpreserves only JSON's primitive types — strings, numbers, booleans, and nulls. Types like DATE, TIMESTAMP, and UUID are stored as strings because JSON has no representation for them. Extracting these values withVARIANT_GETrequires a type conversion on each read. For better performance on frequently queried fields with known types, consider extracting them once into typed relational columns.
Variant Shredding
Variant shredding is a Parquet-level optimization that extracts commonly occurring fields from a VARIANT value and stores them as separate, typed columns within the Parquet file. Instead of reading an entire binary blob to access a single field, a query engine can read only the specific columns it needs. This reduces I/O, improves compression, and enables data skipping through Parquet column statistics. Shredding is an optimization applied by the writing engine and encoded in the structure of the Parquet files. Iceberg table metadata does not track whether VARIANT data is shredded.
Dremio reads shredded VARIANT data transparently. When other engines such as Apache Spark (4.1 and later) write VARIANT data with shredding to Iceberg v3 tables, Dremio takes advantage of the shredded layout on read. Spark 4.1 enables variant shredding by default, so any Iceberg v3 table written by Spark 4.1 or later will contain shredded VARIANT data. No configuration is required in Dremio to read these files.
Dremio writes shredded VARIANT data by default. When Dremio writes VARIANT values to Iceberg v3 tables, it applies shredding to optimize read performance. While shredding significantly improves read performance, it adds overhead during writes. For write-heavy workloads where write throughput is the priority, you can disable shredding by setting the iceberg.enableVariantShredding table property to false:
ALTER TABLE my_table SET TBLPROPERTIES ('iceberg.enableVariantShredding' = 'false');
See Table Properties for additional Iceberg table properties.
Usage Notes
- VARIANT cannot hold MAP or INTERVAL types.
- VARIANT cannot be nested inside STRUCT types.
- VARIANT does not implicitly convert to or from other types in expressions. Use
VARIANT_GETto extract and cast values to a target type. - Direct path access (
payload['user']orpayload.user) is not supported. UseVARIANT_GETto extract values. VARIANT_GETextracts scalar values and arrays of scalar values. Objects and arrays of objects cannot be extracted directly — use path expressions to navigate to scalar fields. SeeVARIANT_GETfor supported cast types, array extraction withFLATTEN, and detailed usage.- VARIANT cannot be cast to STRUCT types. Use
VARIANT_GETto extract individual fields. - A variant null and SQL
NULLare distinct. UseIS_VARIANT_NULLto check for variant null values; useIS NULLto check for missing paths. - Individual VARIANT values are subject to Dremio's 16 MB row width limit. See Dataset Limits.
Client Compatibility
When you query VARIANT columns through external clients (JDBC, ODBC, Arrow Flight) or BI tools (Tableau, Power BI, Looker), Dremio automatically converts VARIANT values to JSON strings. This conversion happens transparently—no configuration or explicit casting is required.
This behavior exists because the client ecosystem does not yet support native VARIANT as a data type. All VARIANT functions (such as VARIANT_GET and IS_VARIANT_NULL) continue to work normally on the server side; only the data returned to the client is converted.
For driver-specific details, see Arrow Flight SQL JDBC and Arrow Flight SQL ODBC.
STRUCT
Used to represent collections of key-value pairs. Keys are non-empty, case-insensitive strings, and values can be of any type.
Create STRUCT LiteralSELECT CONVERT_FROM('{"name":"Gnarly", "age":7, "car":null}', 'json')
-- {"name":"Gnarly","age":7}
SELECT address['city'] FROM customerTable
Usage Notes
- Null fields are omitted from STRUCT output.
- STRUCT fields cannot contain VARIANT values.
ARRAY
Used to represent a list of arbitrary size, where the index is a non-negative integer and values can be of any single type. For functions that work with ARRAY values, see Array Type Functions in the Semi-Structured Data section.
ARRAY LiteralsSELECT ARRAY[1, 2, 3];
-- [1, 2, 3]
SELECT customerOrders[100] FROM OrderHistoryTable
MAP
The MAP type is a collection of key-value pairs. MAP keys are case-insensitive strings. All values in a given map have the same type. For example, map<string, int> represents a mapping where the keys are strings and the values are integers. To retrieve the value of a MAP element, use column['key'].
SELECT <column_name['<key_name>']> FROM <table_name>
MAP keys must be strings. MAP values must be numbers, booleans, or strings; NULL is not supported as a value. For functions that work with MAP values, see Map Type Functions in the Semi-Structured Data section.
Implicit Type Conversion
For information about supported implicit type conversions, including conversion rules for file-formatted sources, AWS Glue, DECIMAL-to-DECIMAL behavior, and COPY INTO operations, see Implicit Type Conversion.