Implicit Type Conversion
When loading data from file-formatted sources, if a data type within a file does not match the data type of the corresponding table, Dremio takes one of the following actions:
- Silent conversion: Dremio converts the file's data type to the column's data type; the table schema is left unchanged.
- Schema learning promotion: Dremio promotes the table column to a wider type that can accommodate both the existing and new data, then reruns the query. Dremio picks the narrowest type that can safely hold values of both the source type and the existing column type. For example, a FLOAT column receiving INT data is promoted to DOUBLE, which can represent both integer and floating-point values without loss of precision.
- Error: If the types are incompatible, the query fails with an error message.
For metastore sources such as AWS Glue, Dremio does not update the data type since the table's schema is defined by the external metastore. Additionally, for Apache Iceberg and Delta Lake, the table format defines the schema and there is no schema learning.
Supported Conversions
Conversions not listed in this table are not supported and return an error.
| Source Type | Converts To | Column Promoted To |
|---|---|---|
| INT | BIGINT, DOUBLE, DECIMAL, VARCHAR | FLOAT column → DOUBLE |
| BIGINT | DOUBLE, DECIMAL, VARCHAR | INT column → BIGINT<br/>FLOAT column → DOUBLE |
| FLOAT | DOUBLE, DECIMAL, VARCHAR | INT column → DOUBLE<br/>BIGINT column → DOUBLE |
| DOUBLE | VARCHAR | INT column → DOUBLE<br/>BIGINT column → DOUBLE<br/>FLOAT column → DOUBLE<br/>DECIMAL column → DOUBLE |
| DECIMAL | DOUBLE, VARCHAR | INT column → DECIMAL<br/>BIGINT column → DECIMAL<br/>FLOAT column → DECIMAL<br/>DECIMAL column → adjusted precision and scale |
| TIME, DATE, TIMESTAMP | VARCHAR | — |
| VARCHAR | UUID | All numeric, date, and time columns → VARCHAR |
| BOOLEAN | VARCHAR | — |
| UUID | VARCHAR | — |
| VARIANT | — (use TO_VARIANT and VARIANT_GET) | — |
| ARRAY, STRUCT, MAP | — | — |
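As an illustration, the promotion rules in the table above can be sketched as a simple lookup. This is a hypothetical helper for reasoning about the rules, not Dremio's implementation:

```python
# (existing column type, incoming file type) -> promoted column type,
# transcribed from the "Column Promoted To" entries in the table above.
PROMOTIONS = {
    ("FLOAT", "INT"): "DOUBLE",
    ("INT", "BIGINT"): "BIGINT",
    ("FLOAT", "BIGINT"): "DOUBLE",
    ("INT", "FLOAT"): "DOUBLE",
    ("BIGINT", "FLOAT"): "DOUBLE",
    ("INT", "DOUBLE"): "DOUBLE",
    ("BIGINT", "DOUBLE"): "DOUBLE",
    ("FLOAT", "DOUBLE"): "DOUBLE",
    ("DECIMAL", "DOUBLE"): "DOUBLE",
    ("INT", "DECIMAL"): "DECIMAL",
    ("BIGINT", "DECIMAL"): "DECIMAL",
    ("FLOAT", "DECIMAL"): "DECIMAL",
}

# VARCHAR source data promotes any numeric, date, or time column to VARCHAR.
NUM_DATE_TIME = {"INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL",
                 "DATE", "TIME", "TIMESTAMP"}

def promote(column_type, file_type):
    """Return the promoted column type, or None if no promotion applies."""
    if file_type == "VARCHAR" and column_type in NUM_DATE_TIME:
        return "VARCHAR"
    return PROMOTIONS.get((column_type, file_type))

print(promote("FLOAT", "INT"))   # DOUBLE
print(promote("INT", "BIGINT"))  # BIGINT
```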
AWS Glue Sources
For AWS Glue sources with Parquet datasets, Dremio supports the following conversions:
- INT to BIGINT
- FLOAT to DOUBLE
- DECIMAL to DECIMAL
- VARCHAR to VARCHAR[N], which truncates values to the specified length [N]
Date/Time Parsing from CSV Files
CSV values are read as strings. Dremio parses date and time strings into the target column type using the format options you specify (DATE_FORMAT, TIME_FORMAT, TIMESTAMP_FORMAT) or the defaults:
| Target Type | Default Format |
|---|---|
| DATE | YYYY-MM-DD |
| TIME | HH24:MI:SS.FFF |
| TIMESTAMP | YYYY-MM-DD HH24:MI:SS.FFF |
See COPY INTO for additional format options.
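For illustration, the default formats correspond roughly to Python `strptime` directives. Dremio's format tokens are its own; the mapping below is only an approximation of the defaults for experimenting with sample values:

```python
from datetime import datetime

# Approximate mapping of Dremio's default tokens to strptime directives:
#   YYYY-MM-DD      -> %Y-%m-%d
#   HH24:MI:SS.FFF  -> %H:%M:%S.%f   (%f accepts the fractional seconds)
date_val = datetime.strptime("2024-03-15", "%Y-%m-%d").date()
time_val = datetime.strptime("13:45:30.250", "%H:%M:%S.%f").time()
ts_val = datetime.strptime("2024-03-15 13:45:30.250", "%Y-%m-%d %H:%M:%S.%f")

print(date_val, time_val, ts_val)
```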
DECIMAL-to-DECIMAL Conversions
Because decimal values can have different precisions and scales, Dremio supports DECIMAL-to-DECIMAL conversions between any precision and scale, including conversions that decrease them.
The following symbols are used in this section:
- Precision (p): represents the number of digits in the value/cell.
- Scale (s): represents the number of digits after the decimal point in the value/cell.
Runtime checks detect overflow: on overflow, the conversion returns NULL; on truncation, the value is rounded to the nearest representable value.
Implicit conversions are not currently supported for filesystem and MongoDB sources.
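A minimal Python sketch of these runtime semantics, using the standard decimal module. The half-up rounding mode is an assumption made for illustration, and convert_decimal is a hypothetical helper, not a Dremio API:

```python
from decimal import Decimal, ROUND_HALF_UP

def convert_decimal(value, precision, scale):
    """Sketch of DECIMAL-to-DECIMAL runtime semantics: round the fractional
    part (rounding mode assumed half-up) and return None (NULL) when the
    integral part overflows the target precision."""
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=2 -> 0.01
    rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    # Overflow check: digits before the decimal point must fit in p - s.
    if abs(rounded) >= Decimal(10) ** (precision - scale):
        return None  # overflow -> NULL
    return rounded

print(convert_decimal(Decimal("12.345"), 5, 2))  # 12.35 (rounded)
print(convert_decimal(Decimal("123.45"), 4, 2))  # None (overflow)
```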
DECIMAL-to-DECIMAL Schema Learning
When source files contain decimal values whose scale or precision differs from the table's definition, DECIMAL(p, s), Dremio updates the table's decimal definition through schema learning according to the following logic if possible; otherwise, it attempts to convert the source decimal values to the table's definition.
Logic used for the schema learning option, where d = decimal, p = precision, and s = scale:
- d1 (p1, s1): the decimal type in file 1, with a given p and s
- d2 (p2, s2): the decimal type in file 2, with a given p and s
- Precision: the new precision after schema learning
- Scale: the new scale after schema learning
The following shows the schema-learned decimal type for the column.
Schema learning logic to determine precision:
Precision = max(s1, s2) + max(p1 - s1, p2 - s2)
This formula provides space for the maximum fractional part (digits after the decimal point) and the maximum integral part (digits before it).
Schema learning logic to determine scale:
Scale = max(s1, s2)
If the resulting precision is greater than 38, it is reduced to 38, and the scale is reduced to prevent truncating the integral part:
- The resulting precision is 38.
- The scale is reduced to leave enough space for the integral part: Scale = MIN(precision, 38) - max(p1 - s1, p2 - s2).
- This causes truncation on read, since the fractional part no longer has enough digits for the original scale.
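The precision and scale rules above, including the cap at 38, can be sketched as follows (learn_decimal is a hypothetical helper name):

```python
def learn_decimal(p1, s1, p2, s2):
    """Combine two decimal types per the schema-learning rules above."""
    integral = max(p1 - s1, p2 - s2)  # digits needed before the decimal point
    scale = max(s1, s2)               # digits needed after the decimal point
    precision = scale + integral
    if precision > 38:                # cap at the DECIMAL maximum
        precision = 38
        scale = 38 - integral         # shrink scale to keep the integral part
    return precision, scale

print(learn_decimal(10, 2, 8, 4))    # (12, 4)
print(learn_decimal(38, 10, 38, 2))  # (38, 2): capped, scale reduced
```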
Type Conversion When Copying Data from CSV or JSON Files Into Apache Iceberg Tables
These data-type conversions are supported when the COPY INTO <table> SQL command is used to copy data from source files into Iceberg tables:
| Source Data Type | Target Dremio Data Type |
|---|---|
| String | BIGINT, BOOLEAN, DATE, DECIMAL, DOUBLE, FLOAT, INT, TIME, TIMESTAMP, UUID, VARCHAR |
| Integer | BIGINT, DECIMAL, DOUBLE, FLOAT, INT, VARCHAR |
| Floating-point numbers | DECIMAL, DOUBLE, FLOAT, VARCHAR |
| Boolean | BOOLEAN, VARCHAR |
| Object (JSON only) | STRUCT |
| Array (JSON only) | LIST |
Although TIMESTAMP is supported as a target data type, TIMESTAMP with time zone is not.