Delta Lake Data Format

Dremio 14.0.0 provides read-only support for the Delta Lake data format. The feature is disabled by default; to enable support for Delta Lake, set the dremio.deltalake.enabled support key to true. Dremio supports unlimited splits for Delta Lake tables.
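
Support keys can be set from the Support page of the Dremio UI; in deployments where system options can also be set through SQL, a statement along the following lines enables the feature (a sketch, the exact mechanism may vary by version):

    -- Enable read-only Delta Lake support using the support key named above
    ALTER SYSTEM SET "dremio.deltalake.enabled" = true;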

Limitations

Dremio support for Delta Lake has the following limitations:

  • Dremio doesn’t support disabling the feature after enabling it
  • Support is limited to specific data sources; the Delta Lake format is not supported for Hive and AWS Glue sources (see Hive and Glue Data Sources below)
  • Runtime filtering on non-partitioned columns is not supported
  • CTAS (CREATE TABLE AS SELECT) and DML operations aren’t supported
  • Metadata refresh is required to query the latest version of a Delta Lake table (see the example after this list)
  • Dremio can read Z-ordered datasets and datasets that use Bloom filters, but does not take advantage of their performance benefits
  • Microsecond precision is not supported for Spark DML operations that create timestamps
  • Dremio does not support reading older versions of a Delta Lake table
  • Dremio doesn’t read min/max bounds from the Delta Lake commit log for each column of a new Delta Lake table
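
Because metadata refresh is required to pick up new Delta Lake commits, a typical workflow is to refresh the dataset's metadata before querying it. A sketch using Dremio's ALTER TABLE ... REFRESH METADATA syntax, with a hypothetical dataset path:

    -- Refresh metadata so the latest Delta Lake commit is visible (dataset path is illustrative)
    ALTER TABLE "my-s3"."sales"."orders_delta" REFRESH METADATA;

    -- Subsequent queries read the newest table version
    SELECT COUNT(*) FROM "my-s3"."sales"."orders_delta";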

Known Issue

When an existing Parquet dataset is converted to a Delta Lake table using Spark without rewriting the Parquet files, the converted table has no table statistics. In this case, Dremio estimates high row counts, which can cause queries to fail due to insufficient memory.
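
For context, this situation typically arises from an in-place conversion such as the Spark SQL below (the path is hypothetical); because the existing Parquet files are not rewritten, the resulting Delta Lake table may lack the statistics Dremio relies on for row-count estimates:

    -- Spark SQL: convert an existing Parquet directory to Delta Lake in place
    -- (no Parquet files are rewritten, so table statistics may be missing)
    CONVERT TO DELTA parquet.`s3://bucket/warehouse/events`;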

Hive and Glue Data Sources

Dremio does not support the Delta Lake data format for Hive and AWS Glue data sources. However, Dremio users can promote a physical dataset that is in the Delta Lake format.
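
Once a folder containing Delta Lake files is promoted to a physical dataset (for example, from the Dremio UI), it can be queried like any other table; the source and column names below are purely illustrative:

    -- Query a promoted Delta Lake physical dataset (names are illustrative)
    SELECT order_id, order_total
    FROM "my-s3"."warehouse"."orders_delta"
    WHERE order_date >= DATE '2021-01-01'
    LIMIT 100;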