23.0.0+ Release Notes (October 2022)
Dremio 23.0.0+ supports only MapR 6.2.0. If you are running MapR 5.2.x or 6.1.x, you must upgrade to MapR 6.2.0 before upgrading to Dremio 23. Dremio releases up to and including 22.x do not support MapR 6.2.0; only MapR 5.2.x and 6.1.x are supported in releases prior to Dremio 23.
NOTE: MapR 6.2.0 supports only JDK 11. JDK 8 is not supported.
Dremio can now read MAP data from Parquet files. You must run ALTER TABLE <table_name> FORGET METADATA on tables containing MAP data that you have previously queried. This feature is enabled by default. If you prefer the previous behavior, set the support key dremio.data_types.map.enabled to OFF under Settings > Support > Support Keys.
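As a sketch, the metadata reset described above might look like the following for a previously queried table; the name samples.logs is hypothetical:

```sql
-- Discard Dremio's cached metadata so columns containing MAP data
-- are re-read with the new MAP type on the next query.
-- "samples.logs" is a hypothetical table name.
ALTER TABLE samples.logs FORGET METADATA;
```

The next query against the table (or re-promotion of the underlying files) rebuilds the metadata.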
A preview job is no longer automatically triggered when you click on a dataset. If you have permissions to edit the dataset, you will see the original SQL in the SQL Editor. If you do not have permissions to edit the dataset, you will continue to see a SELECT * statement pre-populated in the SQL Editor.
- In previous releases, Dremio supported a maximum of 800 leaf columns in a table, though that value was configurable with the support key store.plugins.max_metadata_leaf_columns. If you used this support key and have upgraded to v23.0, reset the key so that you can use the maximum of 6,400 that is enabled with wide table support. See Creating and Querying Wide Tables for more information and limitations.
In this release, Dremio does not support Iceberg tables written with equality deletes.
DML operations (such as MERGE) are not supported on tables with positional deletes. CTAS is supported on tables with positional deletes.
If a user was actively logged in to Dremio during the upgrade to version 23.0.0, pages under Settings will throw an Unexpected Error until the user logs out and logs back in.
This release of Dremio supports a semi-structured MAP data type that allows you to query map data from Apache Parquet files, Apache Iceberg, and Delta Lake. The MAP data type is a collection of key-value pairs and is useful for holding sparse data. See Data Types for more information.
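As an illustrative sketch, a MAP column can be selected alongside other columns like any scalar column; the table and column names here are hypothetical:

```sql
-- "events" is a hypothetical Parquet-backed table whose "attributes"
-- column holds MAP key-value pairs (e.g., sparse event properties).
SELECT event_id,
       attributes
FROM   events
LIMIT  10;
```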
Dremio now supports LISTAGG, an aggregate function that concatenates a list of strings and places a separator between them. See LISTAGG for more information.
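A minimal sketch of the function, assuming a hypothetical locations table with state and city columns:

```sql
-- Build one comma-separated string of cities per state.
-- Table and column names are hypothetical.
SELECT state,
       LISTAGG(city, ', ') AS cities
FROM   locations
GROUP BY state;
```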
The Jobs Profile page includes a number of enhancements so you can quickly find the most expensive execution steps in a query; understand the details of each execution step and its impact on query time, memory consumption, and upstream and downstream data volume; and identify system or data issues that are causing a query to be slow or expensive. See Viewing Query Profiles for more information.
Azure Data Lake Storage (ADLS) Gen1 is now supported as a source on Dremio’s AWS Edition. For more information, see Azure Data Lake Storage Gen1.
Elasticsearch is now supported as a source on Dremio’s AWS Edition. For more information, see Elasticsearch.
In this release, embedded Nessie historical data that is not used by Dremio is purged on a periodic basis to improve performance and avoid future upgrade issues. The maintenance interval can be modified with the nessie.kvversionstore.maintenance.period_minutes support key, and you can perform maintenance manually using the nessie-maintenance admin CLI command.
Dremio now supports wide tables. See Creating and Querying Wide Tables for more information and limitations.
Added a new Admin CLI command, dremio-admin remove-duplicate-roles, that will remove duplicate LDAP groups or local roles and consolidate them into a single role. For more information, see Remove Duplicate Roles.
Dremio now supports connecting to Amazon S3 sources using an AWS PrivateLink URL. For more information, see Amazon S3.
Similar to Encrypting the LDAP Bind Password, Dremio now supports the same encryption mechanism for the wire encryption setup fields in dremio.conf.
Iceberg tables written with positional deletes are now supported.
Starting in Dremio 23.0.0, customers who select the dremio/dremio-ee Docker image will receive an Eclipse Temurin-based image for JDK, either JDK 8 or JDK 11. Dremio will no longer provide Docker images based on openjdk:jdk-8 for future Dremio versions since that image has been officially deprecated. Older dremio/dremio-ee image versions will remain available.
- Added a button that allows you to quickly copy the ID of a job on the job details page.
- When multiple metadata refresh jobs ran concurrently on the same dataset, one or more jobs could fail with an error.
- Added table snapshot ID in the plan digest for Iceberg table scans so that the planner can distinguish between two different versions of the same table.
- Added validation to the REST endpoint so that reflections cannot be configured to expire more quickly than the refresh period.
When promoting Iceberg tables, Dremio now correctly previews underlying table content for the latest snapshot, excluding delete files.
When promoting Iceberg tables backed by external catalogs, users would see an unhelpful Failed to get iceberg metadata error. The message now provides more information about using a data source configured for the catalog.
After upgrading to Dremio 22.1.1, some coordinator nodes failed to start due to a failure in connecting to S3-compatible storage (sources or distributed storage configuration) that required path style access.
Following the upgrade to Dremio v22, Support Keys of type DOUBLE would no longer accept decimal values.
Fixed an issue that was causing REFRESH DATASET jobs to hang when reading Iceberg metadata using the Avro reader.
- Fixed an issue that was causing the status of a cancelled job to show as RUNNING or PLANNING.
- Fixed a bug in Yarn-based deployments where certain properties that were meant for customizing Dremio executor containers were also being passed on to the Application Master container.
- In some deployments, using a large number of REST API-based queries that return large result sets can create memory issues and lead to cluster instability.
Following the upgrade to Dremio 22, some queries to Hive 2 metastore external tables with data in S3 were running considerably slower than before.
In some scenarios, invalid metadata about partition statistics was leading to inaccurate row count estimates for tables. The result was slower-than-expected query execution or out-of-memory issues. For each table included in a query where this behavior occurs, run ALTER TABLE <table-name> FORGET METADATA, then re-promote the resulting file or folder to a new table. This ensures that the table is created with the correct partition statistics.
During the reflection matching phase, for the filter pattern in some queries the planner could generate row expression nodes exponentially and exhaust heap memory.
Fixed an issue with concurrent metadata refresh requests that could result in the following error:
StatusRuntimeException: ABORTED: Tried to create a dataset that already exists.
- Changes made to the columns displayed on the Jobs page, or the order of the columns, were not being saved after leaving the page.
In some environments, Dremio was unable to read a Parquet statistics file in Hive during logical planning, and the query was cancelled because the planning phase exceeded 60 seconds.
Fixed an issue that was causing the error GandivaException: Failed to make LLVM module due to Function double abs(double) not supported yet for certain case expressions used as input arguments.
- When a materialization took too long to deserialize, the job updating the materialization cache entry could hang and block all reflection refreshes.
- This release includes a number of fixes that resolve potential security issues.
Some users were being redirected to the Dremio home screen when clicking certain items on the Settings page.
Automatic reflection refreshes were failing with the following error:
StatusRuntimeException: UNAVAILABLE: Channel shutdown invoked
In rare cases, an issue in the planning phase could result in the same query returning different results depending on the query context.
Profiles for some reflection refreshes included unusually long setup times.
Wait time for WRITER_COMMITTER was excessive for some reflection refreshes, even though no records were affected.
After changing the engine configuration, some queries were failing with an error.
- When skipping the current record from any position, Dremio was not ignoring line delimiters inside quotes, resulting in unexpected query results.
Following the upgrade to Dremio 21.2, some Delta Lake tables could not be queried, and the same tables could not be formatted again after being unpromoted.
Fixed an issue handling CONVERT_FROM during reflection matching when the materialization cache was enabled.
On occasion, projecting complex data types would result in a Schema change exception.
Some queries on Parquet datasets in an ElasticSearch source were failing with a SCHEMA_CHANGE error, though there had been no changes to the schema.
In some cases, deleted reflections were still being used to accelerate queries if the query plan had been cached previously.
Reflection refreshes were failing on some ElasticSearch views.
When a query that used a reflection was executed multiple times, some of the jobs used the reflection and some did not.
Clicking Edit Original SQL for a view in the SQL editor was producing a generic Something went wrong error.
Fixed an issue that was causing the LENGTH function to return incorrect results.
Fixed an issue that was causing metadata refresh on some datasets to fail continuously.
Some queries were failing with INVALID_DATASET_METADATA ERROR: Unexpected mismatch of column names if duplicate columns resulted from a join because Dremio wasn’t specifying column names.
When unlimited splits were enabled, users were seeing an Access denied error for queries run against Parquet files if impersonation was enabled on the source.
Fixed an issue causing the error “Offset vector not large enough for records” when copying list columns.
Some queries that used the FLATTEN() function were showing results for a Preview, but no data was returned when using Run.
- Removed the ‘unsafe-eval’ directive from the content security policy.
- Dremio no longer includes server name and version in the response header.
- Fixed an issue with external LDAP group name case sensitivity, which was preventing users from accessing Dremio resources to which they had been given access via their group/role membership.
If issues were encountered when running queries against a view, Dremio was returning an unhelpful error. The error returned now includes the root cause and identifies the specific view requiring attention.
When using the Catalog API to create a folder in a space, if the folder already existed in the space, the API was returning HTTP/1.1 500 Internal Server Error instead of HTTP/1.1 409 Conflict.
Row count estimates for some Delta Lake tables were changing extensively, leading to single-threaded execution plans.
When a Hive source was added or modified, shared library files created in a new directory under /tmp were not being cleaned up, leading to disk space issues.
In some cases, queries using the < operator would fail when trying to decode a timestamp column in a Parquet file.
JDBC clients could not see parent objects (folders, spaces, etc.) unless they had explicit SELECT privileges on those objects, even if they had permissions on a child object.
Fixed an issue in the scanner operator that could occur when a Parquet file had multiple row groups, resulting in a query failure and the following system error:
Illegal state while reusing async byte reader
When promoting a folder using the REST API, incremental refresh settings were not being returned in the response.
Frequent, consecutive requests to the Job API endpoint to retrieve a job’s status could result in an error.
Parentheses were missing in the generated SQL for a view when the query contained UNION ALL in a subquery, and creating the view failed.
- Upgraded the third-party XML parsing library stax2-api (used while parsing XML responses from S3) from 3.1.4 to 4.2, as required by woodstox-core:5.2.1.
- Updated the PostgreSQL JDBC Driver to version 42.4.1 to address CVE-2022-31197.
- Updated com.squareup.okhttp3:okhttp to version 4.9.2.
- Updated the Freemarker library to version 2.3.31. While Dremio was not subject to any vulnerabilities in the previous version, the version was updated to comply with security and development best practices.
- Updated the Apache Xerces Java library to version 2.12.2.
- Updated com.google.protobuf:protobuf-java to version 3.19.4.
- Updated com.google.code.gson:gson to version 2.9.0.
23.0.1 Release Notes (October 2022)
- In some cases, queries against a table that was promoted from text files containing Windows (CRLF) line endings were failing or producing an Only one data line detected error.
23.1.0 Release Notes (November 2022)
- If you previously installed the community connector from Dremio Hub, you must remove it and the existing driver. For more information, see Snowflake.
Table location (locationUri) for Hive and Glue sources is now supported for Iceberg tables. See Creating Apache Iceberg Tables for more information.
This release includes a new SQL function, ARRAY_CONTAINS, which returns whether a list contains a given value. For more information, see ARRAY_CONTAINS.
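A hedged sketch of how the function might be used; the table and column names are hypothetical, and the argument order shown (list first, value second) is an assumption to verify against the ARRAY_CONTAINS reference:

```sql
-- Filter to rows whose "tags" list column contains 'priority'.
-- "orders", "order_id", and "tags" are hypothetical names;
-- argument order ARRAY_CONTAINS(list, value) is assumed.
SELECT order_id
FROM   orders
WHERE  ARRAY_CONTAINS(tags, 'priority');
```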
In this release, a new source connector allows you to query data from other Dremio clusters. For more information, see Connecting to Another Dremio Software Cluster.
This release adds support for a new connector that allows querying data from Snowflake data warehouses. If you previously installed the community connector from Dremio Hub, you must remove it and the existing driver. For more information, see Snowflake.
If you specify an alias for a column or expression in the SELECT clause, you can now refer to that alias elsewhere in the query. For more information, see Table SQL Statements.
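For example, assuming a hypothetical order_items table and that the alias is usable in WHERE and ORDER BY as described above:

```sql
-- "total" is defined once in the SELECT clause and reused
-- in WHERE and ORDER BY (table and columns are hypothetical).
SELECT price * quantity AS total
FROM   order_items
WHERE  total > 100
ORDER BY total DESC;
```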
SELECT statements now support a new QUALIFY clause, which allows you to filter the results of window functions. For more information, see SELECT.
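A minimal sketch of QUALIFY filtering on a window function's result; the table and column names are hypothetical, and referencing the window alias in QUALIFY is assumed:

```sql
-- Keep only the most recent order per customer by filtering
-- on ROW_NUMBER() (names hypothetical).
SELECT customer_id,
       order_date,
       ROW_NUMBER() OVER (PARTITION BY customer_id
                          ORDER BY order_date DESC) AS rn
FROM   orders
QUALIFY rn = 1;
```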
This release includes performance improvements for incremental metadata refreshes on partitioned Parquet tables.
The queries.log file was showing zero values for metadataRetrieval, even though valid values were included in the job profile.
For Parquet sources on Amazon S3, files were being automatically formatted/promoted even though the auto-promote setting had been disabled.
When saving a view, data lake sources were showing up as a valid location for the view, even though such sources should not be allowed as a destination when saving a view.
Following the upgrade to Dremio 20.x, is_member(table.column) was returning zero results on views that used row-level security.
Improved reading of double values from ElasticSearch to maintain precision.
Fixed an issue that was causing queries to fail when adding or subtracting an integer value.
Following the upgrade to Dremio 22.1.2, when promoting JSON files to tables and building views from those tables, queries against the views were failing with an error.
The width of the Tag field for datasets has been expanded to ensure that the full name of a tag will be displayed.
Reflection footprint was 0 bytes when a reflection was created on a view using the CONTAINS function on an Elasticsearch table, and the reflection could not be used in queries.
To address potential security concerns, AWSE CloudFormation now enforces IMDSv2, HTTP tokens are now required, and endpoints are enabled.
An error in schema change detection logic was causing refresh metadata jobs for Hive tables to be triggered at all times, even if there were no changes in the table.
Updated org.apache.parquet:parquet-format-structures to address a potential security vulnerability [CVE-2021-41561].
Dremio was generating unnecessary exchanges with multiple unions; changes have been made to set the proper parallelization width on JDBC operators and reduce the number of exchanges.
Fixed an issue that was preventing NULLIF calls from being pushed down to Oracle.
On catalog entities, ownership granted to a role was not being inherited by users in that role.
If you clicked a job to view its details, your position on the page was reset when you clicked the Back button or the Jobs link in the page header. Your position on the main Jobs page is now maintained in these scenarios.
Some queries using a filter condition with a flatten field under a multi-join were generating a NullPointerException.
In Dremio 22.0.x, users who were not assigned the ADMIN role were getting 0-byte files when attempting to download query results, while downloads were working as expected in previous releases.
CONVERT_FROM()did not support all ISO 8601 compliant date and time formats.
The AWSE activation page was no longer showing the expiration date for a license key.
An aggregate reflection that matched was not being chosen due to a cost difference generated during pre-logical optimization.
Fixed an issue that was affecting the accuracy of cost estimations for queries against Delta tables (i.e., some queries were showing very high costs).
Fixed an issue that was causing an error when using the Tableau OAuth sign-in method in “oauth+ldap” mode.
Formatting and comments in a view definition were not being preserved as they had been entered in the SQL Editor.
If Dremio was stopped while a metadata refresh for an S3 source was in progress, some datasets within the source were getting unformatted/deleted.
Fixed an issue where Glue tables with large numbers of columns and partitions would not return results for all partitions in the table. Before this fix takes effect, you will need to refresh metadata via ALTER TABLE REFRESH METADATA.