22.0.0 Release Notes (June 2022)
Known Issues
- Azure Data Explorer and Microsoft Azure Synapse Analytics sources are not supported and cannot be added in the MapR edition of Dremio 22.
- When multiple SQL statements are executed in the SQL Runner, any jobs that may have failed are not listed in the job summary below the SQL Editor.
- The `fields` parameter is not returned for tables in external sources when fetching table details via `/api/v3/catalog/{id}` if the table has not been queried. Fixed in 22.1.1.
- Dremio fails to parse queries on a view when the query originates from Power BI, or another JDBC/ODBC client, that has the `quoting` connection property set to a non-default value. Fixed in 22.1.1.
What’s New
- This release adds support for SQL scalar user-defined functions (UDFs), callable routines that make it easier to write and reuse SQL logic across queries. UDFs let you extend the capabilities of Dremio SQL, provide a layer of abstraction to simplify query construction, encapsulate business logic, and support row and column policies for access control.
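  As a minimal sketch of a scalar UDF (the function name, body, and table below are hypothetical, not taken from the release notes):

  ```sql
  -- Hypothetical scalar UDF: define the conversion once, reuse it in queries
  CREATE FUNCTION fahrenheit_to_celsius (f DOUBLE)
    RETURNS DOUBLE
    RETURN (f - 32) * 5.0 / 9.0;

  -- The UDF is then callable like a built-in function
  SELECT fahrenheit_to_celsius(temperature_f) FROM weather.readings;
  ```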
- Dremio now supports row-access and column-masking policies for row and column controls over user query access to sensitive tables, views, and columns. This allows administrators to dynamically exclude or mask private data at the column and row levels prior to query execution and without physically altering the original values.
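  A sketch of how such policies can be attached, assuming hypothetical table, column, and policy-function names (the policy functions are ordinary scalar UDFs; `mask_card` is assumed to already exist):

  ```sql
  -- Hypothetical boolean UDF used as a row-access policy
  CREATE FUNCTION emea_rows_only (region VARCHAR)
    RETURNS BOOLEAN
    RETURN region = 'EMEA' OR query_user() = 'admin@example.com';

  -- Attach the UDF as a row-access policy on the table
  ALTER TABLE sales ADD ROW ACCESS POLICY emea_rows_only (region);

  -- Attach an existing masking UDF to a sensitive column
  ALTER TABLE sales
    MODIFY COLUMN card_number SET MASKING POLICY mask_card (card_number);
  ```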
- This release extends the existing Iceberg DML capabilities, allowing users to run `DELETE`, `UPDATE`, `MERGE`, and `TRUNCATE` statements against Iceberg tables. See SQL Commands for Apache Iceberg Tables for more information.
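  A sketch of the new DML statements against a hypothetical Iceberg table `orders` (table and column names are illustrative only):

  ```sql
  -- Hypothetical Iceberg table "orders"
  UPDATE orders SET status = 'SHIPPED' WHERE order_id = 1001;

  DELETE FROM orders WHERE status = 'CANCELLED';

  MERGE INTO orders o
  USING staged_orders s ON o.order_id = s.order_id
  WHEN MATCHED THEN UPDATE SET status = s.status
  WHEN NOT MATCHED THEN INSERT (order_id, status) VALUES (s.order_id, s.status);

  TRUNCATE TABLE orders;
  ```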
- You can now add Azure Data Explorer (ADX) as a database source in Dremio. For more information, see Azure Data Explorer.
- Autocomplete is now available in the SQL Editor. When enabled, autocomplete lets you view and insert possible completions in the editor using the mouse or the arrow keys with Tab or Enter. Autocomplete can provide suggestions for SQL keywords, catalog objects, and functions while you are constructing SQL statements. Suggestions depend on the current context. The autocomplete feature can be enabled or disabled for all users under Settings > SQL.
- The SQL Runner now allows you to save your SQL as a script. See Querying Your Data for more information.
- Script owners are indicated with a small orange flag next to their username. Script owners cannot be removed or have their privileges changed.
- You can share scripts with others in your organization by adding users and assigning privileges to View, Modify, Manage Grants, or Delete.
- When adding or modifying script privileges, the `View` privilege is enabled automatically if any of the other privileges are enabled.
- The option to save a script is disabled if the user already has 100 scripts, which is the maximum per user.
- Added support for internal schema using SQL commands, which lets the user override the data type of a column instead of using the type that Dremio automatically detected.
- Iceberg is the default CTAS format for all filesystem sources in Dremio 22.0.0+.
- The `DATEDIFF` and `ADD_MONTHS` Hive functions are supported in queries.
- The option to enable Arrow caching in advanced reflection settings has been removed because Arrow caching is not supported with unlimited splits.
- `ALTER TABLE` commands are now supported to add, drop, or modify columns in MongoDB sources.
- Users can now resize the Data panel when viewing the SQL Editor on the Datasets or SQL Runner page.
- The Zookeeper version used in Kubernetes deployments has been upgraded to 3.8.0 to address known security vulnerabilities. As part of any upgrade, it is best practice to back up configuration files and stateful volumes.
- Fields in MongoDB tables can be converted to `VARCHAR` using internal schema (`ALTER` commands), and incompatible data types will fall back to `VARCHAR` instead of failing when querying MongoDB tables.
- If the user tried to cancel a completed job, Dremio was returning an internal server error. The message now indicates that the job may have already completed and cannot be cancelled.
- Improved logging and error messages when invalid characters are encountered in a password or PAT.
- In this release, we have updated various elements of the Dremio UI to provide a more uniform and intuitive user experience.
- In this release, many messages provided in Dremio have been updated to provide information that is more accurate and more helpful.
- Iceberg now supports metadata functions for inspecting a table’s history, snapshots, and manifests.
- New commands are available for the `ALTER` keyword. By issuing the `ALTER FOLDER` or `ALTER SPACE` command, users can now set reflection refresh routing at the folder or space level.
- Users can now add a primary key to a table or drop a primary key with the following commands:
  `ALTER TABLE <table name> ADD PRIMARY KEY (col1, ...)`
  `ALTER TABLE <table name> DROP PRIMARY KEY`
- Dremio’s OAuth configuration now supports `access_token` as a valid token type to provide identity when authenticating via OpenID Connect SSO.
- Dremio now supports OIDC + LDAP mode, which allows the use of OpenID Connect (OIDC) for authentication while still using LDAP for user and groups lookup.
- Updated to ElasticSearch 6.8.23 client libraries to address CVE-2020-7019, CVE-2020-7020, CVE-2019-7611, and CVE-2019-7614. Note that ElasticSearch 5.x servers are no longer supported.
- In this release, json-smart was upgraded to version 2.4.8 to address CVE-2021-27568.
- Updated WildFly OpenSSL to 1.1.3.Final to address CVE-2020-25644.
- Updated the Postgres JDBC driver from version 42.2.18 to version 42.3.4 to address CVE-2022-21724.
- In this release, the Apache Arrow version has been upgraded to 8.0.0 to address issues with some current functions and add support for new functions.
- FasterXML/Jackson was upgraded to version 2.13.2 in Parquet to address a number of vulnerabilities.
- This release includes a new consent page where you can permit Tableau to access resources on your behalf when connecting via Tableau SSO.
- Along with the `ROW` and `ARRAY` keywords, the `STRUCT` and `LIST` keywords are now supported to represent complex data types: `STRUCT<x: BIGINT, y: LIST<BIGINT>>`, `LIST<STRUCT<x: INT>>`
- This release adds support for the `MODIFY` privilege on `SYSTEM`, which allows non-admin users to manage Node Activity, Engines, Queues, Engine Routing, and Support Keys.
- Dremio now supports `MODIFY COLUMN` on MongoDB sources, and the internal schema changes will not be erased by metadata refresh.
The
SELECT
privilege can be granted to users and roles on specific system tables, allowing those users to view the specified tables.
-
In the job details page, the automatic truncation message will appear in the job summary if a query’s output was truncated.
- Dremio admins can allow or disable the creation of local users by adding the `services.coordinator.security.permission.local-users.create.enabled: <flag>` setting to `dremio.conf`. Set the flag to `true` to allow local users or to `false` to disable the creation of local users.
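  For example, to block local user creation, the setting can be added to `dremio.conf` as follows (a sketch; keep it alongside your existing coordinator settings):

  ```
  services.coordinator.security.permission.local-users.create.enabled: false
  ```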
- Added the `UPLOAD` privilege, letting non-admin users upload files to their home space. This privilege can be overridden if the `ui.upload.allow` support key is disabled.
- Added a plus button to the upper-right corner of the page for spaces that allows users to quickly add a new folder, table, or view. For user home spaces, the button also allows you to upload files.
- Dremio now supports SSO authentication from Tableau. See Tableau for more information about supported versions and configuration steps.
- If the context is truncated in the SQL Editor, you can now hover the cursor over the field and the full context will be displayed in a tooltip.
- Users can now see the wiki for a folder if they can access the folder, even if only implicitly via a shared dataset nested inside it.
- This release includes a new argument for the `dremio-admin clean` CLI command to purge dataset version entries that are not linked to existing jobs. See Clean Metadata for more information.
- Users who have been granted the `CREATE ROLE` privilege can view and update role members.
Issues Fixed
- Fixed a number of issues that could cause the “An Unexpected Error Occurred” dialog box to be displayed in Dremio, providing no potential solution except to call Dremio Support.
- Running `ALTER PDS` to refresh metadata on a Hive source was resulting in the following error: `PLAN ERROR: NullPointerException`
- Some queries were taking longer than expected because Dremio was reading an entire `STRUCT` column when only a single nested field needed to be read.
- On first run, some queries were failing with an assertion error at the planning stage when a complex type was defined within a view.
- Following the upgrade from Dremio 20.x to 21.0.0, if Nessie was in use, metadata refreshes were failing with `Unknown type ICEBERG_METADATA_POINTER`.
- The Tableau and Power BI buttons were not appearing when expected or were remaining hidden; they are now always enabled in the SQL Editor.
- After enabling Iceberg, queries on files with `:` in the path or name were failing with a `Relative path in absolute URI` error.
- Reflection refresh jobs were consuming too much peak memory on each executor node.
- `CAST` operations were added to pushed-down queries for RDBMS sources to ensure consistent data types, specifically for numeric types where precision and scale were unknown. In some cases, however, adding `CAST` operations at lower levels of the query disabled the use of indexes in `WHERE` clauses in some databases. Dremio now ensures that `CAST` operations are added as high up in the query as possible.
- Following an upgrade, queries with `TO_NUMBER(<column>, '###')` were failing.
- In environments with high memory usage, if an expression contained a large number of splits, it could eventually lead to a heap outage or out-of-memory exception.
- Fixed an issue that was causing the following error when trying to open a view in the Dataset page: `Some virtual datasets are out of date and need to be manually updated.`
- When using Postgres as the data source, expressions performing subtraction between doubles and integers, or between floats and integers, would incorrectly perform addition instead.
- When running a specific query with a `HashJoin`, executor nodes were stopping unexpectedly with the following error: `SYSTEM ERROR: ExecutionSetupException`
- At times, in Dremio’s AWS Edition, the preview engine was going offline and could not be recovered unless a reboot was performed.
- Dremio was generating a `NullPointerException` when performing a metadata refresh on a Delta Lake source if there was no checkpoint file.
- Partition expressions were not pushed down when there was a type mismatch in a comparison, resulting in slow queries compared to prior Dremio versions.
- Fixed an issue that was causing large spikes in direct memory usage on coordinator nodes, which could result in a reboot.
- When Iceberg features were enabled, the location in the API was incorrect for some tables in S3 sources.
22.0.3 Release Notes (Enterprise Edition Only, July 2022)
Enhancements
- Azure Data Lake Storage (ADLS) Gen1 is now supported as a source on Dremio’s AWS Edition. For more information, see Azure Data Lake Storage Gen1.
- Elasticsearch is now supported as a source on Dremio’s AWS Edition. For more information, see Elasticsearch.
22.1.1 Release Notes (August 2022)
Enhancements
- Dremio now supports connecting to Amazon S3 sources using an AWS PrivateLink URL. For more information, see Amazon S3.
- In this release, embedded Nessie historical data that is not used by Dremio is purged on a periodic basis to improve performance and avoid future upgrade issues. The maintenance interval can be modified with the `nessie.kvversionstore.maintenance.period_minutes` Support Key, and you can perform maintenance manually using the `nessie-maintenance` admin CLI command.
- If OAuth sign-in for Tableau is enabled, all newly generated TDS files will use OAuth for authentication. If disabled, username/password authentication will be used.
- Users with the `CREATE ROLE` privilege now have access to the Roles tab under Settings, allowing them to add new roles.
- Improved the error message that is displayed when trying to run DML commands that are not supported on views saved from Iceberg tables.
- This release enables non-partition column runtime filters with row level pruning.
Issues Fixed
- The `fields` parameter was not returned for tables in external sources when fetching table details via `/api/v3/catalog/{id}` if the table had not been queried.
- Dremio was failing to parse queries on a view when the query originated from Power BI, or another JDBC/ODBC client, that had the `quoting` connection property set to a non-default value.
- In some scenarios, invalid metadata about partition statistics was leading to inaccurate row-count estimates for tables, resulting in slower than expected query execution or out-of-memory issues. For each table included in a query where this behavior appears, run `ALTER TABLE <table-name> FORGET METADATA`, then re-promote the resulting file or folder to a new table. This ensures that the table is created with the correct partition statistics.
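  The workaround above can be sketched as follows (the table path is hypothetical):

  ```sql
  -- Hypothetical table path; removes all metadata Dremio holds for the table
  ALTER TABLE "s3-source".sales.events FORGET METADATA;
  -- Afterwards, re-promote the underlying file or folder (for example by
  -- formatting it again in the UI or via the Catalog API) so that fresh
  -- partition statistics are collected.
  ```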
- For some users, clicking certain items on the Settings page redirected them to the Dremio home screen.
- Automatic reflection refreshes were failing with the following error: `StatusRuntimeException: UNAVAILABLE: Channel shutdown invoked`
- Profiles for some reflection refreshes included unusually long setup times for `WRITER_COMMITTER`.
- Wait time for `WRITER_COMMITTER` was excessive for some reflection refreshes, even though no records were affected.
- In Dremio’s AWS Edition, upgrades from any 21.x.x version to version 22 were failing.
- Metadata queries (queries using the `TABLE_FILES()` function) that were run on tables that had been altered were failing or returning incorrect results.
- Some database sources, such as Snowflake, Databricks Spark, and MSAccess, were showing up under Object Storage when adding a source, and they could not be browsed or managed in the Datasets page.
- Some queries on Parquet datasets in an ElasticSearch source were failing with a `SCHEMA_CHANGE` error, though there had been no changes to the schema.
- `dremio-admin clean` is now limited to only temporary dataset versions during cleanup.
- Fixed an issue that was causing metadata refresh on some datasets to fail continuously.
- Objects whose names included non-Latin characters were not behaving as expected in Dremio. For example, folders could not be promoted, and views were not visible in the home space.
- When unlimited splits were enabled and incremental metadata refreshes were run on a file-based table, subsequent raw reflections would fail with a `DATA_READ` error.
INSERT
,MERGE
,UPDATE
,TRUNCATE
, andDELETE
queries in the SQL Runner were failing with anInvalid path
error when using a partial key/path.
- In some cases, the number of records returned by CTAS or DML operations did not match the number reported in the query summary below the SQL Editor.
- `GROUP BY` queries that used `GROUPING SETS` were failing with `AssertionError`.
- If issues were encountered when running queries against a view, Dremio was returning an unhelpful error. The error now includes the root cause and identifies the specific view requiring attention.
- When adding a new S3 source, the Encrypt connection option was not enabled by default, though it was enabled for other sources.
- `CONVERT_FROM` queries were returning errors if they included an argument that was an empty binary string. This issue has been fixed, and such queries have been optimized for memory utilization.
- When using the Catalog API to create a folder in a space, if the folder already existed in the space, the API was returning `HTTP/1.1 500 Internal Server Error` instead of `HTTP/1.1 409 Conflict`.
- Reflection refreshes on a MongoDB source table were failing with the following error: `unknown top level operator: $not`
- The ODBC driver was ignoring the `StringColumnLength` with `STRUCT` data types, resulting in truncated results.
- Row count estimates for some Delta Lake tables were changing extensively, leading to single-threaded execution plans.
- In environments with high memory usage, if an expression contained a large number of splits, it could eventually lead to a heap outage or out-of-memory exception.
- When a Hive source was added or modified, shared library files created in a new directory under `/tmp` were not being cleaned up, leading to disk space issues.
- Fixed an issue that was causing slow query performance against Redshift datasources.
- JDBC clients could not see parent objects (folders, spaces, etc.) unless they had explicit `SELECT` privileges on those objects, even if they had permissions on a child object.
Fixed an issue in the scanner operator that could occur when a parquet file had multiple row-groups, resulting in a query failure and the following system error:
Illegal state while reusing async byte reader
- Fixed an issue that could cause the Arrow Flight endpoint performing long queries to encounter a gRPC `GOAWAY` code.
22.1.2 Release Notes (Enterprise Edition Only, October 2022)
Enhancements
- Added a new Admin CLI command, `dremio-admin remove-duplicate-roles`, that removes duplicate LDAP groups or local roles and consolidates them into a single role. For more information, see Remove Duplicate Roles.
Issues Fixed
- After upgrading to Dremio 22.1.1, some coordinator nodes failed to start due to a failure in connecting to S3-compatible storage (sources or distributed storage configuration) that required path-style access.
- Following the upgrade to Dremio 22, Support Keys of type `DOUBLE` would no longer accept decimal values.
- Field size for CSV files was limited to 65536 characters, and setting the `limits.single_field_size_bytes` Support Key to a value higher than the limit was not being honored.
- Fixed an issue that was causing `REFRESH REFLECTION` and `REFRESH DATASET` jobs to hang when reading Iceberg metadata using the Avro reader.
- The `LENGTH` function was returning incorrect results for Teradata sources.
- Fixed an issue that was causing the status of a cancelled job to show as RUNNING or PLANNING.
- In some deployments, a large number of REST API-based queries returning large result sets could create memory issues and lead to cluster instability.
- Following the upgrade to Dremio 22, some queries to Hive 2 metastore external tables with data in S3 were running considerably slower than before.
- During the reflection matching phase, the planner could generate an exponential number of row expression nodes for the filter pattern in some queries and exhaust heap memory.
- Fixed an issue that was causing a `GandivaException: Failed to make LLVM module due to Function double abs(double) not supported yet` error for certain case expressions used as input arguments.
- This release includes a number of fixes that resolve potential security issues.
- In rare cases, an issue in the planning phase could result in the same query returning different results depending on the query context.
- When skipping the current record from any position, Dremio was not ignoring line delimiters inside quotes, resulting in unexpected query results.
- Following the upgrade to Dremio 21.2, some Delta Lake tables could not be queried, and the same tables could not be formatted again after being unpromoted.
- Fixed an issue that was causing failures in Microsoft SQL Server queries that contained a boolean filter set to `true`.
- In some cases, deleted reflections were still being used to accelerate queries if the query plan had been cached previously.
- Clicking Edit Original SQL for a view in the SQL Editor was producing a generic `Something went wrong` error.
- Some queries were failing with `INVALID_DATASET_METADATA ERROR: Unexpected mismatch of column names` if duplicate columns resulted from a join, because Dremio wasn’t specifying column names.
- In some cases, queries using the `<` operator would fail when trying to decode a timestamp column in a Parquet file.
- Parentheses were missing when generating the SQL for a view when the query contained `UNION ALL` in a subquery, causing creation of the view to fail.
22.1.4 Release Notes (Enterprise Edition Only, October 2022)
Issues Fixed
- In some cases, queries against a table that was promoted from text files containing Windows (CRLF) line endings were failing or producing an `Only one data line detected` error.
- Following the upgrade to Dremio 22.1.2, when promoting JSON files to tables and building views from those tables, queries against the views were failing with a `NullPointerException`.
- In Dremio 22.1.1, some queries that included a `WHERE` clause were failing with a `NullPointerException` during the planning phase.
- Reflection footprint was 0 bytes when a reflection was created on a view using the `CONTAINS` function on an Elasticsearch table. The reflection could not be used in queries, and `sys.reflection` output showed `CANNOT_ACCELERATE_SCHEDULED`.
- In Dremio 22.0.x, users who were not assigned the `ADMIN` role were getting 0-byte files when attempting to download query results, while downloads worked as expected in previous releases.
- Fixed an issue that was causing certain queries to fail with a `Max Rel Metadata call count exceeded` error.
- After changing the engine configuration, some queries were failing with an `IndexOutOfBoundsException` error.
- JDBC clients could not see parent objects (folders, spaces, etc.) unless they had explicit `SELECT` privileges on those objects, even if they had permissions on a child object.
22.1.5 Release Notes (Enterprise Edition Only, November 2022)
Issues Fixed
- The `queries.log` file was showing zero values for `inputRecords`, `inputBytes`, `outputRecords`, `outputBytes`, and `metadataRetrieval`, even though valid values were included in the job profile.
- For Parquet sources on Amazon S3, files were being automatically formatted/promoted even though the auto-promote setting had been disabled.
- When saving a view, data lake sources were showing up as a valid location for the view, though such sources should not be allowed as a destination when saving a view.
- Improved reading of double values from ElasticSearch to maintain precision.
- An error in schema-change detection logic was causing metadata refresh jobs for Hive tables to be triggered every time, even if there were no changes in the table.
- This release includes performance improvements for incremental metadata refreshes on partitioned Parquet tables.
- Dremio was generating unnecessary exchanges with multiple unions; changes have been made to set the proper parallelization width on JDBC operators and reduce the number of exchanges.
- On catalog entities, ownership granted to a role was not being inherited by users in that role.
- In some environments, Dremio was unable to read a Parquet statistics file in Hive during logical planning, and the query was cancelled because the planning phase exceeded 60 seconds.
- Some queries using a filter condition with a `flatten` field under a multi-join were generating a `NullPointerException`.
- When a materialization took too long to deserialize, the job updating the materialization cache entry could hang and block all reflection refreshes.
- When trying to use certain custom garbage collection values in JVM options, the option was being switched to `UseParallelGC`, which would cause performance degradation.
- `CONVERT_FROM()` did not support all ISO 8601-compliant date and time formats.
- An aggregate reflection that matched was not being chosen due to a cost difference generated during pre-logical optimization.
- Fixed an issue causing the error `Offset vector not large enough for records` when copying list columns.
- Fixed an issue that was affecting the accuracy of cost estimations for Delta Lake queries (some queries were showing very high costs).
- If Dremio was stopped while a metadata refresh for an S3 source was in progress, some datasets within the source were getting unformatted/deleted.
- Frequent, consecutive requests to the Job API endpoint to retrieve a job’s status could result in an `UNKNOWN` `StatusRuntimeException` error.
- Fixed an issue where Glue tables with large numbers of columns and partitions would not return results for all partitions in the table. The fix requires table metadata to be refreshed via `ALTER TABLE REFRESH METADATA` to take effect.
- Updated `org.apache.parquet:parquet-format-structures` to address a potential security vulnerability (CVE-2021-41561).
22.1.7 Release Notes (Enterprise Edition Only, December 2022)
Issues Fixed
- Fixed an issue that was affecting fragment scheduling efficiency under heavy workloads, resulting in high sleep times for some queries.
- After offloading a column with type `DOUBLE` and offloading again to change the type to `VARCHAR`, the column type was still `DOUBLE`, and read operations on the table failed with an exception.
- `ALTER TABLE`, when used with a column-masking policy, was not handling reserved words with double quotes.
- Heap usage on some coordinator nodes was growing over time, requiring a periodic restart to avoid out-of-memory errors.
- The AWSE activation page was no longer showing the expiration date for a license key.
22.1.20 Release Notes (Enterprise Edition Only, March 2023)
What’s New
- This release adds support for reading `TIME` and `TIMESTAMP` microseconds in Parquet files. Microseconds are truncated, and the value is stored as milliseconds.
- Added support for `timestamp`-to-`bigint` coercion in Hive-Parquet tables.
Issues Fixed
- Following the upgrade to Dremio 22.1.7, Power BI Desktop and Gateway may not have been able to connect to Dremio via Azure Active Directory.
- In some cases, XML responses from AWS Glue were not being handled properly, causing queries to fail.
- Fixed an issue that was causing slow query performance if the query contained an `ORDER BY` clause.
- Users were able to upload files outside of their home space. File uploads are permitted only from a user’s home space or from folders within the home space.
- Fixed an issue with Decimal functions that was leading to bad results when `exec.preferred.codegenerator` was set to `java`.
- Incorrect values were being returned for `boolean` columns during filtering at Parquet scan.
- Fixed some issues that were causing performance degradation with the `REGEXP_LIKE` SQL function.
- In some cases, a `MERGE` query with an `INSERT` clause was inserting columns in the wrong order.
Fixed an issue with the Jobs page that could lead to high heap memory usage when the content of the SQL query was unusually large.
- Dremio now performs coercion to compatible types (such as `INT` and `BIGINT` to `BIGINT`), instead of strict matching, to address an issue with Elasticsearch mappings being forgotten during refresh.
- If a subquery expression was used after an aggregate and the same expression was duplicated in a `WHERE` clause, a validation exception was encountered.
- Fixed an issue that was resulting in repeated role lookups during privilege checks and causing performance issues.
- Metadata refresh queries that were cancelled because metadata was already available no longer show as failed.