    20.0.0 Release Notes (December 2021)

    Known Issue

    Issue:
    After upgrading from 18.x or 19.x to 20.0, users encountered the error “Failed to get iceberg metadata” when querying datasets. This issue occurs because of how the user’s metadata was stored in Iceberg prior to the upgrade.

    Workaround:
    After the upgrade to 20.0 is complete, do the following for all affected datasets:

    1. Use ALTER PDS to forget the metadata for affected datasets (see Forgetting Physical Dataset Metadata).
    2. Use ALTER PDS to refresh the metadata for affected datasets (see Refreshing Physical Dataset Metadata).
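
    The two workaround steps can be scripted when many datasets are affected. The following is a minimal sketch that only builds the SQL text; the dataset paths are hypothetical, and the identifier quoting (double quotes, with embedded quotes doubled) is an assumption that should be checked against your environment before running the generated statements.

```python
def quote_path(path):
    """Quote each component of a dataset path with double quotes (assumed quoting style)."""
    return ".".join('"' + part.replace('"', '""') + '"' for part in path)


def metadata_reset_statements(datasets):
    """Return a FORGET statement followed by a REFRESH statement for each dataset."""
    statements = []
    for path in datasets:
        name = quote_path(path)
        statements.append(f"ALTER PDS {name} FORGET METADATA")
        statements.append(f"ALTER PDS {name} REFRESH METADATA")
    return statements


# Hypothetical dataset paths, given as lists of path components.
affected = [["s3_source", "sales", "orders"], ["hdfs_source", "events"]]
for statement in metadata_reset_statements(affected):
    print(statement)
```

    The statements are emitted in forget-then-refresh order per dataset, matching the order of the steps above.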

    Breaking Change

    A new logback.xml file is included as part of Dremio 20.0’s new structured logging functionality. A logback.xml file ships with every Dremio installation or upgrade package and is typically skipped during installation. However, with Dremio 20.0 your original logback.xml file must be overwritten with the file provided in the installer. If you do not use the new file provided with the upgrade, then Audit Logging will not work and queries.json will remain empty.

    What’s New

    Audit Logging

    For organizations subject to compliance and regulation where auditing is regularly required, Dremio now offers full audit logging. With audit logging, all user activities performed within Dremio are tracked and traceable via the audit.*.json log files. Each time a user performs an action within Dremio, such as logging in or creating a virtual dataset, the audit log captures the user’s ID and username, object(s) affected, action performed, event type, SQL statements used, and more.

    By default, audit logging is enabled and stored in the same location as all other log files.

    Aggregation Spilling in All Cases (Preview)

    Previously, Dremio could spill to disk for all aggregation operations, with two exceptions: 1) calculating the approximate count distinct of a column and 2) applying a minimum or maximum to a string column. If either of these operations processed more data than fit in the system’s available memory, the query would fail for lack of memory.

    These calculations, min/max on string column (generally available) and NDV() (preview), have been moved to the vectorized hash aggregation spill operator. Now, in the event of a query requiring more memory than is presently available in the system, the operator containing these calculations will spill data to disk as needed, thus allowing the query to continue processing and ultimately complete.

    To use the NDV() function with the vectorized hash aggregation spill operator, enable the support key: exec.operator.vectorized_spill.ndv.

    Support for Authenticating through Azure Active Directory from Power BI

    Support now exists for using an organization’s Power BI credentials with Azure Active Directory (AAD) as an identity provider (IdP). As part of this functionality, AAD gives the Dremio service a JSON web token (JWT) at the end of the Azure AD OAuth flow, after which Dremio verifies the token and authorizes a user session until its associated expiration.

    Ranger Row Filtering & Column-Masking

    For Hive sources with Apache Ranger authorization configured, Dremio now offers full support of external column-masking and row-filtering via Ranger security policies. This functionality adds row- and column-level controls to the whole-table/view access controls, local row permissions, and in-query column-masking offered previously. Using the Ranger external security service, Dremio now enforces external policies at query runtime.

    The following filtering/masking options are supported:

    • Row Filtering
      • Valid WHERE clauses on the table
    • Column-Masking
      • Redact - Replaces all alphabetic characters with x and all numeric characters with n.
      • Partial mask: show last 4 - Displays only the last four characters of the full column value.
      • Partial mask: show first 4 - Displays only the first four characters of the full column value.
      • Hash - Replaces all characters with a hash of the entire cell's value.
      • Nullify - Replaces all characters in the cell with a NULL value.
      • Unmasked (retain original value) - No masking is applied to the cell.
      • Date: show only year - Displays the year portion of a date string, defaulting the month and day to 01/01.
      • Custom - Specifies a custom column masked value or valid Dremio expression. Custom masking may not use Hive UDFs.
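
    To make the masking options above concrete, the following sketch models several of them in plain Python. It is an illustration of the descriptions in this list, not Ranger's or Dremio's implementation: the function names are hypothetical, the partial masks are assumed to redact the hidden characters, and SHA-256 is an assumed stand-in for whatever hash the policy engine applies.

```python
import hashlib
from datetime import date


def redact(value):
    """Replace alphabetic characters with x and numeric characters with n."""
    return "".join(
        "x" if ch.isalpha() else "n" if ch.isdigit() else ch for ch in value
    )


def show_last_4(value):
    """Keep the last four characters; redact the rest (assumed behavior)."""
    return redact(value[:-4]) + value[-4:]


def show_first_4(value):
    """Keep the first four characters; redact the rest (assumed behavior)."""
    return value[:4] + redact(value[4:])


def hash_value(value):
    """Replace the value with a hash of the entire cell (SHA-256 is an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def nullify(value):
    """Replace the value with NULL, modeled here as None."""
    return None


def show_only_year(value):
    """Keep the year of a date and default the month and day to 01/01."""
    return date(value.year, 1, 1)


print(redact("Card 1234"))
print(show_last_4("4111111111111111"))
print(show_only_year(date(2021, 12, 15)))
```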

    Microsoft Azure Synapse Analytics Support

    An ARP connector is now available on Dremio that allows for integration with Azure Synapse Analytics dedicated SQL pools. This option is available for immediate use by adding a new External Source from the Dremio interface.

    Logback Updated

    Logback was updated to v1.2.9 to mitigate CVE-2021-44228. This utilizes a new version of the library, which disables certain JNDI features known to cause issues with log4j 2.x. While Dremio is not vulnerable due to logback configurations being inaccessible externally and not using JNDI/JDBC features, this was done as a general security best practice.

    Other Enhancements

    • As of v20.0, Dremio now supports JDK 11 for on-premise installations. YARN and AWSE are not supported. Docker images will be available for both JDK 8 and 11.
    • When deleting a user from Dremio, the username or email address associated with the record will display in the confirmation message.
    • When reading data from MongoDB, users may now set the batch size for reading data via the Sample Size source setting. Simply enter a custom value to indicate the number of documents Dremio must sample to determine the schema. Additionally, users may also specify if the sample should occur from the beginning or end of the collection.
    • Dremio users may now create nested roles, or child roles assigned to a parent role. These nested roles inherit the privileges set at the parent level in addition to those granted specifically to the nested role. This allows for even more fine-grained access management for users based on role type. Currently, this may only be done via the SQL editor using the GRANT ROLE TO ROLE command.
    • For organizations using ADLS v2 sources, Dremio now supports adding whitelisted containers using AAD credentials without the need for Azure role-based access control (IAM role). Only permissions to access the container (read and write) must be set. From the source’s Settings dialog, under the Advanced Options tab, users may set a specific directory inside a container using AAD credentials wherein subdirectories of that path may be accessed using only read permissions or read/write access (read/write must be granted at the container level at minimum, or also on the end directory to add sources). The path must follow the format of container_name/dir0/.../dir_name.
    • Dremio now offers an expression splitting cache, which helps to avoid performing splitting work for the same expression repeatedly. This allows for the separation of actual data from the instructions regarding how to handle these splits, the main benefit being a significant reduction in bandwidth usage. This cache may be enabled or disabled using the exec.expression.splits_cache.enabled support key. By default, this functionality is enabled for all organizations that upgrade to 20.0.
    • A new column is available on the Job Profile page under the Phase section, which now allows you to see peak memory consumed by incoming buffers.
    • Added a new environment variable to the dremio-env file (DREMIO_GC_LOG_TO_CONSOLE="no") to configure whether garbage collection messages are sent to the console or to the log file. If set to "yes", the DREMIO_LOG_DIR environment variable is ignored and GC logs are sent only to the console. If set to "no", GC logs are instead sent to the log file.
    • Updated Dremio’s supported version of the Azure.Storage.Common library to v12.14.1, at the recommendation of Microsoft. Organizations using older versions of Azure storage libraries occasionally encountered data corruption issues, which are addressed with the newer SDK version.
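
    The nested-role inheritance described in the list above can be modeled as a small graph walk. This is a conceptual sketch only, assuming that a child role's effective privileges are its own plus those of every ancestor role; the role names, privilege strings, and the effective_privileges helper are hypothetical and do not reflect Dremio's internal implementation.

```python
def effective_privileges(role, direct, parents, _seen=None):
    """Return a role's own privileges plus everything inherited from its parents."""
    seen = set() if _seen is None else _seen
    if role in seen:  # guard against accidental cycles in the role graph
        return set()
    seen.add(role)
    privileges = set(direct.get(role, ()))
    for parent in parents.get(role, ()):
        privileges |= effective_privileges(parent, direct, parents, seen)
    return privileges


# Hypothetical roles and privileges, for illustration only.
direct = {"analysts": {"SELECT on sales"}, "emea_analysts": {"SELECT on emea"}}
# Models the effect of: GRANT ROLE analysts TO ROLE emea_analysts
parents = {"emea_analysts": ["analysts"]}

print(sorted(effective_privileges("emea_analysts", direct, parents)))
```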

    Deprecations

    Mixed Types Support Key Disabled

    In v18.0, support for mixed data types was deprecated. However, the support key to continue using mixed types was left active so that users could prepare more fully for this transition. As of Dremio 20.0, the support key for mixed data types is disabled and may no longer be used from the Support Keys page.

    Fixed Issues

    Users attempting to obtain Oracle row counts noticed a significant delay.
    This issue has been addressed so that the Oracle RDBMS source will now use table statistics to determine the row count of a table, provided this information is present and not stale. If this fails, then Dremio will revert to the slower COUNT(*) query.

    Users encountered error messages with MongoDB and Elasticsearch plugins due to nodes being unable to copy.
    This issue has been addressed so that users may now copy nodes without triggering error messages.

    Users attempted to run queries with a join clause on an Oracle datasource, but JDBC read them as individual queries for each table despite the clause.
    This issue has been addressed by pushing down TO_DATE(timestamp) and TO_CHAR(numeric, formatStr) for RDBMS sources.

    When attempting to query the sys.privileges table with large catalogs, users encountered an error about Dremio being unable to get the profile for the job.
    This issue has been addressed so that users may now successfully query the sys.privileges table.

    Users of PostgreSQL sources encountered the error ERROR: collations are not supported by type "char" when selecting columns of the CHAR data type.
    This issue has been addressed so that when selecting columns of the CHAR data type with PostgreSQL, users will no longer receive an error about unsupported collations.

    When querying Oracle, customers received an error stating Invalid row type due to an inability to detect the data type.
    This issue has been addressed so that retrieving Oracle ROWID columns no longer triggers an error; the column is now properly retrieved as VARCHAR.

    Dremio would return the error DATA_READ ERROR: Failure while attempting to read from database when a query was submitted with unsigned integer types.
    This issue has been addressed so that MySQL unsigned integer types are now mapped as bigint to allow for the full range of possible values.

    Customers couldn’t query min/max variable length fields on datasets due to query failure. These failures were caused by insufficient memory due to the group by clauses being unable to spill.
    This issue has been addressed by adding variable length fields to the vectorized hash aggregator operator, which allows spills.

    Users encountered issues with splits in aggregates when encountering an expression blocked by agg-join pushdowns.
    This issue has been addressed by adding a normalizer rule that better-matches aggregate reflections against queries grouped by expressions.

    User queries encountered Gandiva exceptions indicating that Dremio “could not allocate memory for output string.”
    This issue has been addressed by fixing an unexpected behavior within the SPLIT_PART function.

    Users encountered issues when OR and IN clauses that contained expressions were converted.
    This issue has been addressed by adding support to handle cases of converting OR/IN clauses with expressions.

    Organizations using JSON sources encountered errors when a NULL field was encountered.
    This has been addressed by not projecting any fields in Dremio with a NULL value.

    Organizations with 1000+ users encountered noticeable load delays when attempting to use the user filter drop-down from the Jobs screen.
    This has been addressed by optimizing the drop-down so that users are loaded rapidly without any performance issues.

    20.1.0 Release Notes (January 2022)

    Known Issue

    Issue:
    After upgrading from 18.x or 19.x to 20.1, users encountered the error “Failed to get iceberg metadata” when querying datasets. This issue occurs because of how the user’s metadata was stored in Iceberg prior to the upgrade.

    Workaround:
    After the upgrade to 20.1 is complete, do the following for all affected datasets:

    1. Use ALTER PDS to forget the metadata for affected datasets (see Forgetting Physical Dataset Metadata).
    2. Use ALTER PDS to refresh the metadata for affected datasets (see Refreshing Physical Dataset Metadata).

    Enhancements

    • PageHeaderWithOffset objects will be excluded from the heap when reading Dremio Parquet files. Instead, column indexes will be used to optimize performance and reduce heap usage when generating page headers and stats.
    • This release adds a new support key, authorizer.auth.cache.expiration_ms, for overriding the default authorization expiry for sources using a Table Authorizer plugin. For sources that support impersonation, the global default expiration is 24 hours and can be changed in the UI. For plugins that do not support impersonation, however, the new support key is the only way to modify the authorization expiry.

    Fixed Issues

    • In some cases, if a Parquet file in a Delta Lake table had many row groups, count(*) queries were failing due to a divide by 0 exception.
    • When using a Hive 2.x data source in an HDP environment with storage based authentication, queries on any Hive tables were resulting in a null pointer exception.
    • In cases involving multiple tables in joins along with filters, RDBMS query pushdowns could result in queries that ambiguously reference columns, resulting in invalid identifier errors.
    • If every value in one column of a MongoDB physical dataset was an empty array, queries were failing with a Schema change detected error. To address this issue, Dremio properly eliminates columns that would result in a NULL data type when doing schema inference from the Mongo records.
    • Running select * on some system tables was failing with the following error: UNAVAILABLE: Channel shutdown invoked
    • When Parquet files contained too many row groups, Parquet metadata was using too much memory and causing outages on the Executor. To avoid this issue, Dremio limits reuse of the Parquet footer when Parquet files contain too many row groups.
    • Non-admin users who had been granted view permissions for job history could view jobs from other users, but the User filter was not available. In this release, non-admin users with permission to view job history can access the User filter on the Jobs page.

    20.2.0 Release Notes (Enterprise Edition Only, March 2022)

    Enhancements

    • In this release, Dremio is now pushing down computation for extra hash join conditions.

    Issues Fixed

    • After upgrading from 18.x or 19.x to 20.x, users encountered the error Failed to get iceberg metadata when querying datasets. This issue occurred because of how the user’s metadata had been stored in Iceberg prior to the upgrade.
    • Fixed a column index issue in RelMetadata that was resulting in some queries on views failing with VALIDATION ERROR: Using CONVERT_FROM(*, 'JSON').
    • Fixed an issue that was causing sockets to remain in a CLOSE_WAIT state while running metadata refresh on an ORC dataset. This resulted in Too Many Open File errors and the cluster had to be restarted to resolve the condition.
    • Following the upgrade to v20.0, and if Force Double Precision was enabled on an Elasticsearch source (Advanced Options), Dremio was trying to coerce non-float fields to double.
    • In environments with high memory usage, if an expression contained a large number of splits, it could eventually lead to a heap outage/out of memory exception.
    • In previous versions of Dremio, for some relational sources that did not support boolean type, using the CAST function to expand a boolean value to a boolean expression was resulting in an Incorrect syntax near the keyword 'AS' error.
    • In some cases, certain ‘SELECT’ queries that included an ‘ORDER BY’ statement were returning the following error: Serialization is only allowed for SelectionVectorMode.NONE
    • In some queries, window expressions were not getting normalized after substitution, resulting in a Cannot convert RexNode to equivalent Dremio expression error.
    • Queries that worked in previous versions of Dremio were failing with the following error: Job was cancelled because the query went beyond system capacity during query planning. Please simplify the query
    • Some complex join filters were getting dropped, resulting in incorrect query results.
    • The same SELECT query, using the IS_MEMBER() function, was returning different results in different versions of Dremio.
    • When formatting GCS data at a folder level into a PDS or when selecting data from an existing PDS built on GCS, if any data values in the partitioning field included a space, the action would fail with: RuntimeException: the specified key does not exist

    20.2.1 Release Notes (Enterprise Edition Only, March 2022)

    Issues Fixed

    • The IS_MEMBER() function was not working with internal roles, returning false when it should have been returning true.
    • Accelerated queries were not being written to queries.json.

    20.2.2 Release Notes (Enterprise Edition Only, March 2022)

    Issues Fixed

    • When running a specific query with a HashJoin, executor nodes were stopping unexpectedly with the following error: SYSTEM ERROR: ExecutionSetupException
    • Resolved an issue with dropping float columns for ElasticSearch data sources when Force Double Precision was enabled.

    20.2.3 Release Notes (Enterprise Edition Only, April 2022)

    Issues Fixed

    • Fixed an issue that could result in duplicate column names being written by the planner when an expression in the project included a field named *.
    • Fixed an upgrade issue related to RBAC that was generating an unknown error when clicking the Privileges tab on a filesystem-based source.

    20.3.0 Release Notes (Enterprise Edition Only, May 2022)

    Issues Fixed

    • When a CASE was used in a WHERE filter with an AND or an OR, it would be incorrectly wrapped in a CAST, resulting in the following error: DATA_READ ERROR: Source 'sqlGrip' returned error 'Incorrect syntax near the keyword 'AS'.'
    • Fixed an issue that could result in duplicate column names being written by the planner when an expression in the project included a field named *.
    • The is_member SQL function was failing with UnsupportedOperationException when concatenating with a table column.
    • At times, in Dremio’s AWS Edition, the preview engine was going offline and could not be recovered unless a reboot was performed.
    • Resolved an issue with dropping float columns for ElasticSearch data sources when Force Double Precision was enabled.
    • When using Postgres as the data source, expressions written to perform subtraction between doubles and integers, or subtraction between floats and integers, would incorrectly perform an addition instead of the subtraction.
    • In some cases, out of memory errors on Delta Lake tables were occurring if commitInfo was the last line of the JSON, resulting in incorrect file estimates for netFilesAdded, netBytesAdded, and netOutputRows.
    • In this release, json-smart was upgraded to version 2.4.8 to address CVE-2021-27568.
    • A query with NOT IN was returning incorrect results if more than two values were in the predicate for certain Hadoop and Hive datasets.
    • Following an upgrade, queries with TO_NUMBER(_Column_,'###') were failing.
    • CAST operations were added to pushed down queries for RDBMS sources to ensure consistent data types, and specifically for numeric types where precision and scale were unknown. In some cases, however, adding CAST operations at lower levels of the query was disabling the use of indexes in WHERE clauses in some databases. Dremio now ensures that CAST operations are added as high up in the query as possible.
    • Intermittent jobs were failing with an IndexOutOfBounds exception while preparing operator details information for runtime filters.
    • Some queries were taking longer than expected because Dremio was reading a STRUCT column when only a single nested field needed to be read.
    • Running ALTER PDS to refresh metadata on a Hive source was resulting in the following error: PLAN ERROR: NullPointerException

    20.3.1 Release Notes (Enterprise Edition Only, May 2022)

    Issues Fixed

    • Fixed an upgrade issue related to RBAC that was generating an unknown error when clicking the Privileges tab on a file system source.
    • Floats and float lists were not being handled correctly when forcing float fields to double in ElasticSearch.
    • Some IdPs were missing the expires_in field in the /token endpoint response. Dremio will fall back to the exp claim in the JWT. If this claim is missing from the JWT, the default expiration timeout will be set to 3600 seconds.
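
    The fallback order described in the last item above can be sketched as follows. This is an illustrative model, not Dremio's code: the function name and arguments are hypothetical, the token response and JWT claims are assumed to be already parsed into dictionaries, and the sketch relies on expires_in being a lifetime in seconds and exp being a Unix timestamp in seconds.

```python
DEFAULT_EXPIRATION_SECONDS = 3600


def session_lifetime_seconds(token_response, jwt_claims, now):
    """Choose the session lifetime: expires_in, then the exp claim, then the default."""
    expires_in = token_response.get("expires_in")
    if expires_in is not None:
        return int(expires_in)
    exp = jwt_claims.get("exp")
    if exp is not None:
        # exp is an absolute timestamp, so convert it to a remaining lifetime.
        return max(0, int(exp) - int(now))
    return DEFAULT_EXPIRATION_SECONDS


# Hypothetical inputs: the IdP omitted expires_in, so the exp claim is used.
print(session_lifetime_seconds({}, {"exp": 1_650_003_600}, now=1_650_000_000))
```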

    20.4.0 Release Notes (Enterprise Edition Only, May 2022)

    Enhancements

    • This release includes a new argument for the dremio-admin clean CLI command to purge dataset version entries that are not linked to existing jobs. See Clean Metadata for more information.
    • The -j argument of the dremio-admin clean CLI command has been extended to purge temporary dataset versions associated with deleted jobs. See Clean Metadata for more information.

    Issues Fixed

    • Updated the Postgres JDBC driver from version 42.2.18 to version 42.3.4 to address CVE-2022-21724.
    • Some IdPs were missing the expires_in field in the /token endpoint response. Dremio will fall back to the exp claim in the JWT. If this claim is missing from the JWT, the default expiration timeout will be set to 3600 seconds.
    • Floats and float lists were not being handled correctly when forcing float fields to double in ElasticSearch.
    • Fixed an upgrade issue related to RBAC that was generating an unknown error when clicking the Privileges tab on a file system source.
    • A NULL constant in reflection definition was causing a type mismatch while expanding the materialization.
    • After enabling Iceberg, files with : in the path or name were failing with a Relative path in absolute URI error.
    • Plan serialization time was not being accounted for in the Sql-To-Rel conversion phase, resulting in planning time missing from profiles as well as longer than usual planning times.
    • An issue with plan serialization was causing longer than usual planning times.
    • When attempting to download certain query results as JSON or Parquet files, the downloaded file size was zero bytes and the download resulted in an IndexOutOfBounds exception.

    20.4.1 Release Notes (Enterprise Edition Only, June 2022)

    Issues Fixed

    • The dremio-admin clean CLI parameter -d (or --delete-orphan-datasetversions) was deleting named dataset versions during clean-up. With this release, only temporary tmp.UNTITLED dataset versions will be deleted.

    20.5.0 Release Notes (Enterprise Edition Only, July 2022)

    Enhancements

    • Added a new Admin CLI command, dremio-admin remove-duplicate-roles, that will remove duplicate LDAP groups or local roles and consolidate them into a single role. For more information, see Remove Duplicate Roles.

    Issues Fixed

    • The dremio-admin clean CLI parameter -d (or --delete-orphan-datasetversions) was deleting named dataset versions during clean-up. With this release, only temporary tmp.UNTITLED dataset versions will be deleted.
    • CONVERT_FROM queries were returning errors if they included an argument that was an empty binary string. This issue has been fixed, and such queries have been optimized for memory utilization.
    • Row count estimates for some Delta Lake tables were changing extensively, leading to single-threaded execution plans.
    • JDBC clients could not see parent objects (folders, spaces, etc.) unless they had explicit SELECT privileges on those objects, even if they had permissions on a child object.

    20.6.0 Release Notes (Enterprise Edition Only, August 2022)

    Issues Fixed

    • This release includes two fixes to resolve potential security issues.
    • Fixed an issue with adding incremental partitions on a MapR-FS source.
    • Some queries on Parquet datasets in an ElasticSearch source were failing with a SCHEMA_CHANGE error, though there had been no changes to the schema.
    • Reflection refreshes were failing on ElasticSearch views that used the CONTAINS keyword.
    • Objects whose names included non-Latin characters were not behaving as expected in Dremio. For example, folders could not be promoted and views were not visible in the home space.
    • Dremio was generating a NullPointer exception when performing a metadata refresh on a Delta Lake source if there was no checkpoint file.
    • In some cases, after adding a new file to a promoted folder on an HDFS source, the file was not reflected in new queries following a refresh.

    20.7.0 Release Notes (Enterprise Edition Only, October 2022)

    Issues Fixed

    • In some cases, queries against a table that was promoted from text files containing Windows (CRLF) line endings were failing or producing an Only one data line detected error.
    • Fixed an issue that was causing REFRESH REFLECTION and REFRESH DATASET jobs to hang when reading Iceberg metadata using Avro reader.
    • Reflection footprint was 0 bytes when created on a view using the CONTAINS function on an Elasticsearch table. The reflection could not be used in queries and sys.reflection output showed CANNOT_ACCELERATE_SCHEDULED.
    • Following the upgrade to Dremio v20.3, the Admin CLI remove-duplicate-roles command was failing, and output was empty for dry runs.
    • This release includes a number of fixes that resolve potential security issues.
    • Clicking Edit Original SQL for a view in the SQL editor was producing a generic Something went wrong error.
    • Some queries were failing with INVALID_DATASET_METADATA ERROR: Unexpected mismatch of column names if duplicate columns resulted from a join because Dremio wasn’t specifying column names.
    • When unlimited splits were enabled, users were seeing an Access denied error for queries run against Parquet files if impersonation was enabled on the source.
    • When Iceberg features were enabled, the location in the API was incorrect for some tables in S3 sources.
    • If unlimited splits and Iceberg features were enabled and a table contained a null column, metadata refresh jobs and queries were failing.
    • Following upgrades to Dremio 18, promotion of HDFS-based datasets was failing if both unlimited splits and the use of Iceberg tables were enabled.

    20.8.0 Release Notes (Enterprise Edition Only, November 2022)

    What’s New

    • Added support key store.parquet.async.enable_timestamp_check with the default value set to true. Setting this key to false disables the timestamp check for asynchronous reads.

    Issues Fixed

    • Following the upgrade to Dremio 20.x, is_member(table.column) was returning zero results on views that used row-level security.

    • Improved reading of double values from ElasticSearch to maintain precision.

    • If Dremio was stopped while a metadata refresh for an S3 source was in progress, some datasets within the source were getting unformatted/deleted.