    24.0.0 Release Notes (February 2023)

    Breaking Changes

    Mixing Implicit and Explicit Joins

    If you mix implicit and explicit joins, only the last of the implicitly joined tables can be in the ON clause. Otherwise you will receive a “Table not found” error. For example, the following query results in the error Table 'c' not found.

    select *
    from
        NAS2."customer.parquet" c,
        NAS2."nation.parquet" n
        left join
        NAS2."orders.parquet" as o
        on c.c_custkey = o.o_custkey
    

    The solution is to replace the comma with an explicit cross join like this:

    select *
    from
        NAS2."customer.parquet" c
        cross join
        NAS2."nation.parquet" n
        left join
        NAS2."orders.parquet" as o
        on c.c_custkey = o.o_custkey
    

    This is functionally equivalent since implicit joins implement a cross product of the two tables.
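
    A second option follows from the rule stated at the start of this section: since only the last implicitly joined table may appear in the ON clause, listing the customer table last should also avoid the error. This is a sketch derived from that rule, not a query taken from the Dremio documentation, and it reuses the table paths from the examples above. The rows returned are the same, but select * returns the columns in a different order.

    ```sql
    -- Sketch: keep the comma, but make the table used in ON the last implicit one.
    select *
    from
        NAS2."nation.parquet" n,
        NAS2."customer.parquet" c   -- last implicitly joined table, so c is visible to ON
        left join
        NAS2."orders.parquet" as o
        on c.c_custkey = o.o_custkey
    ```

    The explicit cross join remains the clearer fix, because it does not depend on the order in which the tables are listed.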

    Broadcast Table Hints

    Dremio v24 supports BROADCAST hints in queries. Hints must be entered as /*+ <hint> */, which is standard across data warehouses. In previous versions of Dremio, text enclosed in /* */ was always treated as a comment. Dremio continues to treat text enclosed in /* */ as a comment unless the first character is +. Using an unrecognized hint results in an error. For more information, see Distributing Data Evenly Across Executor Nodes During Joins.
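
    The difference between a hint and an ordinary comment can be sketched as follows. The table paths are the ones used earlier on this page. The column names c_nationkey and n_nationkey are assumed for illustration, and the placement of the hint directly after the table to be broadcast is an assumption to confirm against Distributing Data Evenly Across Executor Nodes During Joins.

    ```sql
    /* An ordinary comment: the first character after the opening delimiter is not +. */
    select *
    from
        NAS2."customer.parquet"
        inner join
        NAS2."nation.parquet" /*+ BROADCAST */   -- parsed as a hint because it starts with +
        on c_nationkey = n_nationkey             -- assumed column names
    ```

    Existing queries that contain comments beginning with + are the ones to review before upgrading, because such text is now parsed as a hint and fails if the hint is not recognized.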

    Known Issues

    • This version of Dremio does not support Iceberg tables written with equality deletes.

    • DML operations (INSERT, UPDATE, DELETE, MERGE) are not supported on tables with MAP columns. CTAS is supported on tables with MAP columns.

    • Currently, Dremio cannot read timestamp microseconds from Parquet files which have been written with dictionary encoding. Queries involving a microsecond column will not return any data.

    What’s New

    • This release adds support for Single Sign-On (SSO) with Microsoft Power BI. For more information, see Enabling Single Sign-On.

    • You can optimize Iceberg tables to maximize the speed and efficiency of data retrieval. Rewrite data files using a compaction process to combine small files into larger files or split large files to reduce metadata overhead and runtime file open costs. For more information, see Optimizing Tables.

    • You can roll back to a previous state of an Iceberg table using either a snapshot ID or a timestamp reference. For more information, see Rolling Back Tables.

    • Dremio’s new COPY INTO SQL command makes it easier and faster to load data into Apache Iceberg tables, which are a foundational component of data lakehouses. With one command, you can now copy data from CSV and JSON files stored in Amazon S3, Azure Data Lake Storage (ADLS), HDFS, and other supported data sources into Apache Iceberg tables, which use the columnar Parquet file format for performance. Dremio distributes the copy operation across the entire engine to load data more quickly. For more information, see Copying Data Into Apache Iceberg Tables.

    • In the SQL editor, you can now format your SQL using the Format SQL shortcut (Cmd + Shift + f or Ctrl + Shift + f). As long as the current syntax is valid, the SQL formatter applies a conventional style to your query by aligning commands for readability. For more information, see SQL Editor.

    • This release supports the use of BROADCAST hints in queries to distribute data across all executor nodes. For more information, see Distributing Data Evenly Across Executor Nodes During Joins.

    • The LIKE SQL function now supports the ANY, SOME, and ALL keywords. For more information, see LIKE.

    • If you specify an alias for a column or expression in the SELECT clause, you can now refer to that alias elsewhere in a query. For more information, see Table SQL Statements.

    • This release implements a new operator for vectorized hash-join that supports spilling to disk if a query runs out of memory. Spill support for hash-join queries can be turned on by enabling the exec.op.join.spill Support Key.

    • Dremio now includes a new connector for adding IBM Db2 databases as sources. For more information, see IBM DB2.

    • The MongoDB source configuration contains a new setting under Advanced Options to treat field names as case insensitive. When enabled, Dremio will record all known variations of a field name when learning the schema and use all known variations when pushing an operation down to Mongo.

    • The Reflections page under Settings > Reflections now provides real-time observability for reflections, including status, refresh, and usage information. You can use this page to monitor reflections in real time and take advantage of usage metrics to identify and trim reflections that are not accelerating queries.

    • In this release, Delta Lake is a supported table format in Hive sources. Dremio identifies Delta Lake tables if they are created with STORED BY 'io.delta.hive.DeltaStorageHandler'.

    • Dremio now supports parentheses around JOIN subclauses to handle queries generated by some third-party tools.

    • This version of Dremio supports subqueries in user-defined functions (UDFs).

    • Added support for timestamp to bigint coercion in Hive-Parquet tables.
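
    The Iceberg features in the list above (table optimization, rollback, and COPY INTO) might be used as in the following sketch. Every source, table, path, snapshot ID, and timestamp here is hypothetical, and the option names shown are assumptions to check against Optimizing Tables, Rolling Back Tables, and Copying Data Into Apache Iceberg Tables before use.

    ```sql
    -- Compact small files (or split large ones) in a hypothetical Iceberg table.
    -- TARGET_FILE_SIZE_MB is an assumed option name.
    OPTIMIZE TABLE my_source.sales
      REWRITE DATA USING BIN_PACK (TARGET_FILE_SIZE_MB = 256);

    -- Roll the table back to an earlier state by snapshot ID (hypothetical value)...
    ROLLBACK TABLE my_source.sales TO SNAPSHOT '1234567890123456789';

    -- ...or by timestamp (hypothetical value).
    ROLLBACK TABLE my_source.sales TO TIMESTAMP '2023-02-01 00:00:00.000';

    -- Load CSV files from a hypothetical storage location into the Iceberg table.
    COPY INTO my_source.sales
      FROM '@my_s3_source/bucket/path/folder/'
      FILE_FORMAT 'csv';
    ```

    Each statement is independent; they are grouped here only to show the general shape of the commands.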

    Issues Fixed

    • Following the upgrade to Dremio 22.1.7, Power BI Desktop and Gateway may not have been able to connect to Dremio via Azure Active Directory.

    • In some cases, a MERGE query with an INSERT clause was inserting columns in the wrong order.

    • Fixed an issue with runtime filter evaluation in cases where columns having a physical data type of timestamp were represented as bigint at the table level.

    • In some cases, with the Arrow Flight SQL ODBC driver, users were getting an error when testing the connection to Microsoft Excel in the ODBC Administrator on Windows.

    • Fixed an issue with Decimal functions that was leading to bad results when exec.preferred.codegenerator was set to java.

    • In some cases, incorrect values were being returned for boolean columns during filtering at Parquet scan.

    • Some queries were failing for MongoDB v4.9+ sharded collections because MongoDB would use UUID instead of namespace.

    • Fixed some issues that were causing poor performance when using the REGEXP_LIKE SQL function.

    • Some queries were performing poorly if they contained an ORDER BY clause.

    • After offloading a column with type DOUBLE and offloading again to change the type to VARCHAR, the column type was still DOUBLE and read operations on the table failed with an exception.

    • Dremio was generating unnecessary exchanges with multiple unions, and changes have been made to set the proper parallelization width and reduce the number of exchanges.

    • ALTER TABLE, when used with a column masking policy, was not handling reserved words with double quotes.

    • LIKE was returning null results when using ESCAPE if the escaped character was one of the Perl Compatible Regular Expressions (PCRE) special characters.

    • Fixed an issue that was affecting fragment scheduling efficiency under heavy workloads, resulting in high sleep times for some queries.

    • Heap usage on some coordinator nodes was growing over time, requiring a periodic restart to avoid out-of-memory errors.

    • Moved from strict matching of types to coercion to compatible types, such as INT and BIGINT -> BIGINT, to address an issue with forgotten Elasticsearch mappings during refresh.

    • Updated the apiVersion for PodDisruptionBudget to policy/v1 in Helm charts v2 due to the decommissioning of Kubernetes version 1.20.

    • Fixed an issue that was causing a DATA_READ ERROR: Failed to initialize Hive record reader error when trying to read ORC tables.

    • If a query contained CONVERT_FROM() on a large JSON literal string, the query was failing with an OutOfMemoryException error.

    • Fixed an issue that was resulting in repeated role lookups during privilege checks and causing performance issues.

    • The manifest list table function was causing performance issues for some queries.

    • Fixed an issue where Dremio was not auditing SSO logins that used OpenID identity providers.

    • The Dremio Helm chart admin pod will now use the coordinator service account by default, if configured, to run backup, restore, and other admin tasks that require access.

    • Dremio no longer includes server name and version in the response header.

    • Updated the following libraries to address potential security issues:

      • protobuf-java core to version 3.21.9 [CVE-2022-3171].

      • com.amazonaws:aws-java-sdk-core to version 1.12.261 [CVE-2022-31159].

      • org.yaml:snakeyaml to version 1.30 [CVE-2022-25857].

      • org.apache.calcite.avatica:avatica-core to version 1.22.0 [CVE-2022-36364].

      • com.fasterxml.jackson.core:jackson-databind to version 2.13.0 [CVE-2020-36518].

      • org.curioswitch.curiostack:protobuf-jackson to version 1.1.8 [CVE-2020-7768].