25.0.0 Release Notes (April 2024)
Breaking Changes
Dremio no longer supports Java 8. A Java 11 SE JDK is now required. Failing to install a Java 11 SE JDK will result in an error at startup. In your dremio-env config files, you may need to remove any Java command line options that are not supported by Java 11 from the DREMIO_GC_OPTS and DREMIO_JAVA*EXTRA_OPTS variables. YARN users may need to change the engine configuration to provide the path to a valid Java 11 environment by setting the JAVA_HOME environment variable in the engine properties.
DX-86534
ZooKeeper 3.4 has reached end-of-life and is no longer supported. Using ZooKeeper 3.4 will result in an error at startup. Dremio recommends ZooKeeper 3.6 or later.
DX-88450
Dremio now throws an error and logs a warning event for queries that include ambiguous columns, including queries for creating views. The error message indicates that the column name is ambiguous:
com.dremio.common.exceptions.UserRemoteException: VALIDATION ERROR: Column '$col_name' is ambiguous
For example, in the following query, the column id is ambiguous:
SELECT * FROM (SELECT id, 2 AS id FROM (VALUES (1, 'one')) AS t(id, name))
To resolve the issue, rewrite the query to remove the ambiguity. For example:
SELECT * FROM (SELECT id, 2 AS id0 FROM (VALUES (1, 'one')) AS t(id, name))
DX-83702 DX-86763 DX-87539 DX-85230
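If your queries or view definitions are generated by a script, duplicate output column names can be caught before the query reaches Dremio. The following Python sketch is illustrative only and is not part of Dremio. The function names find_ambiguous and dedupe_aliases are hypothetical, the numeric suffix simply mirrors the id0 rename above, and comparing names case-insensitively is an assumption.

```python
from collections import Counter


def find_ambiguous(columns):
    """Return the output column names that occur more than once.

    Names are compared case-insensitively (an assumption for this sketch).
    """
    counts = Counter(name.lower() for name in columns)
    return sorted(name for name, n in counts.items() if n > 1)


def dedupe_aliases(columns):
    """Rename repeated names by appending a numeric suffix (id, id -> id, id0)."""
    original = {name.lower() for name in columns}
    used = set()
    result = []
    for name in columns:
        key = name.lower()
        if key not in used:
            # First occurrence keeps its name.
            used.add(key)
            result.append(name)
            continue
        # Later occurrences get the first suffix that collides with nothing.
        n = 0
        while f"{key}{n}" in used or f"{key}{n}" in original:
            n += 1
        used.add(f"{key}{n}")
        result.append(f"{name}{n}")
    return result


select_list = ["id", "id"]
duplicates = find_ambiguous(select_list)   # names that need attention
safe_aliases = dedupe_aliases(select_list)  # aliases to use in the rewritten query
```

The renamed aliases can then be used in the AS clauses of the rewritten query.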
The 24.2-hive-universal package is deprecated in 25.0.0. If you have a Hive 2 data source, follow the instructions for upgrading to 25.0.0. We recommend that you invest extra time to test Hive 2 use cases in a test environment before deploying to production.
DX-86273
Renamed the support key planner.writer.round_robin' (with a trailing apostrophe) to planner.writer.round_robin.
DX-85350
Known Issues
As of version 25.0.0, Dremio supports encrypted data source credentials. For this reason, when you upgrade to Dremio 25.0.0, if you want RocksDB to contain only encrypted credentials for your existing data sources, you must clear the RocksDB cache using the following steps:
1. Run dremio-admin upgrade.
2. Run dremio start and wait for Dremio to start up.
3. Run dremio stop.
4. Run dremio-admin clean --compact.
5. Run dremio start.
To confirm that all existing data source credentials were encrypted successfully, check the server log from step 2 for messages like these:
2024-03-19 18:17:02,209 [main] INFO c.dremio.exec.catalog.PluginsManager - Successfully migrate the source [s3]. Took 4531 milliseconds.
2024-03-19 18:17:02,236 [main] INFO c.dremio.exec.catalog.PluginsManager - Successfully migrate the source [glue]. Took 26 milliseconds.
2024-03-19 18:17:02,236 [main] INFO c.dremio.exec.catalog.PluginsManager - Did not need to migrate the source [<source_name>]. Took 26 milliseconds.
2024-03-19 18:17:02,236 [main] INFO c.dremio.exec.catalog.PluginsManager - Completed sources migration. Total: 4611 milliseconds.
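Checking the log by hand is error-prone when many sources are configured. The following Python sketch summarizes the migration messages. It assumes the log lines match the wording of the examples above exactly; the function name summarize_migration is hypothetical, and the location of your server log depends on your installation.

```python
import re

# Patterns copied from the example messages above (wording assumed to be stable).
MIGRATED = re.compile(r"PluginsManager - Successfully migrate the source \[(.+?)\]")
SKIPPED = re.compile(r"PluginsManager - Did not need to migrate the source \[(.+?)\]")
COMPLETED = "PluginsManager - Completed sources migration"


def summarize_migration(lines):
    """Collect migrated and skipped source names and note the completion message."""
    summary = {"migrated": [], "skipped": [], "completed": False}
    for line in lines:
        match = MIGRATED.search(line)
        if match:
            summary["migrated"].append(match.group(1))
            continue
        match = SKIPPED.search(line)
        if match:
            summary["skipped"].append(match.group(1))
            continue
        if COMPLETED in line:
            summary["completed"] = True
    return summary


sample = [
    "2024-03-19 18:17:02,209 [main] INFO c.dremio.exec.catalog.PluginsManager - Successfully migrate the source [s3]. Took 4531 milliseconds.",
    "2024-03-19 18:17:02,236 [main] INFO c.dremio.exec.catalog.PluginsManager - Successfully migrate the source [glue]. Took 26 milliseconds.",
    "2024-03-19 18:17:02,236 [main] INFO c.dremio.exec.catalog.PluginsManager - Completed sources migration. Total: 4611 milliseconds.",
]
summary = summarize_migration(sample)
```

To run it against a real log, pass an open file object instead of the sample list, because the function accepts any iterable of lines.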
What's New
Enabled the memory arbiter by default in order to monitor the usage of four key operators: HASH_AGGREGATE, HASH_JOIN, EXTERNAL_SORT, and TOP_N_SORT. This usage is monitored across all queries running on an executor to improve how the executor utilizes its direct memory and to reduce OutOfMemoryException errors.
- If the memory arbiter detects that the memory usage is too high, then the memory usage will be reduced in these two ways:
  - Starting with the biggest consumers, some of these operators will need to reduce their memory usage mainly by spilling to disk.
  - Memory allocations will be blocked.
DX-48798
Changes to the logback configuration are now automatically applied without requiring a restart. To ensure that this feature is enabled when you upgrade to Dremio 25.0.0, take care to avoid replacing the installed conf/logback.xml file with your backup copy.
DX-56684
Enabled HASH_JOIN to spill to disk by default when the memory allocated for a query is fully utilized.
DX-48798
Out-of-the-box observability metrics are now available for user activity and jobs, such as most active users, longest running jobs, most queried datasets, and more. Go to the Settings > Monitor page to see these metrics.
DX-86592 DX-83785
Improved the robustness of the embedded metadata pointer store.
DX-85034
Added support for column mapping within Delta Lake tables, effectively supporting minReaderVersion 2.
DX-62046 DX-87465
Enabled checksum-based verification for Azure Blob Storage and Data Lake Gen 2 sources to ensure data integrity during network transfers.
DX-66932
Added support for the ARRAY_FREQUENCY SQL function. It takes an array as input and produces a MAP with the array values as keys and their frequencies as values.
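The semantics described above can be illustrated in plain Python. This is only a sketch of the value-to-frequency mapping, not Dremio's implementation; the function name array_frequency is chosen here for illustration, and how Dremio treats NULL elements or empty arrays is not modeled.

```python
from collections import Counter


def array_frequency(values):
    """Map each distinct element of the input to the number of times it occurs."""
    return dict(Counter(values))


frequencies = array_frequency(["a", "b", "a", "c", "a"])
# frequencies == {"a": 3, "b": 1, "c": 1}
```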
DX-67298
You can use the Recommendations API to submit job IDs of jobs that ran SQL queries, and receive recommendations for aggregation reflections that can accelerate those queries. See Recommendations for more information.
DX-68447
Added support for creating reflections on views and tables with row-access and column-masking policies defined on any of the underlying anchor datasets.
DX-68923 DX-89495
Added support for configuring reflection refreshes to occur on a schedule.
DX-68532
Added the configuration option services.coordinator.web.auth.login_additional_latency_millis for ensuring that login successes and failures take about the same amount of time. This makes all login requests (successful or not) slower, which makes brute-force attacks harder. This configuration option is on by default and can be turned off.
DX-83373
Added the SKIP_FILE option to the COPY INTO SQL command. The SKIP_FILE option specifies that the COPY INTO operation should stop processing the input file at the first error it encounters.
DX-84448
You can now refresh reflections by using an API method, ALTER TABLE, and ALTER VIEW. You can also refresh reflections on views by using the Catalog API.
DX-84529
Added support for getting recommendations about what default raw reflections to create.
DX-84616
Added support for showing the date and time that a reflection's data was last refreshed. If the refresh is running, failing, or disabled, the value is 12/31/1969 23:59:59. The date and time are available in the Dremio console and via the Reflection API.
DX-84702
Added two new ways for starting the refresh of a reflection:
- On the Settings > Reflections page, hover over the reflection's row and click the refresh icon.
- In the Advanced view of the reflections editor, click the refresh icon above the table that describes the content of the reflection.
DX-84774
Added support for reading Apache Iceberg tables with equality deletes.
DX-84522
Added support for Hive on GCS.
DX-84898
Added a new refresh status: Pending. This status means that the refresh of a reflection will begin after the refreshes of its anchor and all downstream tables and views are finished.
DX-84941
Added support for ZooKeeper 3.5.6 and later.
DX-53228
Disabled C3 caching during the loading of Parquet source files via the COPY INTO operation, thereby reducing cache contention with other query workloads.
DX-85365
Improved Dremio's capabilities for concurrent DML operations on Iceberg tables and improved error messaging for concurrent load failures.
85437
Reflection Summary objects of the Reflection API and the SYS.PROJECT.REFLECTIONS table now include the error message that explains the most recent failed refresh of a reflection. No message appears if no refresh has yet been attempted, no failure has occurred, or a successful refresh has followed a failed one.
DX-85499
Added support for performing incremental refreshes on reflections that are defined on views that use joins.
DX-84768 DX-85818
Changed the tabs in the SQL Runner to display the most recent results of a query, if the results are available from the job history, without the user having to run the query again.
DX-85843
Added support for the copy_errors() table function on Parquet tables.
DX-87332
Removed the following support keys because they have been enabled by default for several major releases:
- dremio.deltalake.enabled (introduced in 14.0, enabled by default in 17.0)
- store.deltalake.hive_support.enabled (introduced as enabled by default in 24.0)
- store.deltalake.spark_support.enabled (introduced as enabled by default in 24.1)
- dremio.deltalake.time_travel.enabled (introduced as enabled by default in 24.2)
- dremio.execution.support_unlimited_splits (introduced as enabled by default in 21.0)
- dremio.iceberg.enabled (introduced in 11.0, enabled by default in 21.0)
- dremio.iceberg.ctas.enabled (introduced as enabled by default in 22.0)
- dremio.iceberg.rollback.enabled (enabled by default in 24.0)
DX-87789 DX-87491 DX-53796 DX-87898
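If you keep a record of the support keys you have overridden, you can check it against the removed keys before upgrading. The Python sketch below assumes you can produce that list yourself; how you export it is outside the scope of these notes, and the function name find_removed_keys is hypothetical.

```python
# Support keys removed in 25.0.0, copied from the list above.
REMOVED_SUPPORT_KEYS = frozenset({
    "dremio.deltalake.enabled",
    "store.deltalake.hive_support.enabled",
    "store.deltalake.spark_support.enabled",
    "dremio.deltalake.time_travel.enabled",
    "dremio.execution.support_unlimited_splits",
    "dremio.iceberg.enabled",
    "dremio.iceberg.ctas.enabled",
    "dremio.iceberg.rollback.enabled",
})


def find_removed_keys(configured_keys):
    """Return the configured keys that no longer exist in 25.0.0, sorted by name."""
    return sorted(set(configured_keys) & REMOVED_SUPPORT_KEYS)


# Hypothetical record of overrides kept by an administrator.
my_overrides = ["dremio.iceberg.enabled", "planner.writer.round_robin"]
stale = find_removed_keys(my_overrides)
```

Any key returned by find_removed_keys can be dropped from your records, because the corresponding feature is now always enabled.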
Added support for limiting access to specified databases on Glue sources.
DX-87812 DX-88223 DX-88420 DX-87811
Upgraded Netty libraries to version 4.1.104.
DX-86156
Added daily catalog maintenance tasks to trim the history of views to a maximum of 50 records per view. This limits the storage needed for datasetVersions records in the KV store.
DX-86156 DX-87549
To improve reflection observability, in the Reflection tab in the settings, the Dataset column is now wider and truncates after two lines. Also, users now receive a notification if the materialization cache is uninitialized for reflections, as well as a message when hovering over the status icon for reflections whose caches are initializing.
DX-86891 DX-86890
In the Reflection tab in the settings, users can now retry a refresh on all unavailable reflections.
DX-86889
Reflection recommendations are now associated with the corresponding job IDs.
DX-86726 DX-86672
Improved reliability and memory efficiency for Dremio coordinators.
DX-86245 DX-86675
Privilege changes are processed more quickly in the Dremio console.
DX-87547
To improve performance, filters can now be pushed past sort operations.
DX-88119
No data is read in the REFRESH REFLECTION job for reflections that are dependent only on Iceberg, Parquet, Avro, non-transactional ORC datasets, or other reflections and have no new data since the last refresh.
DX-86353
Issues Fixed
Fixed the handling of SQL functions, such as LOWER, UPPER, and REVERSE, in queries on system tables.
DX-52626
Reduced the heap memory used by the SORT operator.
DX-53594
TPC-DS queries no longer fail with an error that says the table or column is not found.
DX-87797
AWSE upgrades no longer fail with the error Unexpected global state.
DX-88393
Fixed gRPC exceptions in the Dremio console due to improper handling of transient server errors.
DX-25300
The APPROX_COUNT_DISTINCT function now properly calculates the approximate count distinct rather than the exact count distinct.
DX-84197
Fixed an issue in which the Save button for reflections defined on views in spaces was enabled for public users who have only SELECT, EDIT, and VIEW REFLECTION privileges. Such users were still correctly prevented from modifying reflections, because clicking Save did nothing.
DX-84684
Discontinued the hive-universal build. As of this change, Hive 2.x sources are driven by the Hive 3 plugin in the main build. Hive 2 libraries and artifacts (and the Hive 2 Dremio plugin itself) are omitted from the installation directory.
DX-85203
Added the dremio-job-id property to the metadata for Iceberg tables in Glue sources.
DX-85379
Fixed an issue where certain queries returned incorrect results when multiple nullable columns were referenced in conditions with OR operators.
DX-85581
Added a check to determine whether users running the COPY INTO command have SELECT privileges on either the source storage location specified in the FROM clause or on each individual source file mentioned in the FILES clause.
DX-85977
Fixed an issue that allowed reflections to be created when their definitions included UDFs that contained context-sensitive functions.
DX-86078
Dremio no longer caches CURRENT_DATE_UTC and CURRENT_DATE during query planning, which was causing incorrect results. As a result, queries that use CURRENT_DATE_UTC and CURRENT_DATE incur some additional latency in exchange for accurate results.
DX-86078
Fixed an issue that sometimes caused an aggregation reflection to be created automatically when a raw reflection was created.
DX-85098
Fixed an issue that caused a message about a failed query to appear after switching from one SQL tab to another.
DX-86514
Fixed an issue in the SQL Runner where expanding a large data field by using the ellipsis (...) caused the results to become unresponsive when the data included DateTime objects.
DX-86541
Fixed an issue that caused the SQL function APPROX_COUNT_DISTINCT to return null instead of 0 in some cases.
DX-86597
Ensured that group policy grants are respected in AWS Lake Formation when Dremio is used with Okta.
DX-86923
Fixed an issue that occurred in AWS Lake Formation when "All tables" was selected while granting a new permission that was meant to apply to all tables within the selected database.
DX-86925
Fixed an issue that caused the details of jobs not to be updated in the Dremio console while the jobs were running.
DX-86983
Fixed an issue that caused the creation of a new branch to update the context of the SQL Runner automatically.
DX-87039
Fixed an issue that could cause the skip_file option of the COPY INTO SQL command not to handle Parquet file corruption issues if they are in the first page of a row group.
DX-87884
Reduced the severity of log messages about function lookup for Hive functions so that they are no longer listed as errors.
DX-83930
The Settings button is now shown at the top-right of the page when navigating to a Nessie source.
DX-88053
Authentication with a secret resource URL now works properly for Amazon Redshift, Oracle, and PostgreSQL data sources.
DX-88293
In Kubernetes environments, the Dremio load balancer service now remains active during dremio-admin operations.
DX-85396
Reading Iceberg tables with positional deletes no longer causes an IndexOutOfBoundsException.
DX-87252
The Details panel is no longer blank when opened from the menu in a Nessie source.
DX-87923
The commit history for MERGE commands run in the Dremio console now shows the user email instead of the user ID.
DX-88377
Creating a raw reflection on a dataset on which no reflections are already defined no longer creates an aggregation reflection.
DX-86098
The Go to Table button now appears on the Datasets page for tables and views when the Query on click preference is disabled. The button also appears on lineage graphs for tables.
DX-85964 DX-84694
You can prevent analytics data from being sent to Intercom by using the dremio.ui.outside_communication_disabled support key.
DX-86316