Changelog
This changelog provides a detailed record of the previous 12 months of updates and enhancements we have made to improve your Dremio Cloud experience.
December 17, 2024
Fixed a filter pushdown issue that could cause a query to run slowly or return incorrect results.
Fixed an SSL negotiation issue when connecting to Dremio servers through secure connections.
Updated password change behavior in the Dremio console to more effectively handle UI session termination after password rotation.
Fixed an issue that could cause reading tables from the AWS Glue Data Catalog to be slow.
Fixed an issue where a duplicated table schema could be written to its metadata file.
Fixed an issue where REFLECTION REFRESH
jobs could fail for reflections involving joins in the query plan if field-based incremental refresh was configured on the underlying datasets. These reflection refreshes will now succeed using full refreshes.
Fixed an issue that could occur when you request reflection recommendations for a specific job and the query you want to accelerate contains a subquery.
Fixed an issue that could cause the SQL Runner to display the view definition of the last executed preview instead of the saved view definition.
Navigating to a dataset with dots in the name on the History tab of the Datasets page will now work as expected.
The SHOW TBLPROPERTIES SQL command will now return the format version for Iceberg tables.
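For illustration, a minimal sketch (the table name is hypothetical):
SHOW TBLPROPERTIES "my_catalog"."sales"."orders";
The returned properties now include the Iceberg format version.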
Fixed an issue for SELECT queries that use LIMIT and OFFSET with a value greater than the maximum value for a signed integer. LIMIT and OFFSET now cannot exceed the maximum integer value.
Deprecated COMPACTION and LOAD MATERIALIZATION for reflection jobs because they optimize non-Iceberg materializations and Dremio now supports only Iceberg materializations. The sys.project.materializations table now shows only REFRESH REFLECTION jobs. COMPACT and LOAD are no longer considered reserved keywords for SQL queries.
Fixed an issue that would cause a query on the sys.project.history tables using uppercase 'SYS' to fail.
December 3, 2024
Fixed an issue that could cause a query failure due to an exceeded timeout after an elapsed query runtime limit.
Fixed an issue that could allow users without the VIEW JOB HISTORY privilege or ADMIN role to view jobs executed by other users on the Jobs page.
Fixed an issue that could prevent users from enabling or disabling system-wide acceleration on the Reflections page.
Fixed an issue where CLUSTER and CLUSTERING were accidentally added as reserved keywords. They are no longer treated as reserved.
November 19, 2024
The Arrow Flight SQL JDBC driver now supports the project ID parameter for connecting to non-default projects in Dremio Cloud.
Dark Mode is now available in Dremio! You can now choose between light mode, dark mode, or system settings. Try it out by going to Account Settings > Appearance.
In preparation for the upcoming End of Life (EOL) of Amazon Linux 2, Dremio has transitioned the base operating system for its executors from Amazon Linux 2 to Ubuntu LTS. This shift ensures continued support, security updates, and improved compatibility with modern infrastructure and libraries.
Updated the following libraries to address potential security issues:
- Ranger client in Dremio from version 1.1 to 1.2 DX-93529
- Avro from 1.11.3 to 1.11.4 [CVE-2024-47561] DX-96442
Fixed a NullPointerException that could occur during a metadata refresh due to closing a filesystem object already evicted from the cache.
Improved the sync time for reflection recommendations.
Fixed an issue that could prevent async Azure reads due to a time zone issue in locations east of Greenwich Mean Time (GMT).
Fixed an issue that could prevent users from being able to run or preview a query in the SQL Runner after viewing the History tab for the query on the Datasets page.
Navigating to the wiki of a dataset from the SQL Runner will no longer cause (edited) to appear next to the dataset name.
Fixed an issue that could prevent changes to the project storage settings from updating on the Project Settings page until the page was refreshed.
Fixed a "Could not update dag for engine schedule" issue that could occur when trying to save edits to the engine schedule on the Engines page.
October 30, 2024
Running a SELECT COUNT(*) query now uses Iceberg metadata instead of scanning the entire Iceberg table to return the total number of rows in a table.
For AWS accounts, fixed an issue where the Save button was disabled while editing the configuration in the catalog settings.
Fixed an issue that could prevent users from editing project settings for projects created using an AWS cloud.
Fixed an issue where decorrelating a subquery with an EXISTS statement and an empty GROUP BY clause could result in incorrect data.
October 16, 2024
You can now access Arctic UDFs via the API, which supports CRUD actions.
Fixed an issue where file handles (and HTTP connections) were left open after reading JSON commit logs for Delta tables within an AWS Glue Data Catalog.
Fixed an issue that could prevent a user from scrolling through the wiki content in the Details tab on the Datasets page.
Fixed an issue with "Go to Table" functionality on the Datasets page that could cause the table definition to be blank on the Data tab when multiple partitions from the same column are added to an Arctic table.
Dremio will now notify you when a view's metadata is out-of-date due to schema changes in the underlying views or tables. The notification will appear on the Data panel in the SQL Runner and in the Details and Lineage tabs on the Datasets page.
Fixed an issue that could cause query results to appear in a new tab when cached results are loading in the SQL Runner.
Creating a new tab while a script is executing will now cause a confirmation dialog to appear in the SQL Runner.
Fixed an issue that prevented non-admin users from saving a view using the Save as View button in the SQL Runner.
The Start Time filter on the Jobs page no longer updates to Custom after a user selects a start time filter, leaves for a short time, and then comes back to the page.
The Visual Profile tab on the Jobs page will now show the correct error message when a visual profile cannot be generated.
When hovering over the tooltip for a reflection score on the Reflections page, the daily accelerated queries value is now rounded to the nearest integer.
Fixed a NullPointerException (NPE) that could cause VACUUM jobs for reflections to fail.
HASH_JOIN now randomizes the distribution when a join condition generates nulls, to avoid sending that data to the same thread and reduce skew.
The following words were incorrectly made reserved keywords: CLUSTER and CLUSTERING.
September 23, 2024
Fixed an issue that could occur when attempting to access datasets in the Data panel in the SQL Runner, resulting in a "Something went wrong" error message.
Fixed an issue that could cause views to not save properly for non-admin users when clicking the Save as View button in the SQL Runner.
September 20, 2024
In Enterprise edition, members of the admin role can now configure an OpenID Connect (OIDC) identity provider for authentication under Organization Settings on the Authentication page or using the Identity Providers API. This new authentication method allows organizations to configure SSO with OIDC-compliant identity providers.
You can now connect to Vertica as a source in Dremio.
Azure regions East US 2 and West US 2 have been added for Dremio Cloud.
Create user-defined functions (UDFs) to extend the native capabilities of Dremio’s SQL dialect and reuse business logic across queries. Because UDFs are native, first-class entities in the Arctic catalog, you can seamlessly experiment on and change UDFs using Arctic's branching capabilities.
New SQL commands have been added for UDFs: CREATE FUNCTION, DROP FUNCTION, DESCRIBE FUNCTION, and SHOW FUNCTIONS. UDFs can also be used in SELECT statements.
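As an illustration, a minimal sketch of the UDF lifecycle using hypothetical names and logic (not taken from the release notes):
-- Create a simple scalar UDF
CREATE FUNCTION demo.net_price (price DOUBLE, discount DOUBLE)
  RETURNS DOUBLE
  RETURN price * (1 - discount);
-- Use it in a query, inspect it, and drop it
SELECT demo.net_price(price, discount) FROM demo.orders;
DESCRIBE FUNCTION demo.net_price;
DROP FUNCTION demo.net_price;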
Fixed an issue where queries could be stuck in planning and accumulate until a coordinator restart is required.
Resolved an issue with queries against AWS Glue that were failing due to errors when loading an endpoints.json partitions file.
Fixed an issue where a reflection is given a score of 0 if an error occurs while calculating the score. Now the score will be empty instead of 0.
When no new data is read during REFRESH REFLECTION jobs, the snapshot IDs of the datasets and reflections that they depend on are shown in the Refresh Decision section of the query profile.
Improved logout functionality.
The Edit Rule dialog now auto-populates with information from the existing rule.
You can now open the Details Panel from the options menu on the Datasets page.
The results summary table on the SQL Runner page now sorts cached query results in the order that the queries were executed.
You can now see the selected value for a reflection's partition transformation in the reflections editor.
Fixed a compilation issue that could occur when a window function is used with an ARRAY type column.
Fixed an issue that could occur when complex types are returned when splitting a function such as ARRAY_COMPACT.
Fixed an issue that could prevent a reflection score from being provided when running USE to set the query context.
Fixed an issue where a failed reflection could show an incorrect record count and size in the sys.reflections system table.
Fixed an issue that could cause ANALYZE TABLE to fail when table column names contained reserved keywords.
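For illustration, a sketch (table and column names are hypothetical; the quoted "order" column stands in for a reserved keyword used as a column name):
ANALYZE TABLE "my_catalog"."sales"."orders" FOR COLUMNS ("order", total) COMPUTE STATISTICS;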
September 10, 2024
Fixed an intermittent issue that could cause project creation to fail with a ProjectConfigServiceException. Project creation is no longer prevented when the creation or update of an asynchronous source is interrupted, which previously caused the source to not update properly.
September 3, 2024
Fixed an issue that could result in a leak from an unclosed connection in Microsoft SQL Server, Oracle, or Dremio cluster data sources.
Fixed an issue that could cause VACUUM CATALOG to fail with a ContainerNotFoundException. Also fixed a bug that could cause VACUUM CATALOG to fail with an IllegalArgumentException if a view is created in an Arctic catalog.
August 22, 2024
Dremio now supports merge-on-read writes, configured through Apache Iceberg table properties, which creates positional delete files and optimizes DML operations.
A reflection score shows the value that a reflection provides to your workloads based on the jobs that have been executed in the last 7 days.
For reflections on Iceberg tables, a new type of refresh policy is available. You can now automatically refresh reflections for underlying tables that are in Iceberg format when new snapshots are created after an update.
When reflection refresh jobs fail, Dremio now retries the refresh according to a uniform policy.
You can authenticate to a Snowflake source using key pair authentication.
User impersonation is now supported for Microsoft SQL Server sources.
OPTIMIZE TABLE now supports Iceberg tables with equality deletes.
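For illustration, a minimal sketch (the table name is hypothetical); this command now also succeeds on Iceberg tables that contain equality delete files:
OPTIMIZE TABLE "my_catalog"."sales"."events";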
Mapping table columns to the corresponding Parquet columns has been improved for Iceberg tables that are created from Parquet files and have columns without IDs.
Fixed an issue with long calls to AWS Glue sources that could result in a deadlock, preventing the Glue database from appearing as a source in the Dremio console and privileges granted to roles and users from applying properly to that source.
Fixed an issue that could prevent reflections with a row-access or column-masking policy from accelerating queries after an upgrade.
Automatically generated reflection recommendations now appear only if they meet a minimum threshold of value to your workloads.
In the reflections editor, the Refresh Now button no longer appears for failed reflections.
Clicking on a dataset on the Datasets page or clicking the Open Results link on the Job Overview page creates a new tab that is not automatically saved as a script.
Fixed an issue that could prevent reflections from being created for queries that contain an OVER clause with a specified RANGE.
Reduced memory usage when SELECT statements are run from the information schema by adjusting the page size parameter for pagination.
Fixed an issue that could cause the CURRENT_TIME function to return incorrect data when a user's timezone is defined.
Improved the query performance for VACUUM TABLE when using EXPIRE SNAPSHOTS.
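For illustration, a minimal sketch (the table name is hypothetical; see the VACUUM TABLE reference for the optional retention clauses):
VACUUM TABLE "my_catalog"."sales"."events" EXPIRE SNAPSHOTS;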
Fixed an issue that could prevent partition columns from being applied in INSERT and CREATE TABLE AS statements.
August 12, 2024
Fixed a performance issue that affected queries containing a window function and a large number of batches.
MIN_REPLICAS and MAX_REPLICAS are no longer considered reserved keywords for SQL queries.
July 31, 2024
You can now use role-based access control (RBAC) privileges to restrict users and groups from accessing folders and their contents. With this change, admin users must explicitly grant visibility of folders and their contents to users and roles on the Arctic catalog as described in Arctic Privileges. To revert to the previous “open by default” behavior in which all objects are visible to all users in the PUBLIC role, see Inheritance.
For a given query with views, the reflection recommender now provides an aggregation reflection recommendation if possible instead of only default raw reflection recommendations.
The AWS Glue Lake Formation permission cache can now be invalidated by users on demand by using ALTER SOURCE or the Source API. Lake Formation tag policy support is also now enabled by default.
Results caching improves query performance for non-UI queries with a result set that is less than 20 MB by reusing results for subsequent jobs with the same deterministic query and without underlying dataset changes. To use this feature, you must configure the time-to-live (TTL) rule in your project store to clear the cache.
Improved query planning time for over-partitioned tables with complex partition filters.
A query with an inner join can now match with reflections that contain outer joins.
Added a new Dataset API endpoint, POST /dataset/{id}/reflection/recommendation/{type}, for retrieving reflection recommendations by reflection type for a dataset.
The Catalog API Privileges endpoint is deprecated, and we expect to remove it by July 2025. In place of the Privileges endpoint, use the Catalog API Grants endpoint to retrieve privileges and grantees on specific catalog objects.
You can click Generate in the reflections tab to get a suggestion for creating an aggregation reflection. Statistics are no longer automatically collected and suggestions are generated when you open the reflections editor.
sys.project.pipe_summary is a new system table that summarizes high-level statistics for autoingest pipes and is accessible only to admins.
The flow of queries is no longer coupled with query telemetry, so failure scenarios in telemetry processing no longer affect query completion rates. Queries now succeed despite any failures with query telemetry processing or JTS availability, even in the case of incomplete profile information.
Fixed an issue with concurrent dataset modifications that could cause jobs to hang during the metadata retrieval or planning. An inline metadata refresh is now retried automatically after a failure due to a concurrent source modification.
Fixed a bug for complex queries that could result in an error message about the code being too large.
Reflections have been fixed in the following ways:
- The default selected columns for raw reflections no longer fail to include all columns of a dataset. DX-89497
- Queries no longer fail if an underlying default raw reflection becomes invalid for substitution against the view. The workaround is to disable or refresh the reflection. DX-85139
If an autoingest pipe job has been canceled by a query engine, the pipe job now retries to ingest the canceled batch.
Fixed the following NullPointerExceptions (NPEs) that could occur:
- When failed job details are fetched. 92934
- When accessing large Delta Lake tables in metastore sources. DX-67629
- Where the schema for a Delta Lake table was not captured correctly, leading to a failure to query the table. DX-92477
- When running a DML statement on an accelerated table. DX-91682
Queries no longer fail due to a ConcurrentModificationException when runtime filters are present.
Added a CONFIGURE BILLING privilege so that non-admin users can view and modify billing account data.
To prevent unexpected out-of-memory errors, the Parquet vectorized reader allocates only the necessary amount of memory for scanning deeply nested structures.
Fixed a performance issue for Iceberg tables that could occur when Dremio reads position delete files. Previously, a position delete file could be accessed multiple times by different scan threads. Now all delete rows are read once and joined with the data files.
Fixed a bug that could cause concurrent autopromotion of the same folder path to fail.
In the Dremio console, ideographic spaces now display as regular spaces in the results.
Fixed a bug in the SQL Runner where a script with a long name might not be visible in the Scripts panel.
Fixed a bug where the commit history may not load for tables or views that reside in hyphenated folders.
The user avatar at the bottom of the left navigation bar now shows the user's first and last initials instead of the first two letters of their username.
When you are editing the preview engine in the Edit Engine dialog, the currently selected instance family is no longer shown in the notification at the top of the dialog.
Scripts have been fixed in the following ways:
- Switching between scripts while a job is running no longer causes the job to appear in other tabs. DX-92260
- Opening a script and applying a transformation on a saved job now works as expected. DX-92754
- Running a subset of a script now highlights the appropriate queries when switching between results tabs. DX-92143
The reflection data in the job summary of a raw profile will now render successfully even when the accelerationDetails field is skipped.
CAST TIME AS VARCHAR now returns the result in 'HH:mm:ss.SSS' format.
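For example:
SELECT CAST(CAST('14:30:05.123' AS TIME) AS VARCHAR);
-- returns '14:30:05.123' (HH:mm:ss.SSS)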
You can now clear the context for the query session by running a USE command without any parameters.
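For example (context names are hypothetical):
USE "my_catalog"."analytics";
-- later, clear the session context:
USE;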
The CONVERT_TIMEZONE function now works properly for Druid data sources.
LEAD and LAG functions with the window set to a value that is greater than 1 no longer produce incorrect results.
July 9, 2024
View schema learning now occurs only for queries that are issued from the Dremio console or reflection refresh jobs.
Queries no longer hang on coordinator startup when the materialization cache takes a long time to start up.
A raw profile is now available as soon as a job is in a running state.
Fixed a bug where duplicate rows could be returned when retrieving usage objects.
ORDER BY expressions in a subquery are now removed automatically as long as the query does not have LIMIT or OFFSET parameters, although the returned sort order cannot be guaranteed. In this example, ORDER BY deptno is removed:
SELECT *
FROM emp
JOIN (SELECT * FROM dept ORDER BY deptno) USING (deptno)
Some databases, such as PostgreSQL and Oracle, support ORDER BY expressions in subqueries, so you may see different results depending on the target of your query.
July 2, 2024
Reflection recommendations automatically generate for the top 10 most effective default raw reflections based on query patterns from the last 7 days. You can view these recommendations on the Reflections page in the Dremio console.
Added a retry mechanism when reflections are expanded into the materialization cache, which adds fault tolerance to coordinator upgrades and restarts.
User impersonation is now supported for Oracle sources.
The Privileges dialog is improved for managing sources, views, tables, and folders.
You can now bulk delete scripts.
You can specify a column as a MAP data type in CREATE TABLE.
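For illustration, a minimal sketch (table and column names are hypothetical):
CREATE TABLE "my_catalog"."demo"."events" (
  id BIGINT,
  attributes MAP<VARCHAR, VARCHAR>
);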
You can use VACUUM CATALOG for Arctic sources on Azure.
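For example (the catalog name is hypothetical):
VACUUM CATALOG my_arctic_catalog;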
Deleting a project in Standard edition no longer results in autoingestion being unavailable.
The usernames in Arrow Flight JDBC/ODBC and Legacy JDBC/ODBC jobs are now shown in the same consistent case regardless of the username case in the connection URL.
Fixed an issue that could introduce duplicate rows in the results for RIGHT and FULL joins with non-equality conditions and join conditions that use calculations.
Updated error messaging for creating or deleting a folder on non-branch references.
Updated the following library to address potential security issues:
- org.postgresql:postgresql to version 42.4.5 [CVE-2024-1597] DX-91055
When you query the information schema, you can now see only the tables and views that you have access to instead of all datasets.
Added a rule that pushes an aggregate below a join if the grouping key is also a join key.
All existing engines without an instance family have been backfilled to either m5d or ddv5 depending on the cloud vendor.
Correlated subqueries that include a filter that doesn't match any rows no longer result in an error message.
Reflection recommendations now occur when plan regeneration is required and the name of the dataset is not fully qualified and contains a period (for example, "arctic1"."@username@dremio.com".v1).
When a dataset is created in a source, the dataset inherits its owner from the source. Inheritance no longer fails if the source owner is inactive; instead, the dataset owner is now set to the system user.
The author ID no longer appears as the author's name in the commit history after a branch is merged using a SQL command.
Dataset version sorting no longer results in incorrect "not found" error messages when listing datasets in the Dremio console.
Reflections with row and column access control now produce the correct results when algebraically matched.
The current owner of a script is now correctly displayed in the Dremio console.
Certain font ligatures are no longer displayed in the results table on the SQL Runner page.
Disabling Download Query Profiles for admins and users now correctly restricts users from downloading profiles.
The raw query profile has been improved to include Execution Resources Planned and Execution Resources Allocated planning phases to help with debugging execution-related issues.
Users who do not have the privileges required to view all user and role names in the Dremio console can still grant privileges by entering the exact user or role names in the Add User/Role field.
You can now use the Secret Resource URL when adding an Oracle source; previously, this failed with a "missing password" error.
In the Advanced view of the reflections editor, you can select the SQL functions to use as measures in the Measure column for aggregation reflections.
The listing of catalog items no longer times out due to a very large number of catalog objects. To address the issue, optional pageToken and maxChildren parameters have been added to the API endpoints for getting catalog entities with children by ID or by path.
Indexing the same JSON into CONVERT_FROM multiple times no longer leads to incorrect results.
June 5, 2024
The Dremio JDBC driver now supports parameters in prepared statements.
You can use autoingest pipes to set up and deploy event-driven data ingestion pipelines directly in Dremio. This feature is in preview for Dremio Cloud and supports Amazon S3 as a source.
The retention period of jobs history stored in Dremio has been reduced from 30 days to 7 days, which improves job search response times. Use the jobs history system table to get the jobs history for jobs that are older than 7 days.
DML and CTAS are supported for the query_label workload management rule.
There are two new methods to start refreshing a reflection.
When an incremental refresh materialization is deprecated, you no longer see a DROP TABLE job in the job history; instead, the reflection data is synchronously cleaned up as part of reflection management.
For Azure projects, you can now create a table or view when the name of the table or view contains a dot, such as "arctic1"."@username@dremio.com".v1.
Users (including admin users) can now use the Scripts API to manage scripts from API clients for migration, management during owner offboarding, and other purposes.
All write operations for Arctic views are written in the new Iceberg Community View format (V1). Existing views are still supported in the old format (V0), although any update to an existing view rewrites the view in the new format. Read operations are supported for both V0 and V1. To see which view format is being used, open the Details panel or metadata card for the view. For Dialect, the V0 views show DREMIO and V1 views show DremioSQL.
ON CONFLICT and DRY RUN clauses are now available for MERGE BRANCH.
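A sketch of one possible form (branch and catalog names are hypothetical; see the MERGE BRANCH reference for the exact clause placement and the available conflict resolution options):
MERGE BRANCH dev INTO main IN my_arctic_catalog ON CONFLICT OVERWRITE;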
New SQL commands have been added for autoingest pipes: CREATE PIPE, ALTER PIPE, DESCRIBE PIPE, and DROP PIPE.
When a reflection that depends on certain file formats (Iceberg, Parquet, ORC, or Avro) is due for a refresh and has no new data, a full refresh is no longer performed where data is read in the REFRESH REFLECTION job. Instead, only a new refresh is planned and a materialization is created, eliminating redundancy and minimizing cost for the reflection.
Default raw reflection matching can now be used during REFRESH REFLECTION jobs.
Reflections are no longer deleted when a reflection refresh fails due to a network error or the source being down.
Duplicate default raw reflection recommendations are no longer created when querying a view that contains joins.
When multiple jobs are submitted to the reflection recommender, the reflection recommender no longer errors out if some of the jobs are ineligible for recommendation. Instead, reflections are recommended for eligible jobs.
TBLPROPERTIES (table properties) for Iceberg tables are now saved in Apache Hive.
Reading a Delta Lake table no longer fails with an error about an invalid Parquet file.
The AWS Lake Formation tag authorizer now considers database-level tags.
Dremio now honors workload management rules that contain the query_label function.
When using an IAM role and attempting to add an AWS Glue source, you no longer see an error message about loader constraint violation due to AWS Glue authentication.
Reflections no longer incorrectly match into queries containing ROLLUP.
On the Organization page, hovering over Learn more for Arctic and selecting the Get Started with Arctic link opens the updated Getting Started with Dremio page.
During the signup process, the catalog is no longer missing in the CloudFormation Template (CFT) parameters if the CFT failed the first time and you click Rerun CloudFormation template.
If you delete a branch or a tag that you are currently on, you are now rerouted to the Data page for the default reference instead of seeing an error message.
Tooltips on the Catalog page are now displayed correctly on Firefox.
Dataset names are no longer truncated incorrectly.
An error message no longer appears when loading results of multiple jobs that executed on different engines.
Error messages that appear when a user tries to view the wiki of the folder for which they don't have privileges now describe the problem more clearly.
Creating a new script while on a script that displays an error message no longer causes the error message to persist.
You can now use decimals in ARRAY_REMOVE and ARRAY_CONTAINS functions.
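For illustration, a sketch (the argument order shown assumes ARRAY_CONTAINS(list, value) and ARRAY_REMOVE(list, value)):
SELECT ARRAY_CONTAINS(ARRAY[1.5, 2.25, 3.0], 2.25) AS has_value,
       ARRAY_REMOVE(ARRAY[1.5, 2.25, 3.0], 1.5)    AS trimmed;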
Fixed an NPE that occurred when ARRAY_CONTAINS is used in a WHERE clause.
New line characters (\n) are supported in regex matching.
Incorrect splitting no longer occurs when the value contains UNICODE characters like 'á'.
May 22, 2024
The system_iceberg_tables plugin now uses the project ID in the project store in order to isolate data for each project.
May 14, 2024
Integrating with AWS Lake Formation is now supported, which provides access controls for datasets in the AWS Glue Data Catalog and defines security policies from a centralized location that may be shared across multiple tools.
The query profile now contains the origin of the error, such as COORDINATOR or EXECUTOR, in addition to the error type that is prefixed to the error message. For "OUT_OF_MEMORY ERROR" error types, the type of memory causing the error, such as HEAP or DIRECT_MEMORY, and additional information about current memory usage can now be seen in the verbose section of the error in the profile.
You can now use MongoDB as a data source in Dremio Cloud in Azure. For details, see MongoDB.
Call failures to the JTS when sending an intermediate executor profile are now ignored.
Reliability for Dremio coordinators has improved.
Privileges are now available for folders in Arctic catalogs, and CREATE FOLDER, CREATE TABLE, and CREATE VIEW privileges have been added for Arctic catalogs. Privileges are also now inherited for objects in Arctic catalogs.
Support has been added for reading Apache Iceberg tables with equality deletes.
The CSV reader now uses direct memory instead of heap memory.
Queries now succeed even if telemetry storage fails. While a query is running, the executors and the coordinator send telemetry about the query execution to the JTS, which is written to a persistent store when the query completes or fails. Incomplete telemetry is indicated on the Job Details page for transparency. An is_profile_incomplete column has been added to the system.project.jobs table to indicate the profile status and incomplete data.
In Enterprise edition, you can now select a secrets management option in place of existing secrets/password fields inside of the source creation and source edit dialogs.
Out-of-the-box observability metrics are now available for user activity and jobs such as most active users, longest running jobs, most queried datasets, and more.
If the job profile results are incomplete, you are notified and the options to download the profile and see the raw profile are unavailable.
A Record fetch size parameter has been added to the settings for Snowflake sources. The default fetch size is 2000.
Arctic source pages are accessible for commits, tags, and branches. After you open an Arctic source, you can access its settings from the actions in the top right of the Datasets page.
When creating an organization, you have additional role options and can select multiple roles.
A new sys.reflection_lineage table function lists all reflections that will be refreshed if a refresh is triggered for a particular reflection.
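A sketch, assuming the function takes a reflection ID (the placeholder is illustrative; reflection IDs are available from the reflections system table):
SELECT *
FROM TABLE(sys.reflection_lineage('<reflection-id>'));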
Reflection refreshes can now be configured based on a schedule. You can pick a time of day (UTC) and days of the week to refresh reflections for sources and tables.
Rate limits have been added to the Jobs service API on both the UI and Public API.
The authorTimestamp in SHOW LOGS has changed from VARCHAR to the dateTime data type.
To get a count of the number of rows in a table, Dremio now requests an estimated document count rather than aggregating the document. As a result, Dremio can retrieve the count more quickly.
Catalog maintenance tasks are introduced for controlling and preventing duplicate dataset versions from being created by API calls. Two scheduled tasks now run daily:
- Trimming the lists to a maximum of 50 records that are subject to a 30-day time-to-live (TTL) or the TTL value that is configured for jobs.
- Removing temporary dataset versions generated by jobs that are older than 30 days or according to the configured TTL.
DX-87659, DX-87549
Fixed memory tracking issues that could cause queries to be cancelled for exceeding memory limits when the Memory Arbiter is enabled and memory utilization on the node is high.
The refresh token lifetime for Tableau has been extended to 90 days to handle offline use cases like extracts.
Promoted datasets with inconsistent partition depth no longer occasionally throw an ArrayIndexOutOfBoundsException when filtering against deeper partitions.
A client pool has been added for more performant concurrent Hive metastore operations. The pool size can be controlled by store.hive3.client_pool_size; if it is set to 0, pooling is disabled.
Implementation has been added for the AWS Glue Data Catalog to pull and use Lake Formation tag policies. By default, this feature is turned off.
Commons-compress has been updated to version 1.26.1 [CVE-2024-25710] to address potential security issues.
Fixed an issue with slot assignment for preview engines when adding RDBMS sources and refreshing metadata for the same source type. Preview engines no longer hold assigned slots that cannot be released, so new queries can get free slots and no longer hit timeout errors.
Dremio now uses multiple writers in parallel for non-partitioned table optimization. The small files generated during the writing are combined by another round of writing with a single writer.
Reflections have been fixed in the following ways:
- Reflections containing temporal functions such as NOW and CURRENT_TIME are no longer incrementally refreshed, which could produce incorrect results. REFRESH REFLECTION jobs for reflections containing these dynamic functions are now full refreshes. DX-89451
- Reflection refresh jobs no longer show zero planning time when the refresh is incremental. DX-87548
- A snapshot-based incremental reflection refresh for unlimited-split datasets on Hive no longer results in excessive heap usage due to metadata access during the reflection refresh. DX-88194
Query profiles no longer show planning phases twice.
ROLE and USER audit events are now available in the sys.project.history.events table in the default project of an organization.
An issue no longer occurs if Lake Formation tag policies are present, but there are no Lake Formation tags defined on a certain table.
To prevent conflicts between SLF4J 1.x and 2.x, the Dremio JDBC driver no longer exposes the SLF4J API and uses the java.util.logging (JUL) framework to log messages. The parent logger for the driver can be configured in the application by using java.sql.Driver#getParentLogger() or directly using java.util.logging.Logger#getLogger("com.dremio.jdbc").
For AWS standard edition, users no longer see an unsupported error when clicking on a dataset to query the dataset.
Dremio blocks view creation if the view has a cyclic dependency on itself.
Hash join support structures are now reallocated when they are insufficient for an incoming batch.
If the wiki editor is empty when a summary is generated, the generated summary is now automatically inserted into the wiki editor.
An intermittent failure when retrieving a wiki has been fixed.
For tables and views in a catalog, table and view owners need to have the USAGE privilege on the catalog to retrieve and create reflections. Previously, table or view ownership was sufficient to retrieve and create reflections using the reflections API.
accelerationNeverRefresh and accelerationNeverExpire are now properly populated in /api/v3 for sources.
The performance of loading the data visualization for jobs on the Monitoring page has improved.
The UI no longer breaks when clicking to set the refresh schedule for certain source types.
Privileges have been updated in the following ways:
- If you have only the ALTER privilege on a dataset and no privileges on the catalog, you can open the folder and see the dataset, but you cannot edit the dataset or run the query. DX-87891
- If you have only the SELECT privilege on a dataset and no privileges on the catalog, you can open the folder, see the dataset, and run a query on the dataset but you cannot edit the dataset. DX-87891
- Users who do not belong to the ADMIN role cannot view the User filter or the list of users on the Jobs page. DX-87660
- The view owner is now properly listed in the Dremio console after the owner is updated. DX-88705
The Table Cleanup tab in the catalog settings sidebar is no longer hidden when a user who belongs to the ADMIN role is viewing the Catalog Settings page with a non-admin role.
The SQL Runner has improved in the following ways:
- When you run a query in the SQL runner, the page no longer briefly displays the previous query's results. DX-83509
- The results of previously run queries now load much more quickly. After you open a saved script in the SQL Runner, the results are automatically displayed in a summarized format if at least one job in the script has successfully completed. To load the results of a specific query, select the query tab above the results table. DX-90110,DX-90627
- Running a query and attempting to save it as a view no longer causes the results to disappear. DX-86266
- Switching between the tabs in the SQL editor now correctly displays the job type. DX-89787
- Script names are no longer prevented from being saved after users rename a tab and edit the SQL content. DX-86751
- When expanding the large data field in the SQL Runner by using the ellipsis (...), the results are now responsive when the data includes DateTime objects. DX-86541
- There is no longer a data correctness issue where joins with non-equality conditions and join conditions using calculations would sometimes introduce duplicate rows (while respecting desired filtering properties) into the result set. DX-90720
- Background threads no longer run when a query in the SQL Runner is cancelled or fails. DX-85812
Adding a folder to the primary catalog now uses the references of the primary catalog rather than the selected source. When adding a folder to the primary catalog using the + icon, the folder is now correctly added to the primary catalog and not to the selected Arctic source.
The Details panel is no longer blank when you open the panel from the hover menu inside of an Arctic source.
The dataset metadata card no longer incorrectly opens versioned views from the Job Details page.
Messaging has been improved for:
- notifying you if the top-level space or source doesn't exist during view creation DX-85784
- successfully updating script privileges when using GRANT TO ROLE or GRANT TO USER DX-88527
- failed queries with row-access and column-masking policies using reflections DX-88480
- failed metadata cleaning due to the expiration of snapshots DX-69750
- switching between tabs in the SQL Runner DX-87980
- errors originating from the Hive 3 plugin engine for cases where other source types are served, such as Hive 2.x or AWS Glue DX-87596
- notifying you if the catalog for the source has been deleted or cannot be found when you attempt to use a source DX-65235
- attempting to create a table or view that has the same path as an existing folder DX-86880
The merge action in the Dremio console no longer shows the user ID instead of the user email in the commit.
The SKIP_FILE option for the COPY INTO SQL command no longer fails to handle Parquet file corruption issues if the issues are in the first page of a row group.
Trailing semicolons that terminate an OPTIMIZE TABLE statement are no longer marked as an error in the SQL Runner.
Scalar UDFs no longer return incorrect results in some cases.
A reliability issue in ARRAY_AGG has been fixed.
DML no longer breaks after a table is recreated and merged.
You can now perform DML queries when no context is selected.
April 16, 2024
Azure images for Dremio Cloud executors have been upgraded from CentOS 8.5 Gen 2 Linux to the Ubuntu 22.04 Linux distribution.
April 2, 2024
Enabled the memory arbiter by default in order to monitor the usage of four key operators: HASH_AGGREGATE, HASH_JOIN, EXTERNAL_SORT, and TOP_N_SORT. This usage is monitored across all queries running on an executor to improve how the executor utilizes its direct memory and to reduce OutOfMemoryException errors. If the memory arbiter detects that memory usage is too high, memory usage is reduced in these two ways:
- Starting with the biggest consumers, some of these operators reduce their memory usage, mainly by spilling to disk.
- Memory allocations are blocked.
Enabled HASH_JOIN to spill to disk by default when the memory allocated for a query is fully utilized.
Added support for column mapping within Delta Lake tables, effectively supporting minReaderVersion 2.
Improved partition pushdowns for Iceberg tables when queries use datetime filters that include a function on a column.
Improved coordinator startup times by allowing Dremio to serve queries while the materialization cache is loading in the background. Some queries might not be accelerated during this time.
Made it possible to find out the oldest reflection data used by accelerated queries by looking for "Last Refresh from Table" in the job summary of a raw profile or by querying the SYS.PROJECT.REFLECTIONS system table and looking for last_refresh_from_table in the output.
Disabled C3 caching during loads of Parquet source files via the COPY INTO operation, thereby reducing cache contention with other query workloads.
Reduced the heap memory used by the SORT operator.
Improved reliability and memory efficiency for Dremio coordinators.
Reading Iceberg tables with positional deletes no longer causes an IndexOutOfBoundsException.
Improved Dremio's capabilities for concurrent DML operations on Iceberg tables and improved error messaging for concurrent load failures.
Updated the Arrow package version to 14.0.2 to include Dremio Arrow fixes, and to include new features and fixes from Apache Arrow.
Added support for Version 15 of the Arrow Flight JDBC driver. You can download the driver from here.
Added support for the copy_errors() table function on Parquet tables.
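A sketch, assuming the function takes the target table name and the COPY INTO job ID (check the copy_errors() reference for the exact signature; names and the job ID placeholder are hypothetical):
SELECT *
FROM TABLE(copy_errors('"my_catalog"."sales"."orders"', '<copy-into-job-id>'));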
Added support for limiting access to specified databases on Glue sources.
Improved the Projects API in these two ways:
- The Project object now includes the lastStateError object for projects for which Dremio encounters an "invalid credentials" error.
- The Projects API can be used to update project credentials.
DX-65288
Reflection Summary objects of the Reflection API and the SYS.PROJECT.REFLECTIONS table now include the error message that explains the most recent failed refresh of a reflection. No message appears if no refresh has yet been attempted, no failure has occurred, or a successful refresh has followed a failed one.
Reflection recommendations are now associated with the corresponding job IDs.
Added the parameters isMetadataExpired and lastMetadataRefreshAt to the Table and View objects of the Catalog API. Now, when either of these two methods is called and a table or view has stale or no metadata, there is no automatic refresh of the metadata:
GET /v0/projects/{project-id}/catalog/{id}
GET /v0/projects/{project-id}/catalog/by-path/{path}
Instead, users can look at the values for the two new parameters and decide whether to invoke a refresh by calling this method:
POST /v0/projects/{project-id}/catalog/{id}/refresh
Changed the tabs in the SQL runner to display the most recent results of a query, if the results are available from the job history, without the user having to run the query again.
Added support for editing the credentials of projects that use AWS or Azure. This can be done in the Project Storage section of a project's settings.
Privilege changes are processed more quickly in the Dremio console.
Added a check to determine whether users running the COPY INTO command have SELECT privileges on either the source storage location specified in the FROM clause or on each individual source file mentioned in the FILES clause.
Performance is improved and memory consumption is reduced for some INFORMATION_SCHEMA queries that filter on TABLE_NAME or TABLE_SCHEMA.
Fixed planning errors that resulted from accessing views on which a reflection that included the CONVERT_FROM() function was defined.
Fixed an issue that caused an additional empty subnet field to appear in the second step of the process for creating the first project of an organization. This field appeared if a cloud tag was added during the first step.
Fixed an issue that allowed reflections to be created when their definitions included UDFs that contained context-sensitive functions.
Fixed an issue that prevented full exception stack traces from being provided for errors generated by queries that included an ambiguous column name.
Queries are no longer cancelled due to exceeding memory limits while spilling during a SORT operation.
Fixed an issue with case sensitivity that would lead to delayed processing of inherited privileges.
Fixed an issue with orphaned materialization datasets in the catalog due to incremental reflection refreshes that were not writing any data.
Fixed an issue where, when an incremental reflection refresh was skipped because there were no data changes, the last refresh from the table and the last refresh duration were not updated in the reflection system tables and the reflection summary.
Fixed an issue that caused filters on scans of source tables on MongoDB to use incorrect regex.
Ensured that group policy grants are respected in AWS Lake Formation when Dremio is used with Okta.
Fixed an issue that occurred if "All tables" was selected in AWS Lake Formation when granting a new permission that was meant to apply to all tables within the selected database.
Fixed an issue that caused the Save as View option in the SQL editor not to work after the option had already been used once in a single session.
Fixed an issue that prevented the Details icon from appearing for items in lists of sources or lists of spaces in the Dremio console.
Fixed an issue preventing users from accessing the Edit wiki button in the Details Panel.
Navigating between datasets using the lineage graph in the Dremio console no longer results in a message about unsaved changes.
Fixed an issue that caused the maximum number of scripts that could be saved in the Dremio console to be 100, not 1,000.
Fixed an issue that caused the creation of a new branch to update the context of the SQL Runner automatically.
Fixed an issue that caused attempts to join datasets from the SQL Runner not to default correctly to the Custom Join tab when no recommended join existed.
Fixed an issue that caused the Delete Organization dialog to appear after an organization had been renamed and the new name saved.
Fixed an issue that enabled a Save button when an added subnet field had not been filled in.
Fixed an issue that caused the AWS Secret Access Key field to appear blank even when a key was specified. Now, the masked key is visible.
Fixed the handling of SQL functions, such as LOWER, UPPER, and REVERSE, in queries on system tables.
Made the error messages for schema mismatch errors that occur for UNION ALL operators more concise.
Dremio no longer caches CURRENT_DATE_UTC and CURRENT_DATE during query planning, which was causing incorrect results. As a result, queries that use CURRENT_DATE_UTC and CURRENT_DATE have some performance latency in favor of accurate results.
Fixed an issue that caused the SQL function APPROX_COUNT_DISTINCT to return null instead of 0 in some cases.
February 26, 2024
A valid location has been provided for DoGet requests, resolving a compatibility issue with the Arrow Flight JDBC 15 driver and the ADBC driver.
February 12, 2024
Privilege changes are processed more quickly in the Dremio console.
February 1, 2024
An exception error no longer occurs when you run a query or create a view with an ambiguous column name.
January 31, 2024
Incremental refreshes can be performed on reflections that are defined on views that use joins.
The Reflection Recommender gives recommendations to users with complex queries and deep semantic layers for better performance and predictable matching. The queries for which default raw reflections can be recommended must run against one or more views that match certain criteria.
You can now use MongoDB as a data source in Dremio Cloud in AWS. For details, see MongoDB.
For Azure Blob Storage and Data Lake Gen 2 sources, checksum-based verification is enabled to ensure data integrity during network transfers.
You can click Close Others to close all tabs besides the active tab in the SQL editor.
The ARRAY_FREQUENCY function is now supported.
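For example:
SELECT ARRAY_FREQUENCY(ARRAY['a', 'b', 'a']);
-- returns a map of each element to its count, e.g. {'a': 2, 'b': 1}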
Creating a raw reflection on a dataset on which no reflections are already defined no longer creates an aggregation reflection.
To alter the reflections on a view or table, the user or role must have the ALTER_REFLECTION privilege on it and also have the USAGE and COMMIT privileges on the Arctic catalog.
Query planning times are shorter during the metadata validation phase due to view schema learning.
There is no longer an exception during the planning of queries on views that use the INTERVAL data type.
Queries against Iceberg tables with positional deletes no longer fail with an error like “the current delta should always be larger than the amount of values to skip."
Unneeded columns are now trimmed from JDBC pushdowns.
The performance of health checks of AWS Glue data sources has been improved with checks of the state of the metastore and attempts to retrieve databases with a specified maximum result limit of 1.
The successful generation of labels and wikis no longer requires an engine to be running.
Selectively run queries are now highlighted as errors if they fail.
The dialog that explains that a query has failed no longer appears when you switch between SQL tabs.
When adding a new Arctic catalog source fails, the error message now provides detailed information about the specific error.
Previously, if you used a statement in your query to set a schemapath to an Arctic source and folder, then the table or view validation would fail. Now, you can set the context to an Arctic source that includes any number of folders.
January 16, 2024
Reflections on views that join two or more anchor tables (Apache Iceberg tables and certain types of datasets in filesystem sources, Glue sources, and Hive sources) can now be refreshed incrementally.
Dremio now uses Micrometer-based metrics. Existing Codahale-based metrics are preserved and include the tag {metric_type="legacy"}.
Executor metrics tags now include engineId and subEngineId.
You can use the Recommendations API to submit job IDs of jobs that ran SQL queries, and receive recommendations for aggregation reflections that can accelerate those queries. See Recommendations for more information.
These terms were added to the list of reserved keywords: JSON_ARRAY, JSON_ARRAYAGG, JSON_EXISTS, JSON_OBJECT, JSON_OBJECTAGG, JSON_QUERY, and JSON_VALUE.
The following words were incorrectly made reserved keywords: ABSENT, CONDITIONAL, ENCODING, ERROR, FORMAT, PASSING, RETURNING, SCALAR, UNCONDITIONAL, UTF8, UTF16, and UTF32.
Fixed a bug that caused archived Sonar projects not to appear for a user on the Sonar Projects page immediately after that user received the Admin privilege.
A NullPointerException could be returned when a row count estimate could not be obtained.
The tutorials that are accessed from the left navigation bar are available only to the creators of organizations, not to all users of organizations.
The settings for configuring a new catalog no longer appear until the cloud or type of cloud is chosen.
The Add Column, Group By, and Join buttons could be disabled if the SELECT command that defined a view was run and that command ended in a semicolon.
If you saved a new view in the SQL Runner and then re-opened the SQL Runner, the view that you had just created would still be present.
For some types of data sources, the generation of a wiki page would fail.
The Save button for reflections defined on views in spaces would be enabled for public users who have only SELECT, EDIT, and VIEW REFLECTION privileges. Such users still were correctly prevented from modifying reflections, as clicking Save did nothing.
Reflection management orphaned reflection materialization tables that were in the KV store. These tables would never get cleaned up and cause the KV store to become larger than necessary.
Querying Apache Druid tables containing large amounts of data could cause previews in the SQL Runner to time out.
All columns were being sent in JDBC predicate pushdowns.
Queries with correlated subqueries could return incorrect results.
An exception occurred when Dremio tried to get an estimate of the row count for PostgreSQL tables.
Opening the SQL Runner from the Details page of a table caused the SQL Runner to open with the SQL editor hidden in the new tab and in all open tabs.
Scrolling through phases and operators in a visual profile was sometimes jumpy.
Users without permission to edit a view in an Arctic source were able to access the view's SQL definition if a direct URL to the Detail page for the view was provided by a user who did have edit permission.
The wrong branch could become active after you refreshed the SQL Runner page and then clicked on the breadcrumbs at the top of that page.
If you clicked a view or a table, ran the generated SELECT * statement in the SQL Runner, and then clicked the Edit button in the dataset details on the right, the SQL Runner would not be refreshed with the DDL for creating the view or table. Now, the SQL statement and the successful or failed query remain in the SQL Runner editor when you navigate to a dataset.
In API requests to create a new project, the catalogName body parameter is now required.