Changelog
This changelog provides a detailed record of the previous 12 months of updates and enhancements we have made to improve your Dremio Cloud experience.
October 30, 2024
Running a SELECT COUNT(*) query now uses Iceberg metadata instead of scanning the entire Iceberg table to return the total number of rows in a table.
For AWS accounts, fixed an issue where the Save button was disabled while editing the configuration in the catalog settings.
Fixed an issue that could prevent users from editing project settings for projects created using an AWS cloud.
Fixed a rare issue where decorrelating a subquery with an EXISTS statement and an empty GROUP BY clause could result in incorrect data.
October 16, 2024
You can now access Arctic UDFs via an API that supports CRUD actions.
These terms were added to the list of reserved keywords: CLUSTER and CLUSTERING.
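If an existing column or other identifier collides with a newly reserved keyword, you can still reference it by quoting the identifier with double quotes. A minimal illustration with hypothetical table and column names:
SELECT "CLUSTER", "CLUSTERING"
FROM my_space.my_table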
Fixed an issue where file handles (and HTTP connections) were left open after reading JSON commit logs for Delta tables within an AWS Glue Data Catalog.
Fixed an issue that could prevent a user from scrolling through the wiki content in the Details tab on the Datasets page.
Fixed an issue with "Go to Table" functionality on the Datasets page that could cause the table definition to be blank on the Data tab when multiple partitions from the same column are added to an Arctic table.
Dremio will now notify you when a view's metadata is out-of-date due to schema changes in the underlying views or tables. The notification will appear on the Data panel in the SQL Runner and in the Details and Lineage tabs on the Datasets page.
Fixed an issue that could cause query results to appear in a new tab when cached results are loading in the SQL Runner.
Creating a new tab while a script is executing will now cause a confirmation dialog to appear in the SQL Runner.
Fixed an issue that prevented non-admin users from saving a view using the Save as View button in the SQL Runner.
The Start Time filter on the Jobs page no longer updates to Custom after a user selects a start time filter, leaves for a short time, and then comes back to the page.
The Visual Profile tab on the Jobs page will now show the correct error message when a visual profile cannot be generated.
When hovering over the tooltip for a reflection score on the Reflections page, the daily query accelerated value will be rounded to the nearest integer.
Fixed a NullPointerException (NPE) that could cause VACUUM jobs for reflections to fail.
HASH_JOIN now randomizes the distribution when a join condition generates nulls, to avoid sending all of that data to the same thread and thereby reduce skew.
September 23, 2024
Fixed an issue that could occur when attempting to access datasets in the Data panel in the SQL Runner, resulting in a "Something went wrong" error message.
Fixed an issue that could cause views to not save properly for non-admin users when clicking the Save as View button in the SQL Runner.
September 20, 2024
In Enterprise edition, members of the admin role can now configure an OpenID Connect (OIDC) identity provider for authentication under Organization Settings on the Authentication page or using the Identity Providers API. This new authentication method allows organizations to configure SSO with OIDC-compliant identity providers.
You can now connect to Vertica as a source in Dremio.
Azure regions East US 2 and West US 2 have been added for Dremio Cloud.
Create user-defined functions (UDFs) to extend the native capabilities of Dremio’s SQL dialect and reuse business logic across queries. Because UDFs are native, first-class entities in the Arctic catalog, you can seamlessly experiment on and change UDFs using Arctic's branching capabilities.
New SQL commands have been added for UDFs: CREATE FUNCTION, DROP FUNCTION, DESCRIBE FUNCTION, and SHOW FUNCTIONS. UDFs can also be used in SELECT statements.
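As a rough sketch of the new commands, using a hypothetical function name and assuming the scalar UDF form with a parameter list, a RETURNS type, and a RETURN expression:
CREATE FUNCTION fahrenheit_to_celsius (f DOUBLE)
  RETURNS DOUBLE
  RETURN (f - 32) * 5.0 / 9.0;
SELECT fahrenheit_to_celsius(212.0);
DESCRIBE FUNCTION fahrenheit_to_celsius;
DROP FUNCTION fahrenheit_to_celsius;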
Fixed an issue where queries could be stuck in planning and accumulate until a coordinator restart is required.
Resolved an issue with queries against AWS Glue that were failing due to errors when loading an endpoints.json partitions file.
Fixed an issue where a reflection is given a score of 0 if an error occurs while calculating the score. Now the score will be empty instead of 0.
When no new data is read during REFRESH REFLECTION jobs, the snapshot IDs of the datasets and reflections that they depend on are shown in the Refresh Decision section of the query profile.
Improved logout functionality.
The Edit Rule dialog now auto-populates with information from the existing rule.
You can now open the Details Panel from the options menu on the Datasets page.
The result summary table on the SQL Runner page now sorts cached query results in the order that the queries were executed.
You can now see the selected value for a reflection's partition transformation in the reflections editor.
Fixed a compilation issue that could occur when a window function is used with an ARRAY type column.
Fixed an issue that could occur when complex types are returned when splitting a function such as ARRAY_COMPACT.
Fixed an issue that could prevent a reflection score from being provided when running USE to set the query context.
Fixed an issue where a failed reflection could show an incorrect record count and size in the sys.reflections system table.
Fixed an issue that could cause ANALYZE TABLE to fail when table column names contained reserved keywords.
September 10, 2024
Fixed an intermittent issue that could cause project creation to fail with a ProjectConfigServiceException. Project creation is no longer prevented when an asynchronous source creation or update is interrupted, which previously caused the source to not update properly.
September 3, 2024
Fixed an issue that could result in a leak from an unclosed connection in Microsoft SQL Server, Oracle, or Dremio cluster data sources.
Fixed an issue that could cause VACUUM CATALOG to fail with a ContainerNotFoundException. Also fixed a bug that could cause VACUUM CATALOG to fail with an IllegalArgumentException if a view is created in an Arctic catalog.
August 22, 2024
Dremio now supports merge-on-read writes configured through the Apache Iceberg table properties, which creates positional delete files and optimizes DML operations.
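As an illustration only, the write modes can typically be set through the standard Iceberg write-mode table properties; the table name below is hypothetical, and this sketch assumes support for ALTER TABLE ... SET TBLPROPERTIES:
ALTER TABLE sales.orders SET TBLPROPERTIES (
  'write.delete.mode' = 'merge-on-read',
  'write.update.mode' = 'merge-on-read',
  'write.merge.mode' = 'merge-on-read'
)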
A reflection score shows the value that a reflection provides to your workloads based on the jobs that have been executed in the last 7 days.
For reflections on Iceberg tables, a new type of refresh policy is available. You can now automatically refresh reflections for underlying tables that are in Iceberg format when new snapshots are created after an update.
When reflection refresh jobs fail, Dremio now retries the refresh according to a uniform policy.
You can authenticate to a Snowflake source using key pair authentication.
User impersonation is now supported for Microsoft SQL Server sources.
OPTIMIZE TABLE now supports Iceberg tables with equality deletes.
Mapping table columns to the corresponding Parquet columns has been improved for Iceberg tables that are created from Parquet files and have columns without IDs.
Fixed an issue with long calls to AWS Glue sources that could result in a deadlock, preventing the Glue database from appearing as a source in the Dremio console and privileges granted to roles and users from applying properly to that source.
Fixed an issue that could prevent reflections with a row-access or column-masking policy from accelerating queries after an upgrade.
Automatically generated reflection recommendations now appear only if they meet a minimum threshold of value to your workloads.
In the reflections editor, the Refresh Now button no longer appears for failed reflections.
Clicking on a dataset on the Datasets page or clicking the Open Results link on the Job Overview page creates a new tab that is not automatically saved as a script.
Fixed an issue that could prevent reflections from being created for queries that contain an OVER clause with a specified RANGE.
Reduced memory usage when SELECT statements are run from the information schema by adjusting the page size parameter for pagination.
Fixed an issue that could cause the CURRENT_TIME function to return incorrect data when a user's timezone is defined.
Improved the query performance for VACUUM TABLE when using EXPIRE SNAPSHOTS.
Fixed an issue that could prevent partition columns from being applied in INSERT and CREATE TABLE AS statements.
August 12, 2024
Fixed a performance issue that affected queries containing a window function and a large number of batches.
MIN_REPLICAS and MAX_REPLICAS are no longer considered reserved keywords for SQL queries.
July 31, 2024
You can now use role-based access control (RBAC) privileges to restrict users and groups from accessing folders and their contents. With this change, admin users must explicitly grant visibility of folders and their contents to users and roles on the Arctic catalog as described in Arctic Privileges. To revert to the previous “open by default” behavior in which all objects are visible to all users in the PUBLIC role, see Inheritance.
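As a hedged sketch of granting folder visibility, with hypothetical catalog, folder, and role names (the exact object keyword and path form may differ in your environment):
GRANT SELECT ON FOLDER arctic_catalog.sales_data TO ROLE analysts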
For a given query with views, the reflection recommender now provides an aggregation reflection recommendation if possible instead of only default raw reflection recommendations.
The AWS Glue Lake Formation permission cache can now be invalidated on demand by using ALTER SOURCE or the Source API. Lake Formation tag policy support is also enabled by default.
Results caching improves query performance for non-UI queries with a result set that is less than 20 MB by reusing results for subsequent jobs with the same deterministic query and without underlying dataset changes. To use this feature, you must configure the time-to-live (TTL) rule in your project store to clear the cache.
Improved query planning time for over-partitioned tables with complex partition filters.
A query with an inner join can now match with reflections that contain outer joins.
Added a new Dataset API endpoint, POST /dataset/{id}/reflection/recommendation/{type}, for retrieving reflection recommendations by reflection type for a dataset.
The Catalog API Privileges endpoint is deprecated. We expect to remove it by July 2025.
In place of the Privileges endpoint, use the Catalog API Grants endpoint to retrieve privileges and grantees on specific catalog objects.
You can click Generate in the reflections tab to get a suggestion for creating an aggregation reflection. Statistics are no longer automatically collected and suggestions are generated when you open the reflections editor.
sys.project.pipe_summary is a new system table that summarizes high-level statistics for autoingest pipes and is only accessible to admins.
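For example, an admin can query it directly (no assumptions beyond the table name given above):
SELECT * FROM sys.project.pipe_summary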
The flow of queries is no longer coupled with query telemetry; previously, failures in telemetry processing could affect query completion rates. Queries now succeed despite failures in query telemetry processing or JTS availability, even if profile information is incomplete.
Fixed an issue with concurrent dataset modifications that could cause jobs to hang during the metadata retrieval or planning. An inline metadata refresh is now retried automatically after a failure due to a concurrent source modification.
Fixed a bug for complex queries that could result in an error message about the code being too large.
Reflections have been fixed in the following ways:
- The default selected columns for raw reflections no longer fail to include all columns of a dataset. DX-89497
- Queries no longer fail if an underlying default raw reflection becomes invalid for substitution against the view. The workaround is to disable or refresh the reflection. DX-85139
If an autoingest pipe job has been canceled by a query engine, the pipe job now retries to ingest the canceled batch.
Fixed the following NullPointerExceptions (NPEs) that could occur:
- When failed job details are fetched. 92934
- When accessing large Delta Lake tables in metastore sources. DX-67629
- Where the schema for a Delta Lake table was not captured correctly, leading to a failure to query the table. DX-92477
- When running a DML statement on an accelerated table. DX-91682
Queries no longer fail due to a ConcurrentModificationException when runtime filters are present.
Added a CONFIGURE BILLING privilege so that non-admin users can view and modify billing account data.
To prevent unexpected out-of-memory errors, the Parquet vectorized reader allocates only the necessary amount of memory for scanning deeply nested structures.
Fixed a performance issue for Iceberg tables that could occur when Dremio reads position delete files. Previously, a position delete file could be accessed multiple times by different scan threads. Now all delete rows are read once and joined with the data files.
Fixed a bug that could cause concurrent autopromotion of the same folder path to fail.
In the Dremio console, ideographic spaces now display as regular spaces in the results.
Fixed a bug in the SQL Runner where a script with a long name might not be fully visible in the Scripts panel.
Fixed a bug where the commit history may not load for tables or views that reside in hyphenated folders.
The user avatar at the bottom of the left navigation bar now shows the user's first and last initials instead of the first two letters of their username.
When you are editing the preview engine in the Edit Engine dialog, the currently selected instance family is no longer shown in the notification at the top of the dialog.
Scripts have been fixed in the following ways:
- Switching between scripts while a job is running no longer causes the job to appear in other tabs. DX-92260
- Opening a script and applying a transformation on a saved job should now work as expected. DX-92754
- Running a subset of a script now highlights the appropriate queries when switching between results tabs. DX-92143
The reflection data in the job summary of a raw profile will now render successfully even when the accelerationDetails field is skipped.
CAST TIME AS VARCHAR now returns the result in 'HH:mm:ss.SSS' format.
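For example, with a hypothetical time literal (the expected output follows the 'HH:mm:ss.SSS' pattern described above):
SELECT CAST(CAST('13:45:30' AS TIME) AS VARCHAR)
-- expected result: '13:45:30.000'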
You can now clear the context for the query session by running a USE command without any parameters.
The CONVERT_TIMEZONE function now works properly for Druid data sources.
LEAD and LAG functions with the window set to a value that is greater than 1 no longer produce incorrect results.
July 9, 2024
View schema learning now occurs only for queries that are issued from the Dremio console or reflection refresh jobs.
Queries no longer hang on coordinator startup when the materialization cache takes a long time to start up.
A raw profile is now available as soon as a job is in a running state.
Fixed a bug where duplicate rows could be returned when retrieving usage objects.
ORDER BY expressions in a subquery should be removed automatically as long as the query does not have LIMIT or OFFSET parameters, although the returned sort order cannot be guaranteed. In this example, ORDER BY deptno should be removed:
SELECT *
FROM emp
JOIN (SELECT * FROM dept ORDER BY deptno) USING (deptno)
Some databases like Postgres and Oracle support ORDER BY expressions, so you may see different results depending on the target of your query.
July 2, 2024
Reflection recommendations automatically generate for the top 10 most effective default raw reflections based on query patterns from the last 7 days. You can view these recommendations on the Reflections page in the Dremio console.
Added a retry mechanism when reflections are expanded into the materialization cache, which adds fault tolerance to coordinator upgrades and restarts.
User impersonation is now supported for Oracle sources.
The Privileges dialog is improved for managing sources, views, tables, and folders.
You can now bulk delete scripts.
You can specify a column as a MAP data type in CREATE TABLE.
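A minimal sketch with a hypothetical table, assuming the angle-bracket MAP type syntax:
CREATE TABLE demo.user_prefs (
  user_id BIGINT,
  prefs MAP<VARCHAR, VARCHAR>
)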
You can use VACUUM CATALOG for Arctic sources on Azure.
Deleting a project in Standard edition no longer results in autoingestion being unavailable.
The usernames in Arrow Flight JDBC/ODBC and Legacy JDBC/ODBC jobs are now shown in the same consistent case regardless of the username case in the connection URL.
Fixed an issue that could introduce duplicate rows in the results for RIGHT and FULL joins with non-equality conditions and join conditions that use calculations.
Updated error messaging for creating or deleting a folder on non-branch references.
Updated the following library to address potential security issues:
- org.postgresql:postgresql to version 42.4.5 [CVE-2024-1597] DX-91055
When you query the information schema, you can now see only the tables and views that you have access to instead of all datasets.
Added a rule that pushes an aggregate below a join if the grouping key is also a join key.
All existing engines without an instance family have been backfilled to either m5d or ddv5 depending on the cloud vendor.
Correlated subqueries that include a filter that doesn't match any rows no longer result in an error message.
Reflection recommendations now occur when plan regeneration is required and the name of the dataset is not fully qualified and contains a period (for example, "arctic1"."@username@dremio.com".v1).
When a dataset is created in a source, the dataset inherits its owner from the source. Inheritance no longer fails if the source owner is inactive; instead, the dataset owner is now set to the system user.
The author ID no longer appears as the author's name in the commit history after a branch is merged using a SQL command.
Dataset version sorting no longer results in incorrect "not found" error messages when listing datasets in the Dremio console.
Reflections with row and column access control now produce the correct results when algebraically matched.
The current owner of a script is now correctly displayed in the Dremio console.
Certain font ligatures are no longer displayed in the results table on the SQL Runner page.
Disabling Download Query Profiles for admins and users now correctly restricts users from downloading profiles.
The raw query profile has been improved to include Execution Resources Planned and Execution Resources Allocated planning phases to help with debugging execution-related issues.
When managing privileges in the Dremio console, users who lack the privileges required to view all user and role names can still add privileges by entering exact user and role names in the Add User/Role field.
You can now use the Secret Resource URL when adding an Oracle source; previously, this failed with a "missing password" error.
In the Advanced view of the reflections editor, you can select the SQL functions to use as measures in the Measure column for aggregation reflections.
The listing of catalog items no longer times out due to a very large number of catalog objects. To address the issue, optional pageToken and maxChildren parameters have been added to the API endpoints for getting catalog entities with children by ID or by path.
Indexing the same JSON into CONVERT_FROM multiple times no longer leads to incorrect results.
June 5, 2024
The Dremio JDBC driver now supports parameters in prepared statements.
You can use autoingest pipes to set up and deploy event-driven data ingestion pipelines directly in Dremio. This feature is in preview for Dremio Cloud and supports Amazon S3 as a source.
The retention period of jobs history stored in Dremio has been reduced from 30 days to 7 days, which improves job search response times. Use the jobs history system table to get the jobs history for jobs that are older than 7 days.
DML and CTAS are supported for the query_label workload management rule.
There are two new methods to start refreshing a reflection.
When an incremental refresh materialization is deprecated, you no longer see a DROP TABLE job in the job history; instead, the reflection data is synchronously cleaned up as part of reflection management.
For Azure projects, you can now create a table or view when the name of the table or view has a dot, such as "arctic1"."@username@dremio.com".v1.
Users (including admin users) can now use the Scripts API to manage scripts from API clients for migration, management during owner offboarding, and other purposes.
All write operations for Arctic views are written in the new Iceberg Community View format (V1). Existing views are still supported in the old format (V0), although any update to an existing view rewrites the view in the new format. Read operations are supported for both V0 and V1. To see which view format is being used, open the Details panel or metadata card for the view. For Dialect, the V0 views show DREMIO and V1 views show DremioSQL.
ON CONFLICT and DRY RUN clauses are now available for MERGE BRANCH.
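As a rough, hedged sketch with hypothetical branch and catalog names (clause placement and the available conflict-resolution keywords are assumptions, not confirmed syntax):
MERGE BRANCH dev INTO main IN arctic_catalog DRY RUN;
MERGE BRANCH dev INTO main IN arctic_catalog ON CONFLICT OVERWRITE;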
New SQL commands have been added for autoingest pipes: CREATE PIPE, ALTER PIPE, DESCRIBE PIPE, and DROP PIPE.
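A hedged sketch of creating and managing a pipe, with hypothetical pipe, table, and source names; the exact clause set and option names may differ from this illustration:
CREATE PIPE orders_pipe
  AS COPY INTO sales.orders
  FROM '@s3_source/incoming/orders'
  FILE_FORMAT 'parquet';
DESCRIBE PIPE orders_pipe;
DROP PIPE orders_pipe;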
When a reflection that depends on certain file formats (Iceberg, Parquet, ORC, or Avro) is due for a refresh and has no new data, a full refresh is no longer performed where data is read in the REFRESH REFLECTION job. Instead, only a new refresh is planned and a materialization is created, eliminating redundancy and minimizing cost for the reflection.
Default raw reflection matching can now be used during REFRESH REFLECTION jobs.
Reflections are no longer deleted when a reflection refresh fails due to a network error or the source being down.
Duplicate default raw reflection recommendations are no longer created when querying a view that contains joins.
When multiple jobs are submitted to the reflection recommender, the reflection recommender no longer errors out if some of the jobs are ineligible for recommendation. Instead, reflections are recommended for eligible jobs.
TBLPROPERTIES (table properties) for Iceberg tables are now saved in Apache Hive.
Reading a Delta Lake table no longer fails with an error about an invalid Parquet file.
The AWS Lake Formation tag authorizer now considers database-level tags.
Dremio now honors workload management rules that contain the query_label function.
When using an IAM role and attempting to add an AWS Glue source, you no longer see an error message about loader constraint violation due to AWS Glue authentication.
Reflections no longer incorrectly match into queries containing ROLLUP.
On the Organization page, hovering over Learn more for Arctic and selecting the Get Started with Arctic link opens the updated Getting Started with Dremio page.
During the signup process, the catalog is no longer missing in the CloudFormation Template (CFT) parameters if the CFT failed the first time and you click Rerun CloudFormation template.
If you delete a branch or a tag that you are currently on, you are now rerouted to the Data page for the default reference instead of seeing an error message.
Tooltips on the Catalog page are now displayed correctly on Firefox.
Dataset names are no longer truncated incorrectly.
An error message no longer appears when loading results of multiple jobs that executed on different engines.
Error messages that appear when a user tries to view the wiki of the folder for which they don't have privileges now describe the problem more clearly.
Creating a new script while on a script that displays an error message no longer causes the error message to persist.
You can now use decimals in ARRAY_REMOVE and ARRAY_CONTAINS functions.
Fixed an NPE that could occur when ARRAY_CONTAINS is used in a WHERE clause.
New line characters (\n) are supported in regex matching.
Incorrect splitting no longer occurs when the value contains UNICODE characters like 'á'.
May 22, 2024
The system_iceberg_tables plugin now uses the project ID in the project store in order to isolate data for each project.
May 14, 2024
Integrating with AWS Lake Formation is now supported, which provides access controls for datasets in the AWS Glue Data Catalog and defines security policies from a centralized location that may be shared across multiple tools.
The query profile now contains the origin of the error such as COORDINATOR or EXECUTOR in addition to the error type that is prefixed to the error message. In case of "OUT_OF_MEMORY ERROR:" error types, the type of memory causing the error such as HEAP or DIRECT_MEMORY and the additional information about the current memory usage can now be seen in the verbose section of the error in the profile.
You can now use MongoDB as a data source in Dremio Cloud in Azure. For details, see MongoDB.
Call failures to the JTS when sending an intermediate executor profile are now ignored.
Reliability for Dremio coordinators has improved.
Privileges are now available for folders in Arctic catalogs, and CREATE FOLDER, CREATE TABLE, and CREATE VIEW privileges have been added for Arctic catalogs. Privileges are also now inherited for objects in Arctic catalogs.
Support has been added for reading Apache Iceberg tables with equality deletes.
The CSV reader now uses direct memory instead of heap memory.
Queries now succeed even if telemetry storage fails. While a query is running, the executors and the coordinator send telemetry about the query execution to the JTS, which is written to a persistent store when the query completes or fails. If telemetry is incomplete, this is indicated on the Job Details page for transparency. The is_profile_incomplete column has been added to the system.project.jobs table to indicate the profile status and incomplete data.
In Enterprise edition, you can now select a secrets management option in place of existing secrets/password fields inside of the source creation and source edit dialogs.
Out-of-the-box observability metrics are now available for user activity and jobs such as most active users, longest running jobs, most queried datasets, and more.
If the job profile results are incomplete, you are notified and the options to download the profile and see the raw profile are unavailable.
A record fetch size parameter has been added to the settings for Snowflake sources. The default fetch size is 2000.
Arctic source pages are accessible for commits, tags, and branches. After you open an Arctic source, you can access its settings from the top right of the Datasets page.
When creating an organization, you have additional role options and can select multiple roles.
A new sys.reflection_lineage table function lists all reflections that will be refreshed if a refresh is triggered for a particular reflection.
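A hedged sketch, assuming the table function takes a reflection ID argument (the ID below is a placeholder):
SELECT * FROM TABLE(sys.reflection_lineage('<reflection-id>'))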
Reflection refreshes can now be configured based on a schedule. You can pick a time of day (UTC) and days of the week to refresh reflections for sources and tables.
Rate limits have been added to the Jobs service API on both the UI and Public API.
The authorTimestamp in SHOW LOGS has changed from VARCHAR to the dateTime data type.
To get a count of the number of rows in a table, Dremio now requests an estimated document count rather than aggregating the document. As a result, Dremio can retrieve the count more quickly.
Catalog maintenance tasks are introduced for controlling and preventing duplicate dataset versions from being created by API calls. Two scheduled tasks now run daily:
- Trimming the lists to a maximum of 50 records that are subject to a 30-day time-to-live (TTL) or the TTL value that is configured for jobs.
- Removing temporary dataset versions generated by jobs that are older than 30 days or according to the configured TTL.
DX-87659, DX-87549
Fixed memory tracking issues that could cause queries to be cancelled for exceeding memory limits (with the Memory Arbiter enabled and high memory utilization on the node).
The refresh token lifetime for Tableau has been extended to 90 days to handle offline use cases like extracts.
Promoted datasets with inconsistent partition depth no longer occasionally throw an ArrayIndexOutOfBoundsException when filtering against deeper partitions.
A client pool has been added for more performant concurrent Hive metastore operations. Pool size can be controlled with store.hive3.client_pool_size; if set to 0, pooling is disabled.
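A hedged sketch, assuming the key is exposed as a support key that can be changed with ALTER SYSTEM SET (it may instead be managed through support settings):
ALTER SYSTEM SET "store.hive3.client_pool_size" = 0
-- 0 disables pooling, per the note above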
Implementation has been added for the AWS Glue Data Catalog to pull and use Lake Formation tag policies. By default, this feature is turned off.
Commons-compress has been updated to version 1.26.1 [CVE-2024-25710] to address potential security issues.
Fixed an issue with slot assignment for preview engines when adding RDBMS sources and running metadata refresh for the same source type. Preview engines no longer hold assigned slots that cannot be released, so new queries can get free slots and no longer hit timeout errors.
Dremio now uses multiple writers in parallel for non-partitioned table optimization. The small files generated during the writing are combined by another round of writing with a single writer.
Reflections have been fixed in the following ways:
- Reflections containing temporal functions such as NOW and CURRENT_TIME are no longer incrementally refreshed, which could produce incorrect results. REFRESH REFLECTION jobs for reflections containing these dynamic functions are now full refreshes. DX-89451
- Reflection refresh jobs no longer show zero planning time when the refresh is incremental. DX-87548
- A snapshot-based incremental reflection refresh for unlimited-split datasets on Hive no longer results in excessive heap usage due to metadata access during the reflection refresh. DX-88194
Query profiles no longer show planning phases twice.
ROLE and USER audit events are now available in the sys.project.history.events table in the default project of an organization.
An issue no longer occurs if Lake Formation tag policies are present, but there are no Lake Formation tags defined on a certain table.
To prevent conflicts between SLF4J 1.x and 2.x, the Dremio JDBC driver no longer exposes the SLF4J API and uses the java.util.logging (JUL) framework to log messages. The application can configure the parent logger for the driver by using java.sql.Driver#getParentLogger() or directly using java.util.logging.Logger#getLogger("com.dremio.jdbc").
For AWS standard edition, users no longer see an unsupported error when clicking on a dataset to query the dataset.
Dremio blocks view creation if the view has a cyclic dependency on itself.
Hash join support structures are now reallocated when they are insufficient for an incoming batch.
If the wiki editor is empty when a summary is generated, the generated summary is now automatically inserted into the wiki editor.
An intermittent failure when retrieving a wiki has been fixed.
For tables and views in a catalog, table and view owners need to have the USAGE privilege on the catalog to retrieve and create reflections. Previously, table or view ownership was sufficient to retrieve and create reflections using the reflections API.
accelerationNeverRefresh and accelerationNeverExpire are properly populated in /api/v3 for sources.
The performance of loading the data visualization for jobs on the Monitoring page has improved.
The UI no longer breaks when clicking to set the refresh schedule for certain source types.
Privileges have been updated in the following ways:
- If you have only the ALTER privilege on a dataset and no privileges on the catalog, you can open the folder and see the dataset, but you cannot edit the dataset or run the query. DX-87891
- If you have only the SELECT privilege on a dataset and no privileges on the catalog, you can open the folder, see the dataset, and run a query on the dataset but you cannot edit the dataset. DX-87891
- Users who do not belong to the ADMIN role cannot view the User filter or the list of users on the Jobs page. DX-87660
- The view owner is now properly listed in the Dremio console after the owner is updated. DX-88705
The Table Cleanup tab in the catalog settings sidebar is no longer hidden when a user who belongs to the ADMIN role is viewing the Catalog Settings page with a non-admin role.
The SQL Runner has improved in the following ways:
- When you run a query in the SQL runner, the page no longer briefly displays the previous query's results. DX-83509
- The results of previously run queries now load much more quickly. After you open a saved script in the SQL Runner, the results are automatically displayed in a summarized format if at least one job in the script has successfully completed. To load the results of a specific query, select the query tab above the results table. DX-90110,DX-90627
- Running a query and attempting to save it as a view no longer causes the results to disappear. DX-86266
- Switching between the tabs in the SQL editor now correctly displays the job type. DX-89787
- Script names are no longer prevented from being saved after users rename a tab and edit the SQL content. DX-86751
- When expanding the large data field in the SQL Runner by using the ellipsis (...), the results are now responsive when the data includes DateTime objects. DX-86541
- There is no longer a data correctness issue where joins with non-equality conditions and join conditions using calculations would sometimes introduce duplicate rows (while respecting desired filtering properties) into the result set. DX-90720
- Background threads no longer run when a query in the SQL Runner is cancelled or fails. DX-85812
Adding a folder to the primary catalog now uses the references of the primary catalog rather than the selected source. When adding a folder to the primary catalog using the + icon, the folder is now correctly added to the primary catalog and not to the selected Arctic source.
The Details panel is no longer blank when you open the panel from the hover menu inside of an Arctic source.
The dataset metadata card no longer incorrectly opens versioned views from the Job Details page.
Messaging has been improved for:
- notifying if the top level space or source doesn't exist during view creation DX-85784
- successfully updating script privileges when using GRANT TO ROLE or GRANT TO USER DX-88527
- failed queries with row-access and column-masking policies using reflections DX-88480
- failed metadata cleaning due to the expiration of snapshots DX-69750
- switching between tabs in the SQL Runner DX-87980
- messages originating from the Hive 3 plugin engine for cases where other source types are served, such as Hive 2.x or AWS Glue DX-87596
- notifying you if the catalog for the source has been deleted or cannot be found when you attempt to use a source DX-65235
- attempting to create a table or view that has the same path as an existing folder DX-86880
The merge action in the Dremio console no longer shows the user ID instead of the user email in the commit.
The SKIP_FILE option for the COPY INTO SQL command no longer fails to handle Parquet file corruption issues if the issues are in the first page of a row group.
Trailing semicolons that terminate an OPTIMIZE TABLE statement are no longer marked as an error in the SQL Runner.
Scalar UDFs no longer return incorrect results in some cases.
A reliability issue in ARRAY_AGG has been fixed.
DML no longer breaks after a table is recreated and merged.
You can now perform DML queries when no context is selected.
April 16, 2024
Azure images for Dremio Cloud executors have been upgraded from CentOS 8.5 Gen 2 Linux to the Ubuntu 22.04 Linux distribution.
April 2, 2024
Enabled the memory arbiter by default in order to monitor the usage of four key operators: HASH_AGGREGATE, HASH_JOIN, EXTERNAL_SORT, and TOP_N_SORT. This usage is monitored across all queries running on an executor to improve how the executor utilizes its direct memory and to reduce OutOfMemoryException errors.
If the memory arbiter detects that the memory usage is too high, then the memory usage will be reduced in these two ways:
- Starting with the biggest consumers, some of these operators will need to reduce their memory usage, mainly by spilling to disk.
- Memory allocations will be blocked.
Enabled HASH_JOIN to spill to disk by default when the memory allocated for a query is fully utilized.
Added support for column mapping within Delta Lake tables, effectively supporting minReaderVersion 2.
Improved partition pushdowns for Iceberg tables when queries use datetime filters that include a function on a column.
Improved coordinator startup times by allowing Dremio to serve queries while the materialization cache is loading in the background. Some queries might not be accelerated during this time.
Made it possible to find out the oldest reflection data used by accelerated queries by looking for "Last Refresh from Table" in the job summary of a raw profile or by querying the system table SYS.PROJECT.REFLECTIONS and looking for last_refresh_from_table in the output.
Disabled C3 caching during loads of Parquet source files via the COPY INTO operation, thereby reducing cache contention with other query workloads.
Reduced the heap memory used by the SORT operator.
Improved reliability and memory efficiency for Dremio coordinators.
Reading Iceberg tables with positional deletes no longer causes an IndexOutOfBoundsException.
Improved Dremio's capabilities for concurrent DML operations on Iceberg tables and improved error messaging for concurrent load failures.
Updated the Arrow package version to 14.0.2 to include Dremio Arrow fixes, and to include new features and fixes from Apache Arrow.
Added support for Version 15 of the Arrow Flight JDBC driver. You can download the driver from here.
Added support for the copy_errors() table function on Parquet tables.
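A hedged sketch of calling the table function, assuming it takes the target table name and a COPY INTO job ID (both values below are placeholders):
SELECT * FROM TABLE(copy_errors('my_table', '<copy-into-job-id>'))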
Added support for limiting access to specified databases on Glue sources.
Improved the Projects API in these two ways:
- The Project object now includes the lastStateError object for projects for which Dremio encounters an "invalid credentials" error.
- The Projects API can be used to update project credentials. DX-65288
The Reflection Summary objects of the Reflection API and the SYS.PROJECT.REFLECTIONS table now include the error message that explains the most recent failure of a reflection refresh. No message appears if no refresh has yet been attempted, no failure has occurred, or a successful refresh has followed a failed one.
Reflection recommendations are now associated with the corresponding job IDs.
Added the parameters isMetadataExpired and lastMetadataRefreshAt to the Table and View objects of the Catalog API. Now, when either of these two methods is called and a table or view has stale or no metadata, there is no automatic refresh of the metadata:
GET /v0/projects/{project-id}/catalog/{id}*
GET /v0/projects/{project-id}/catalog/by-path/{path}
Instead, users can look at the values for the two new parameters and decide whether to invoke a refresh by calling this method:
POST /v0/projects/{project-id}/catalog/{id}/refresh
Changed the tabs in the SQL runner to display the most recent results of a query, if the results are available from the job history, without the user having to run the query again.
Added support for editing the credentials of projects that use AWS or Azure. This can be done in the Project Storage section of a project's settings.
Privilege changes are processed more quickly in the Dremio console.
Added a check to determine whether users running the COPY INTO command have SELECT privileges on either the source storage location specified in the FROM clause or on each individual source file mentioned in the FILES clause.
Performance is improved and memory consumption is reduced for some INFORMATION_SCHEMA queries that filter on TABLE_NAME or TABLE_SCHEMA.
Fixed planning errors that resulted from accessing views on which a reflection that included the CONVERT_FROM() function was defined.
Fixed an issue that caused an additional empty subnet field to appear in the second step of the process for creating the first project of an organization. This field appeared if a cloud tag was added during the first step.
Fixed an issue that allowed reflections to be created when their definitions included UDFs that contained context-sensitive functions.
Fixed an issue that prevented full exception stack traces from being provided for errors generated by queries that included an ambiguous column name.
Queries are no longer cancelled due to exceeding memory limits while spilling during a SORT operation.
Fixed an issue with case sensitivity that would lead to delayed processing of inherited privileges.
Fixed an issue with orphaned materialization datasets in the catalog due to incremental reflection refreshes that were not writing any data.
When an incremental reflection refresh is skipped because there are no data changes, the last refresh from the table and the last refresh duration now update correctly in the reflection system tables and reflection summary.
Fixed an issue that caused filters on scans of source tables on MongoDB to use incorrect regex.
Ensured that group policy grants are respected in AWS Lake Formation when Dremio is used with Okta.
Fixed an issue that occurred when "All tables" was selected in AWS Lake Formation while granting a new permission meant to apply to all tables within the selected database.
Fixed an issue that caused the Save as View option in the SQL editor not to work after the option had already been used once in a single session.
Fixed an issue that prevented the Details icon from appearing for items in lists of sources or lists of spaces in the Dremio console.
Fixed an issue preventing users from accessing the Edit wiki button in the Details Panel.
Navigating between datasets using the lineage graph in the Dremio console no longer results in a message about unsaved changes.
Fixed an issue that caused the maximum number of scripts that could be saved in the Dremio console to be 100, not 1,000.
Fixed an issue that caused the creation of a new branch to update the context of the SQL Runner automatically.
Fixed an issue that prevented attempts to join datasets from the SQL Runner from correctly defaulting to the Custom Join tab when no recommended join existed.
Fixed an issue that caused the Delete Organization dialog to appear after an organization had been renamed and the new name saved.
Fixed an issue that enabled a Save button when an added subnet field had not been filled in.
Fixed an issue that caused the AWS Secret Access Key field to appear blank even when a key was specified. Now, the masked key is visible.
Fixed the handling of SQL functions, such as LOWER, UPPER, and REVERSE, in queries on system tables.
Made the error messages for schema mismatch errors that occur for UNION ALL operators more concise.
Dremio no longer caches CURRENT_DATE_UTC and CURRENT_DATE during query planning, which was causing incorrect results. As a result, queries that use CURRENT_DATE_UTC and CURRENT_DATE have some performance latency in favor of accurate results.
Fixed an issue that caused the SQL function APPROX_COUNT_DISTINCT to return null instead of 0 in some cases.
February 26, 2024
A valid location has been provided for DoGet requests, resolving a compatibility issue with the Arrow Flight JDBC 15 driver and ADBC driver.
February 12, 2024
Privilege changes are processed more quickly in the Dremio console.
February 1, 2024
An exception error no longer occurs when you run a query or create a view with an ambiguous column name.
January 31, 2024
Incremental refreshes can be performed on reflections that are defined on views that use joins.
The Reflection Recommender gives recommendations to users with complex queries and deep semantic layers for better performance and predictable matching. The queries for which default raw reflections can be recommended must run against one or more views that match certain criteria.
You can now use MongoDB as a data source in Dremio Cloud in AWS. For details, see MongoDB.
For Azure Blob Storage and Data Lake Gen 2 sources, checksum-based verification is enabled to ensure data integrity during network transfers.
You can click Close Others to close all tabs besides the active tab in the SQL editor.
The ARRAY_FREQUENCY function is now supported.
Creating a raw reflection on a dataset on which no reflections are already defined no longer creates an aggregation reflection.
To alter the reflections on a view or table, the user or role must have the ALTER_REFLECTION privilege on it and also have the USAGE and COMMIT privileges on the Arctic catalog.
Query planning times are shorter during the metadata validation phase due to view schema learning.
There is no longer an exception during the planning of queries on views that use the INTERVAL
data type.
Queries against Iceberg tables with positional deletes no longer fail with an error like “the current delta should always be larger than the amount of values to skip."
Unneeded columns are now trimmed from JDBC pushdowns.
The performance of health checks of AWS Glue data sources has been improved by checking the state of the metastore and attempting to retrieve databases with a specified maximum result limit of 1.
The successful generation of labels and wikis no longer requires an engine to be running.
Selectively run queries are now highlighted as errors if they fail.
The dialog that explains that a query has failed no longer appears when you switch between SQL tabs.
When adding a new Arctic catalog source fails, the error message now provides detailed information about the specific error.
Previously, if you used a statement in your query to set a schema path to an Arctic source and folder, the table or view validation would fail. Now, you can set the context to an Arctic source that includes any number of folders.
January 16, 2024
Reflections on views that join two or more anchor tables (Apache Iceberg tables and certain types of datasets in filesystem sources, Glue sources, and Hive sources) can now be refreshed incrementally.
Dremio now uses Micrometer-based metrics. Existing Codahale-based metrics are preserved and include the tag {metric_type="legacy"}.
Executor metrics tags now include engineId and subEngineId.
You can use the Recommendations API to submit job IDs of jobs that ran SQL queries, and receive recommendations for aggregation reflections that can accelerate those queries. See Recommendations for more information.
These terms were added to the list of reserved keywords: JSON_ARRAY, JSON_ARRAYAGG, JSON_EXISTS, JSON_OBJECT, JSON_OBJECTAGG, JSON_QUERY, and JSON_VALUE.
The following words were incorrectly made reserved keywords: ABSENT, CONDITIONAL, ENCODING, ERROR, FORMAT, PASSING, RETURNING, SCALAR, UNCONDITIONAL, UTF8, UTF16, and UTF32.
Fixed a bug that caused archived Sonar projects not to appear for a user on the Sonar Projects page immediately after that user received the Admin privilege.
A NullPointerException could be returned when a row count estimate could not be obtained.
The tutorials that are accessed from the left navigation bar are available only to the creators of organizations, not to all users of organizations.
The settings for configuring a new catalog no longer appear until the cloud or type of cloud is chosen.
The Add Column, Group By, and Join buttons could be disabled if the SELECT command that defined a view was run and that command ended in a semicolon.
If you saved a new view in the SQL Runner and then re-opened the SQL Runner, the view that you had just created would still be present.
For some types of data sources, the generation of a wiki page would fail.
The Save button for reflections defined on views in spaces would be enabled for public users who have only SELECT, EDIT, and VIEW REFLECTION privileges. Such users still were correctly prevented from modifying reflections, as clicking Save did nothing.
Reflection management orphaned reflection materialization tables that were in the KV store. These tables would never get cleaned up and cause the KV store to become larger than necessary.
Querying Apache Druid tables containing large amounts of data could cause previews in the SQL Runner to time out.
All columns were being sent in JDBC predicate pushdowns.
Queries with correlated subqueries could return incorrect results.
An exception occurred when Dremio tried to get an estimate of the row count for PostgreSQL tables.
Opening the SQL Runner from the Details page of a table caused the SQL Runner to open with the SQL editor hidden in the new tab and in all open tabs.
Scrolling through phases and operators in a visual profile was sometimes jumpy.
Users without permission to edit a view in an Arctic source were able to access the view's SQL definition if a direct URL to the Detail page for the view was provided by a user who did have edit permission.
The wrong branch could become active after you refreshed the SQL Runner page and then clicked on the breadcrumbs at the top of that page.
If you clicked a view or a table, ran the generated SELECT * statement in the SQL Runner, and then clicked the Edit button in the dataset details on the right, the SQL Runner would not be refreshed with the DDL for creating the view or table. The SQL statement and the successful or failed query now remain in the editor page when navigating to a dataset.
In API requests to create a new project, the catalogName body parameter is now required.
December 14, 2023
You can now add an Azure private endpoint in the Azure portal when you connect your Azure account to Dremio Cloud or add a project to an organization. The outbound private endpoints are used to connect Dremio executors to the Dremio Cloud control plane over the Azure network.
The Dremio-to-Dremio connector is now supported in Azure.
Automated table cleanup to delete expired snapshots and orphaned metadata files is now supported for Iceberg tables in Arctic catalogs.
The algorithm that triggers a refresh of dependent reflections has been improved to prevent duplicate refreshes. The refresh operation now remains in a pending state until all direct and indirect dependences finish refreshing.
For reflections that are defined on Parquet datasets in S3 sources, Dremio can now automatically choose incremental refresh or full refresh.
Planning time for reflections has been substantially improved. The acceleration profile now contains a detailed breakdown of reflection normalization and substitution times.
The external token provider audit log now includes audit events for creating and updating BI applications.
The Clouds API now includes the privateEndpoints parameter for specifying an Azure private endpoint.
You can now use tabs in the SQL Runner to work on multiple tasks simultaneously. All of your work in each tab is autosaved.
The Visual Profile now displays notable observations and potential problems for operators and phases. Users can use filters to control which operators are displayed.
The Visual Profile now shows the following runtime metrics: waitTimeSkew, wallClockTimeSkew, batchesProcessedSkew, sleepingDuration, and cpuWaitTime.
When users try to edit a deleted script, they will now see a confirmation dialog with the following options to prevent lost work: Discard, Copy SQL, and Save as script.
This update adds support for the following SQL functions: ARRAY_AGG, ARRAY_APPEND, ARRAY_DISTINCT, ARRAYS_OVERLAP, ARRAY_PREPEND, and ARRAY_SLICE.
If you disable the Query dataset on click setting, the Datasets page does not include a query shortcut for tables and views. To query a dataset, open the SQL Runner from the left navigation panel, or open the options menu for the dataset and select Query.
Users can now set privileges on folders with a . character in their names and on the tables these folders contain.
Iceberg metadata table functions no longer truncate the number of results returned to the maximum batch size set for exec.batch.records.max.
Row-level runtime filtering is disabled for reflection refresh jobs so that views no longer return incorrect results due to an incorrect match to a single Starflake reflection.
When connecting to an Apache Druid source, the username and password are now optional.
When modifying the credentials for an existing Arctic catalog, the external ID for the IAM role now persists rather than refreshing with the page.
View schema learning has been improved to handle complex types and no longer requires query re-planning.
Fixed a NullPointerException (NPE) that occurred during split assignment of Delta Lake scans.
When creating recommendation reflections, more than one recommendation may be created in response to a single job ID. Also, the initial SQL query can now contain outer joins that are part of a view definition, in addition to inner joins, and set operators. See Reflection Recommendations for more information.
Updated Calcite to version 1.19.
When a user logs out, all UI context is now cleared.
Logging out while on the Settings page for an Arctic catalog no longer results in an error.
All scripts are now visible when users scroll to the end of the scripts list in the SQL Runner. Also, the displayed number of scripts is now accurate up to 1000.
In the SQL Runner functions panel, the filter categories are now listed in alphabetical order.
In the SQL Runner, the copy button is now disabled while queries are running.
Using the tab character in object names no longer causes inconsistent column spacing.
On the Job Overview page for a canceled query, clicking the View Profile tab no longer results in an error.
The Job Overview page no longer reports incorrect state information for reflections.
A new script is no longer created when you open the SQL Runner by clicking a dataset name and then click the back button to return to the previous screen.
When users are on the Job Details page, the browser tab name now correctly displays Job Details - Dremio.
For queries with a large number of results, truncation messages now display the correct number of rows of results.
When deleting a script, users now receive only a single confirmation dialog.
Table results now clear correctly when users save a run or previewed query as a script.
When editing a query, users can now see the previewed results of a transformation on the previously selected dataset.
The APPROX_COUNT_DISTINCT function now properly calculates the approximate count distinct rather than the exact count distinct.
Fixed an issue where queries that contain correlated subqueries in the join condition could return duplicate rows.
Queries that involve array columns that contain string values no longer fail.
Fixed a performance issue that affected queries that contain many GET calls for large arrays.
A balanced UnionAll subtree now prevents stack issues when inserting a large number of values.
In some cases, the HASH_JOIN operator could request more memory at the beginning of its work than anticipated. When this happens, instead of allowing the query to fail, Dremio now satisfies the operator's request and takes note of the elevated memory requirement.
Users now receive a more informative error message for ALTER TABLE queries that attempt to set a masking policy that refers to a non-existent function.
November 27, 2023
You can connect your Azure account to Dremio Cloud when getting started or adding a project or cloud to your organization for the following supported regions: East US, Central US, and West Europe. Learn more about the Azure prerequisites and how you can get started.
The COPY INTO command now supports Parquet files.
You no longer need the MONITOR privilege to run Arctic optimization jobs.
November 16, 2023
You can see a view definition or an Arctic table definition if you have the SELECT privilege, although editing a view definition requires further privileges.
You can now see syntax errors in your SQL query as you enter the query into the SQL editor. Each error is automatically detected with a red wavy underline and contains information about the type of error. For more information, see Syntax Error Highlighting.
The details panel can be collapsed so it no longer overlaps the SQL Runner page or Datasets page, making it easier to access and to use for switching between details for different objects.
Dremio now supports the SQL commands SHOW CREATE VIEW to see a view definition and SHOW CREATE TABLE to see a table definition. For more information, see SHOW CREATE VIEW and SHOW CREATE TABLE.
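For example, with hypothetical object names:
SHOW CREATE VIEW sales.monthly_revenue
SHOW CREATE TABLE arctic_catalog.sales.orders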
The following SQL functions are now supported: ATAN2, BITWISE_AND, BITWISE_NOT, BITWISE_OR, BITWISE_XOR, DATETYPE, HASH64, PARSE_URL, PMOD, STRING_BINARY, and TIMESTAMPTYPE.
Folders are no longer deleted from the main branch when using the delete folder option.
When using hash joins, queries no longer fail with an unexpected restart of an executor.
The default job results cleanup path no longer results in disk space issues and unexpected restarts on some cluster nodes.
In the new source dialog for Arctic sources, the following configuration options have been moved from the Storage tab to the Advanced Options tab: Disable check for expired metadata while querying and Enable source to be used with other sources even though Disable Cross Source is configured.
When hovering over a very long label for a dataset in the details panel, the label name is no longer cut off in the tooltip.
When generated labels are a subset of existing labels for a dataset, the Append button is disabled inside the dialog.
Previously, if a user dropped a branch in which reflections were created, the reflections defined by the datasets on that branch would not be deleted in the next reflection refresh cycle. Those reflections would become orphaned and never get cleaned up. This issue is now fixed.
For Hive and Glue sources, filters are now successfully pushed down to the Iceberg Manifest Scan.
The parsing of CSV files has become more strict. Quoted values are now expected to be terminated properly with the quote symbol before reaching the end of the file; otherwise, an UnmatchedQuoteAtEOFException will be thrown.
Extra columns in a CSV file (compared to the target table schema) no longer cause issues during a COPY INTO ON_ERROR ('continue') job.
The query profile now shows the correct resolved table/key count when a SQL context is set in a query or view.
Users can now browse tables in catalogs whose names include an underscore.
Billing and usage views now more accurately reflect Azure-specific engine characteristics.
Role endpoints that are PUBLIC now return limited information. These endpoints are called by the UI in the context of searching a role or getting the role information.
The visual profile is no longer prevented from working in some cases due to strict security measures.
Operations to add a row-access policy no longer fail because the UDF couldn't be resolved.
If a query used in a reflection contains a UDF, reflection refreshes no longer fail with a plan serialization error.
To increase coordinator stability, the plan cache size has been decreased from 10k queries to 1k queries, and the time duration from 10 days to 8 hours.
For datasets created by Dremio, the CREATE TABLE, REFRESH REFLECTION, OPTIMIZE TABLE, and INSERT INTO SQL commands will now have dictionary encoding enabled. If the page data lends itself to dictionary encoding, the corresponding page data will be dictionary encoded.
Error handling is improved when users create a view with a full query starting with CREATE VIEW.
The reflection recommender now provides user queries that include COUNT(DISTINCT) and/or APPROX_COUNT_DISTINCT with accurate reflection recommendations.
Handling of inferred partition columns is improved. Specifically, FOR PARTITIONS (...) now works properly for inferred partition columns.
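A hedged sketch, assuming the clause is used with ALTER TABLE ... REFRESH METADATA and using hypothetical table and partition values:
ALTER TABLE s3_source.logs REFRESH METADATA FOR PARTITIONS (dir0 = '2024-05-01')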
October 31, 2023
Removed an errant dependency check that was preventing some engines from starting or scaling replicas.
Fixed an issue with AWS regional STS endpoint support for Glue sources that assume an AWS role. To enable AWS regional STS endpoint support, set the value of the property fs.s3a.assumed.role.sts.endpoint to the STS endpoint hostname for the region that you are using. For example, the value might be sts.us-east-1.amazonaws.com.
Metadata on AWS Glue sources was not being refreshed according to the schedule defined on the source. In some cases, new data was only seen after ALTER TABLE <table> REFRESH METADATA was run.
Due to metadata caching, it may take up to five minutes to reflect revoked privileges on objects in a Sonar project, including on Arctic catalogs.
Users with the organization-level MANAGE GRANTS privilege who have not been assigned the ADMIN role are not able to assign privileges to users or roles unless they have been explicitly assigned the CREATE USER or CREATE ROLE privilege.