Changelog
This changelog provides a detailed record of the previous 12 months of updates and enhancements we have made to improve your Dremio Cloud experience.
July 9, 2024
View schema learning now occurs only for queries that are issued from the Dremio console or reflection refresh jobs.
Queries no longer hang on coordinator startup when the materialization cache takes a long time to start up.
A raw profile is now available as soon as a job is in a running state.
Fixed a bug where duplicate rows could be returned when retrieving usage objects.
July 2, 2024
Reflection recommendations automatically generate for the top 10 most effective default raw reflections based on query patterns from the last 7 days. You can view these recommendations on the Reflections page in the Dremio console.
Added a retry mechanism when reflections are expanded into the materialization cache, which adds fault tolerance to coordinator upgrades and restarts.
User impersonation is now supported for Oracle sources.
The Privileges dialog is improved for managing sources, views, tables, and folders.
You can now bulk delete scripts.
You can specify a column as a MAP data type in CREATE TABLE.
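For illustration, a minimal sketch of a `CREATE TABLE` statement with a MAP column — the table name, column names, and the `MAP<VARCHAR, INT>` key/value types here are assumptions, not taken from the release note:

```sql
-- Hypothetical example: a table with a MAP-typed column
CREATE TABLE demo_catalog.events (
  event_id   INT,
  attributes MAP<VARCHAR, INT>  -- key and value types are illustrative
);
```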
You can use VACUUM CATALOG for Arctic sources on Azure.
Deleting a project in Standard edition no longer results in autoingestion being unavailable.
The usernames in Arrow Flight JDBC/ODBC and Legacy JDBC/ODBC jobs are now shown in the same consistent case regardless of the username case in the connection URL.
Fixed an issue that could introduce duplicate rows in the results for RIGHT and FULL joins with non-equality conditions and join conditions that use calculations.
Updated error messaging for creating or deleting a folder on non-branch references.
Updated the following library to address potential security issues:
- org.postgresql:postgresql to version 42.4.5 [CVE-2024-1597] DX-91055
When you query the information schema, you can now see only the tables and views that you have access to instead of all datasets.
Added a rule that pushes an aggregate below a join if the grouping key is also a join key.
All existing engines without an instance family have been backfilled to either m5d or ddv5 depending on the cloud vendor.
Correlated subqueries that include a filter that doesn't match any rows no longer result in an error message.
Reflection recommendations now occur when plan regeneration is required and the name of the dataset is not fully qualified and contains a period (for example, `"arctic1"."@username@dremio.com".v1`).
When a dataset is created in a source, the dataset inherits its owner from the source. Inheritance no longer fails if the source owner is inactive; instead, the dataset owner is now set to the system user.
The author ID no longer appears as the author's name in the commit history after a branch is merged using a SQL command.
Dataset version sorting no longer results in incorrect "not found" error messages when listing datasets in the Dremio console.
Reflections with row and column access control now produce the correct results when algebraically matched.
The current owner of a script is now correctly displayed in the Dremio console.
Certain font ligatures are no longer displayed in the results table on the SQL Runner page.
Disabling Download Query Profiles for admins and users now correctly restricts users from downloading profiles.
The raw query profile has been improved to include Execution Resources Planned and Execution Resources Allocated planning phases to help with debugging execution-related issues.
Users who do not have the required privileges to view all user and role names when using the Dremio console to manage privileges can add privileges by entering users' and roles' exact names in the Add User/Role field.
You can now use the Secret Resource URL when adding an Oracle source; previously, this failed with a "missing password" error.
In the Advanced view of the reflections editor, you can select the SQL functions to use as measures in the Measure column for aggregation reflections.
The listing of catalog items no longer times out due to a very large number of catalog objects. To address the issue, optional `pageToken` and `maxChildren` parameters have been added to the API endpoints for getting catalog entities with children by ID or by path.
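As a sketch of how the new parameters might be used (the query-string placement shown here is an assumption, not confirmed by the release note):

```
GET /v0/projects/{project-id}/catalog/{id}?maxChildren=100
GET /v0/projects/{project-id}/catalog/{id}?maxChildren=100&pageToken=<token-from-previous-response>
```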
Indexing the same JSON into CONVERT_FROM multiple times no longer leads to incorrect results.
June 5, 2024
The Dremio JDBC driver now supports parameters in prepared statements.
You can use autoingest pipes to set up and deploy event-driven data ingestion pipelines directly in Dremio. This feature is in preview for Dremio Cloud and supports Amazon S3 as a source.
The retention period of jobs history stored in Dremio has been reduced from 30 days to 7 days, which improves job search response times. Use the jobs history system table to get the jobs history for jobs that are older than 7 days.
DML and CTAS are supported for the `query_label` workload management rule.
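A routing-rule condition using this capability might look like the following sketch; the exact rule syntax and label values shown are assumptions:

```sql
-- Hypothetical workload management rule condition
query_label() IN ('DML', 'CTAS')
```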
There are two new methods to start refreshing a reflection.
When an incremental refresh materialization is deprecated, you no longer see a DROP TABLE job in the job history; instead, the reflection data is synchronously cleaned up as part of reflection management.
For Azure projects, you can now create a table or view when the name of the table or view contains a dot, such as `"arctic1"."@username@dremio.com".v1`.
Users (including admin users) can now use the Scripts API to manage scripts from API clients for migration, management during owner offboarding, and other purposes.
All write operations for Arctic views are written in the new Iceberg Community View format (V1). Existing views are still supported in the old format (V0), although any update to an existing view rewrites the view in the new format. Read operations are supported for both V0 and V1. To see which view format is being used, open the Details panel or metadata card for the view. For Dialect, the V0 views show DREMIO and V1 views show DremioSQL.
ON CONFLICT and DRY RUN clauses are now available for MERGE BRANCH.
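A hedged sketch of the new clauses — the branch names and the specific ON CONFLICT action shown here are assumptions for illustration:

```sql
-- Preview the merge without applying it
MERGE BRANCH DRY RUN dev INTO main;

-- Merge, resolving conflicts in favor of the source branch
MERGE BRANCH dev INTO main ON CONFLICT OVERWRITE;
```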
New SQL commands have been added for autoingest pipes: CREATE PIPE, ALTER PIPE, DESCRIBE PIPE, and DROP PIPE.
When a reflection that depends on certain file formats (Iceberg, Parquet, ORC, or Avro) is due for a refresh and has no new data, the REFRESH REFLECTION job no longer performs a full refresh that reads data. Instead, only a new refresh is planned and a materialization is created, eliminating redundancy and minimizing the cost of the reflection.
Default raw reflection matching can now be used during REFRESH REFLECTION jobs.
Reflections are no longer deleted when a reflection refresh fails due to a network error or the source being down.
Duplicate default raw reflection recommendations are no longer created when querying a view that contains joins.
When multiple jobs are submitted to the reflection recommender, the reflection recommender no longer errors out if some of the jobs are ineligible for recommendation. Instead, reflections are recommended for eligible jobs.
`TBLPROPERTIES` (table properties) for Iceberg tables are now saved in Apache Hive.
Reading a Delta Lake table no longer fails with an error about an invalid Parquet file.
The AWS Lake Formation tag authorizer now considers database-level tags.
Dremio now honors workload management rules that contain the `query_label` function.
When using an IAM role and attempting to add an AWS Glue source, you no longer see an error message about loader constraint violation due to AWS Glue authentication.
Reflections no longer incorrectly match into queries containing ROLLUP.
On the Organization page, hovering over Learn more for Arctic and selecting the Get Started with Arctic link opens the updated Getting Started with Dremio page.
During the signup process, the catalog is no longer missing in the CloudFormation Template (CFT) parameters if the CFT failed the first time and you click Rerun CloudFormation template.
If you delete a branch or a tag that you are currently on, you are now rerouted to the Data page for the default reference instead of seeing an error message.
Tooltips on the Catalog page are now displayed correctly on Firefox.
Dataset names are no longer truncated incorrectly.
An error message no longer appears when loading results of multiple jobs that executed on different engines.
Error messages that appear when a user tries to view the wiki of the folder for which they don't have privileges now describe the problem more clearly.
Creating a new script while on a script that displays an error message no longer causes the error message to persist.
You can now use decimals in ARRAY_REMOVE and ARRAY_CONTAINS functions.
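For illustration, a minimal sketch of decimal arguments in these functions (the literals are made up; semantics follow the functions' standard element-search and element-removal behavior):

```sql
-- Decimal values now work as array elements and as search values
SELECT ARRAY_CONTAINS(ARRAY[1.5, 2.5, 3.5], 2.5);  -- whether 2.5 appears in the array
SELECT ARRAY_REMOVE(ARRAY[1.5, 2.5, 3.5], 2.5);    -- the array with 2.5 removed
```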
NPE has been fixed when ARRAY_CONTAINS is used in a WHERE clause.
New line characters (\n) are supported in regex matching.
Incorrect splitting no longer occurs when the value contains UNICODE characters like 'á'.
May 22, 2024
The `system_iceberg_tables` plugin now uses the project ID in the project store in order to isolate data for each project.
May 14, 2024
Integrating with AWS Lake Formation is now supported, which provides access controls for datasets in the AWS Glue Data Catalog and defines security policies from a centralized location that may be shared across multiple tools.
The query profile now contains the origin of the error such as COORDINATOR or EXECUTOR in addition to the error type that is prefixed to the error message. In case of "OUT_OF_MEMORY ERROR:" error types, the type of memory causing the error such as HEAP or DIRECT_MEMORY and the additional information about the current memory usage can now be seen in the verbose section of the error in the profile.
You can now use MongoDB as a data source in Dremio Cloud in Azure. For details, see MongoDB.
Failures in calls to the JTS when sending an intermediate executor profile are now ignored.
Reliability for Dremio coordinators has improved.
Privileges are now available for folders in Arctic catalogs, and CREATE FOLDER, CREATE TABLE, and CREATE VIEW privileges have been added for Arctic catalogs. Privileges are also now inherited for objects in Arctic catalogs.
Support has been added for reading Apache Iceberg tables with equality deletes.
The CSV reader now uses direct memory instead of heap memory.
Queries now succeed even if telemetry storage fails. While a query is running, the executors and the coordinator send telemetry about the query execution to the JTS, which is written to a persistent store when the query completes or fails. Incomplete telemetry is indicated on the Job Details page for transparency, and an `is_profile_incomplete` column has been added to the `system.project.jobs` table to indicate the profile status and incomplete data.
In Enterprise edition, you can now select a secrets management option in place of existing secrets/password fields inside of the source creation and source edit dialogs.
Out-of-the-box observability metrics are now available for user activity and jobs such as most active users, longest running jobs, most queried datasets, and more.
If the job profile results are incomplete, you are notified and the options to download the profile and see the raw profile are unavailable.
A record fetch size parameter has been added to the settings for Snowflake sources. The default fetch size is 2000.
Arctic source pages are accessible for commits, tags, and branches. After you open an Arctic source, you can access its settings from the actions in the top right of the Datasets page.
When creating an organization, you have additional role options and can select multiple roles.
A new `sys.reflection_lineage` table function lists all reflections that will be refreshed if a refresh is triggered for a particular reflection.
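As a sketch of invoking the new table function (the single reflection-ID string argument is an assumption based on the function's description):

```sql
SELECT *
FROM TABLE(sys.reflection_lineage('<reflection-id>'));
```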
Reflection refreshes can now be configured based on a schedule. You can pick a time of day (UTC) and days of the week to refresh reflections for sources and tables.
Rate limits have been added to the Jobs service API on both the UI and Public API.
The `authorTimestamp` in `SHOW LOGS` has changed from the VARCHAR to the DATETIME data type.
To get a count of the number of rows in a table, Dremio now requests an estimated document count rather than aggregating the documents. As a result, Dremio can retrieve the count more quickly.
Catalog maintenance tasks have been introduced to control and prevent duplicate dataset versions from being created by API calls. Two scheduled tasks now run daily:
- Trimming dataset version lists to a maximum of 50 records that are subject to a 30-day time-to-live (TTL) or the TTL value that is configured for jobs.
- Removing temporary dataset versions generated by jobs that are older than 30 days or older than the configured TTL. DX-87659, DX-87549
Fixed memory tracking issues that caused queries to be cancelled for exceeding memory limits (with the Memory Arbiter enabled and high memory utilization on the node).
The refresh token lifetime for Tableau has been extended to 90 days to handle offline use cases like extracts.
Promoted datasets with inconsistent partition depth no longer occasionally throw an `ArrayIndexOutOfBoundsException` when filtering against deeper partitions.
A client pool has been added for more performant concurrent Hive metastore operations. Pool size can be controlled with `store.hive3.client_pool_size`; if set to 0, pooling is disabled.
Implementation has been added for the AWS Glue Data Catalog to pull and use Lake Formation tag policies. By default, this feature is turned off.
Commons-compress has been updated to version 1.26.1 [CVE-2024-25710] to address potential security issues.
Fixed an issue with slot assignment on preview engines that occurred when adding RDBMS sources and refreshing metadata for the same source type. Preview engines no longer hold assigned slots that cannot be released, so new queries can acquire free slots and no longer hit timeout errors.
Dremio now uses multiple writers in parallel for non-partitioned table optimization. The small files generated during the writing are combined by another round of writing with a single writer.
Reflections have been fixed in the following ways:
- Reflections containing temporal functions such as NOW and CURRENT_TIME are no longer incrementally refreshed, potentially producing incorrect results. REFRESH REFLECTION jobs for reflections containing these dynamic functions are now full refreshes. DX-89451
- Reflection refresh jobs no longer show zero planning time when the refresh is incremental. DX-87548
- A snapshot-based incremental reflection refresh for unlimited-split datasets on Hive no longer results in excessive heap usage due to metadata access during the reflection refresh. DX-88194
Query profiles no longer show planning phases twice.
`ROLE` and `USER` audit events are now available in the `sys.project.history.events` table in the default project of an organization.
An issue no longer occurs if Lake Formation tag policies are present, but there are no Lake Formation tags defined on a certain table.
To prevent conflicts between SLF4J 1.x and 2.x, the Dremio JDBC driver no longer exposes the SLF4J API and instead uses the `java.util.logging` (JUL) framework to log messages. Applications can configure the driver's parent logger by using `java.sql.Driver#getParentLogger()` or directly by using `java.util.logging.Logger#getLogger("com.dremio.jdbc")`.
For AWS standard edition, users no longer see an unsupported error when clicking on a dataset to query the dataset.
Dremio blocks view creation if the view has a cyclic dependency on itself.
Hash join support structures are now reallocated when they are insufficient for the incoming batch.
If the wiki editor is empty when a summary is generated, the generated summary is now automatically inserted into the wiki editor.
An intermittent failure when retrieving a wiki has been fixed.
For tables and views in a catalog, table and view owners need to have the USAGE privilege on the catalog to retrieve and create reflections. Previously, table or view ownership was sufficient to retrieve and create reflections using the reflections API.
`accelerationNeverRefresh` and `accelerationNeverExpire` are properly populated in `/api/v3` for sources.
The performance of loading the data visualization for jobs on the Monitoring page has improved.
The UI no longer breaks when clicking to set the refresh schedule for certain source types.
Privileges have been updated in the following ways:
- If you have only the ALTER privilege on a dataset and no privileges on the catalog, you can open the folder and see the dataset, but you cannot edit the dataset or run the query. DX-87891
- If you have only the SELECT privilege on a dataset and no privileges on the catalog, you can open the folder, see the dataset, and run a query on the dataset but you cannot edit the dataset. DX-87891
- Users who do not belong to the ADMIN role cannot view the User filter or the list of users on the Jobs page. DX-87660
- The view owner is now properly listed in the Dremio console after the owner is updated. DX-88705
The Table Cleanup tab in the catalog settings sidebar is no longer hidden when a user who belongs to the ADMIN role is viewing the Catalog Settings page with a non-admin role.
The SQL Runner has improved in the following ways:
- When you run a query in the SQL runner, the page no longer briefly displays the previous query's results. DX-83509
- The results of previously run queries now load much more quickly. After you open a saved script in the SQL Runner, the results are automatically displayed in a summarized format if at least one job in the script has successfully completed. To load the results of a specific query, select the query tab above the results table. DX-90110,DX-90627
- Running a query and attempting to save it as a view no longer causes the results to disappear. DX-86266
- Switching between the tabs in the SQL editor now correctly displays the job type. DX-89787
- Script names are no longer prevented from being saved after users rename a tab and edit the SQL content. DX-86751
- When expanding the large data field in the SQL Runner by using the ellipsis (...), the results are now responsive when the data includes DateTime objects. DX-86541
- There is no longer a data correctness issue where joins with non-equality conditions and join conditions using calculations would sometimes introduce duplicate rows (while respecting desired filtering properties) into the result set. DX-90720
- Background threads no longer run when a query in the SQL Runner is cancelled or fails. DX-85812
Adding a folder to the primary catalog now uses the references of the primary catalog rather than the selected source. When adding a folder to the primary catalog using the + icon, the folder is now correctly added to the primary catalog and not to the selected Arctic source.
The Details panel is no longer blank when you open the panel from the hover menu inside of an Arctic source.
The dataset metadata card no longer incorrectly opens versioned views from the Job Details page.
Messaging has been improved for:
- notifying if the top level space or source doesn't exist during view creation DX-85784
- successfully updating script privileges when using `GRANT TO ROLE` or `GRANT TO USER` DX-88527
- failed queries with row-access and column-masking policies using reflections DX-88480
- failed metadata cleaning due to the expiration of snapshots DX-69750
- switching between tabs in the SQL Runner DX-87980
- errors originating from the Hive 3 plugin engine in cases where other source types, such as Hive 2.x or AWS Glue, are served DX-87596
- notifying you if the catalog for the source has been deleted or cannot be found when you attempt to use a source DX-65235
- attempting to create a table or view that has the same path as an existing folder DX-86880
The merge action in the Dremio console no longer shows the user ID instead of the user email in the commit.
The `SKIP_FILE` option for the `COPY INTO` SQL command no longer fails to handle Parquet file corruption issues if the issues are in the first page of a row group.
Trailing semicolons that terminate an `OPTIMIZE TABLE` statement are no longer marked as an error in the SQL Runner.
Scalar UDFs no longer return incorrect results in some cases.
A reliability issue in `ARRAY_AGG` has been fixed.
DML no longer breaks after a table is recreated and merged.
You can now perform DML queries when no context is selected.
April 16, 2024
Azure images for Dremio Cloud executors have been upgraded from CentOS 8.5 Gen 2 Linux to the Ubuntu 22.04 Linux distribution.
April 2, 2024
Enabled the memory arbiter by default in order to monitor the usage of four key operators: HASH_AGGREGATE, HASH_JOIN, EXTERNAL_SORT, and TOP_N_SORT. This usage is monitored across all queries running on an executor to improve how the executor utilizes its direct memory and to reduce OutOfMemoryException errors.
- If the memory arbiter detects that the memory usage is too high, then the memory usage will be reduced in these two ways:
- Starting with the biggest consumers, some of these operators will need to reduce their memory usage mainly by spilling to disk.
- Memory allocations will be blocked.
Enabled HASH_JOIN to spill to disk by default when the memory allocated for a query is fully utilized.
Added support for column mapping within Delta Lake tables, effectively supporting minReaderVersion 2.
Improved partition pushdowns for Iceberg tables when queries use datetime filters that include a function on a column.
Improved coordinator startup times by allowing Dremio to serve queries while the materialization cache is loading in the background. Some queries might not be accelerated during this time.
Made it possible to find out the oldest reflection data used by accelerated queries by looking for "Last Refresh from Table" in the job summary of a raw profile or by querying the system table SYS.PROJECT.REFLECTIONS and looking for `last_refresh_from_table` in the output.
Disabled C3 caching during loads of Parquet source files via the COPY INTO operation, thereby reducing cache contention with other query workloads.
Reduced the heap memory used by the SORT operator.
Improved reliability and memory efficiency for Dremio coordinators.
Reading Iceberg tables with positional deletes no longer causes an IndexOutOfBoundsException.
Improved Dremio's capabilities for concurrent DML operations on Iceberg tables and improved error messaging for concurrent load failures.
Updated the Arrow package version to 14.0.2 to include Dremio Arrow fixes, and to include new features and fixes from Apache Arrow.
Added support for Version 15 of the Arrow Flight JDBC driver. You can download the driver from here.
Added support for the `copy_errors()` table function on Parquet tables.
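As a sketch of invoking the table function — the argument order shown here (target table name, then the COPY INTO job ID) is an assumption, as is the table name:

```sql
SELECT *
FROM TABLE(copy_errors('my_parquet_table', '<copy-into-job-id>'));
```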
Added support for limiting access to specified databases on Glue sources.
Improved the Projects API in these two ways:
- The Project object now includes the `lastStateError` object for projects for which Dremio encounters an "invalid credentials" error.
- The Projects API can be used to update project credentials. DX-65288
The Reflection Summary objects of the Reflection API and the SYS.PROJECT.REFLECTIONS table now include the error message that explains the most recent failure of a reflection refresh. No message appears if no refresh has yet been attempted, no failure has occurred, or a successful refresh has followed a failed one.
Reflection recommendations are now associated with the corresponding job IDs.
Added the parameters `isMetadataExpired` and `lastMetadataRefreshAt` to the Table and View objects of the Catalog API. Now, when either of these two methods is called and a table or view has stale or no metadata, there is no automatic refresh of the metadata:
GET /v0/projects/{project-id}/catalog/{id}
GET /v0/projects/{project-id}/catalog/by-path/{path}
Instead, users can look at the values for the two new parameters and decide whether to invoke a refresh by calling this method:
POST /v0/projects/{project-id}/catalog/{id}/refresh
Changed the tabs in the SQL runner to display the most recent results of a query, if the results are available from the job history, without the user having to run the query again.
Added support for editing the credentials of projects that use AWS or Azure. This can be done in the Project Storage section of a project's settings.
Privilege changes are processed more quickly in the Dremio console.
Added a check to determine whether users running the COPY INTO command have SELECT privileges on either the source storage location specified in the FROM clause or on each individual source file mentioned in the FILES clause.
Performance is improved and memory consumption is reduced for some INFORMATION_SCHEMA queries that filter on TABLE_NAME or TABLE_SCHEMA.
Fixed planning errors that resulted from accessing views on which a reflection that included the `CONVERT_FROM()` function was defined.
Fixed an issue that caused an additional empty subnet field to appear in the second step of the process for creating the first project of an organization. This field appeared if a cloud tag was added during the first step.
Fixed an issue that allowed reflections to be created when their definitions included UDFs that contained context-sensitive functions.
Fixed an issue that prevented full exception stack traces from being provided for errors generated by queries that included an ambiguous column name.
Queries are no longer cancelled due to exceeding memory limits while spilling during a SORT operation.
Fixed an issue with case sensitivity that would lead to delayed processing of inherited privileges.
Fixed an issue with orphaned materialization datasets in the catalog due to incremental reflection refreshes that were not writing any data.
When an incremental reflection refresh is skipped because there are no data changes, the last refresh from the table and the last refresh duration are now correctly updated in the reflection system tables and reflection summary.
Fixed an issue that caused filters on scans of source tables on MongoDB to use incorrect regex.
Ensured that group policy grants are respected in AWS Lake Formation when Dremio is used with Okta.
Fixed an issue that occurred when "All tables" was selected in AWS Lake Formation while granting a new permission that was meant to apply to all tables within the selected database.
Fixed an issue that caused the Save as View option in the SQL editor not to work after the option had already been used once in a single session.
Fixed an issue that prevented the Details icon from appearing for items in lists of sources or lists of spaces in the Dremio console.
Fixed an issue preventing users from accessing the Edit wiki button in the Details Panel.
Navigating between datasets using the lineage graph in the Dremio console no longer results in a message about unsaved changes.
Fixed an issue that caused the maximum number of scripts that could be saved in the Dremio console to be 100, not 1,000.
Fixed an issue that caused the creation of a new branch to update the context of the SQL Runner automatically.
Fixed an issue that prevented the SQL Runner from correctly defaulting to the Custom Join tab when joining datasets and no recommended join existed.
Fixed an issue that caused the Delete Organization dialog to appear after an organization had been renamed and the new name saved.
Fixed an issue that enabled a Save button when an added subnet field had not been filled in.
Fixed an issue that caused the AWS Secret Access Key field to appear blank even when a key was specified. Now, the masked key is visible.
Error messages for schema mismatches in UNION ALL operators are now more concise.
Dremio no longer caches CURRENT_DATE_UTC and CURRENT_DATE during query planning, which was causing incorrect results. As a result, queries that use CURRENT_DATE_UTC and CURRENT_DATE have some performance latency in favor of accurate results.
Fixed an issue that caused the SQL function APPROX_COUNT_DISTINCT to return null instead of 0 in some cases.
February 26, 2024
A valid location has been provided for `DoGet` requests, resolving a compatibility issue with the Arrow Flight JDBC 15 driver and ADBC driver.
February 12, 2024
Privilege changes are processed more quickly in the Dremio console.
February 1, 2024
An exception error no longer occurs when you run a query or create a view with an ambiguous column name.
January 31, 2024
Incremental refreshes can be performed on reflections that are defined on views that use joins.
The Reflection Recommender gives recommendations to users with complex queries and deep semantic layers for better performance and predictable matching. The queries for which default raw reflections can be recommended must run against one or more views that match certain criteria.
You can now use MongoDB as a data source in Dremio Cloud in AWS. For details, see MongoDB.
For Azure Blob Storage and Data Lake Gen 2 sources, checksum-based verification is enabled to ensure data integrity during network transfers.
You can click Close Others to close all tabs besides the active tab in the SQL editor.
The `ARRAY_FREQUENCY` function is now supported.
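For illustration, a minimal call — assuming the common element-to-count map semantics for this function; the literals are made up:

```sql
-- Count occurrences of each element; returns a map of element -> count
SELECT ARRAY_FREQUENCY(ARRAY[1, 1, 2]);
```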
Creating a raw reflection on a dataset on which no reflections are already defined no longer creates an aggregation reflection.
To alter the reflections on a view or table, the user or role must have the `ALTER_REFLECTION` privilege on it and also have the `USAGE` and `COMMIT` privileges on the Arctic catalog.
Query planning times are shorter during the metadata validation phase due to view schema learning.
There is no longer an exception during the planning of queries on views that use the `INTERVAL` data type.
Queries against Iceberg tables with positional deletes no longer fail with an error like "the current delta should always be larger than the amount of values to skip."
Unneeded columns are now trimmed from JDBC pushdowns.
The performance of health checks of AWS Glue data sources has been improved with checks of the state of the metastore and attempts to retrieve databases with a specified maximum result limit of 1.
The successful generation of labels and wikis no longer requires an engine to be running.
Selectively run queries are now highlighted as errors if they fail.
The dialog that explains that a query has failed no longer appears when you switch between SQL tabs.
When adding a new Arctic catalog source fails, the error message now provides detailed information about the specific error.
Previously, if you used a statement in your query to set a schema path to an Arctic source and folder, table or view validation would fail. Now, you can set the context to an Arctic source that includes any number of folders.
January 16, 2024
Reflections on views that join two or more anchor tables (Apache Iceberg tables and certain types of datasets in filesystem sources, Glue sources, and Hive sources) can now be refreshed incrementally.
Dremio now uses Micrometer-based metrics. Existing Codahale-based metrics are preserved and include the tag `{metric_type="legacy"}`.
Executor metrics tags now include `engineId` and `subEngineId`.
You can use the Recommendations API to submit job IDs of jobs that ran SQL queries, and receive recommendations for aggregation reflections that can accelerate those queries. See Recommendations for more information.
These terms were added to the list of reserved keywords: `JSON_ARRAY`, `JSON_ARRAYAGG`, `JSON_EXISTS`, `JSON_OBJECT`, `JSON_OBJECTAGG`, `JSON_QUERY`, and `JSON_VALUE`.
The following words were incorrectly made reserved keywords: `ABSENT`, `CONDITIONAL`, `ENCODING`, `ERROR`, `FORMAT`, `PASSING`, `RETURNING`, `SCALAR`, `UNCONDITIONAL`, `UTF8`, `UTF16`, and `UTF32`.
Fixed a bug that caused archived Sonar projects not to appear for a user on the Sonar Projects page immediately after that user received the Admin privilege.
A NullPointerException could be returned when a row count estimate could not be obtained.
The tutorials that are accessed from the left navigation bar are available only to the creators of organizations, not to all users of organizations.
The settings for configuring a new catalog no longer appear until the cloud or type of cloud is chosen.
The Add Column, Group By, and Join buttons could be disabled if the SELECT command that defined a view was run and that command ended in a semicolon.
If you saved a new view in the SQL Runner and then re-opened the SQL Runner, the view that you had just created would still be present.
For some types of data sources, the generation of a wiki page would fail.
The Save button for reflections defined on views in spaces was enabled for public users who have only SELECT, EDIT, and VIEW REFLECTION privileges. Such users were still correctly prevented from modifying reflections, as clicking Save did nothing.
Reflection management could orphan reflection materialization table entries in the KV store. These entries were never cleaned up, causing the KV store to grow larger than necessary.
Querying Apache Druid tables containing large amounts of data could cause previews in the SQL Runner to time out.
All columns were being sent in JDBC predicate pushdowns.
Queries with correlated subqueries could return incorrect results.
An exception occurred when Dremio tried to get an estimate of the row count for PostgreSQL tables.
Opening the SQL Runner from the Details page of a table caused the SQL Runner to open with the SQL editor hidden in the new tab and in all open tabs.
Scrolling through phases and operators in a visual profile was sometimes jumpy.
Users without permission to edit a view in an Arctic source were able to access the view's SQL definition if a direct URL to the Detail page for the view was provided by a user who did have edit permission.
The wrong branch could become active after you refreshed the SQL Runner page and then clicked on the breadcrumbs at the top of that page.
If you clicked a view or a table, ran the generated SELECT * statement in the SQL Runner, and then clicked the Edit button in the dataset details on the right, the SQL Runner was not refreshed with the DDL for creating the view or table. Now, the SQL statement and the results of the successful or failed query remain in the editor when you navigate to a dataset.
In API requests to create a new project, the catalogName body parameter is now required.
December 14, 2023
You can now add an Azure private endpoint in the Azure portal when you connect your Azure account to Dremio Cloud or add a project to an organization. The outbound private endpoints are used to connect Dremio executors to the Dremio Cloud control plane over the Azure network.
The Dremio-to-Dremio connector is now supported in Azure.
Automated table cleanup to delete expired snapshots and orphaned metadata files is now supported for Iceberg tables in Arctic catalogs.
The algorithm that triggers a refresh of dependent reflections has been improved to prevent duplicate refreshes. The refresh operation now remains in a pending state until all direct and indirect dependencies finish refreshing.
For reflections that are defined on Parquet datasets in S3 sources, Dremio can now automatically choose incremental refresh or full refresh.
Planning time for reflections has been substantially improved. The acceleration profile now contains a detailed breakdown of reflection normalization and substitution times.
The external token provider audit log now includes audit events for creating and updating BI applications.
The Clouds API now includes the privateEndpoints parameter for specifying an Azure private endpoint.
You can now use tabs in the SQL Runner to work on multiple tasks simultaneously. All of your work in each tab is autosaved.
The Visual Profile now displays notable observations and potential problems for operators and phases. Users can use filters to control which operators are displayed.
The Visual Profile now shows the following runtime metrics: waitTimeSkew, wallClockTimeSkew, batchesProcessedSkew, sleepingDuration, and cpuWaitTime.
When users try to edit a deleted script, they will now see a confirmation dialog with the following options to prevent lost work: Discard, Copy SQL, and Save as script.
This update adds support for the following SQL functions: ARRAY_AGG, ARRAY_APPEND, ARRAY_DISTINCT, ARRAYS_OVERLAP, ARRAY_PREPEND, and ARRAY_SLICE.
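As a hedged illustration of a few of these functions (the array-literal syntax and exact return values are assumptions based on common SQL dialects):

```sql
-- Append an element to an array, then drop duplicates:
SELECT ARRAY_DISTINCT(ARRAY_APPEND(ARRAY[1, 2, 2], 3));

-- Check whether two arrays share at least one element:
SELECT ARRAYS_OVERLAP(ARRAY[1, 2], ARRAY[2, 5]);
```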
If you disable the Query dataset on click setting, the Datasets page no longer includes a query shortcut for tables and views. To query a dataset, open the SQL Runner from the left navigation panel, or open the More (...) menu for the dataset and select Query.
Users can now set privileges on folders with a . character in their names, and on the tables these folders contain.
Iceberg metadata table functions no longer truncate the number of results returned to the maximum batch size set for exec.batch.records.max.
Row-level runtime filtering is disabled for reflection refresh jobs so that views no longer return incorrect results due to an incorrect match to a single Starflake reflection.
When connecting to an Apache Druid source, the username and password are now optional.
When modifying the credentials for an existing Arctic catalog, the external ID for the IAM role now persists rather than refreshing with the page.
View schema learning has been improved to handle complex types and no longer requires query re-planning.
Fixed a NullPointerException (NPE) that occurred during split assignment of Delta Lake scans.
When creating recommendation reflections, more than one recommendation may be created in response to a single job ID. Also, the initial SQL query can now contain outer joins that are part of a view definition, in addition to inner joins, and set operators. See Reflection Recommendations for more information.
Updated Calcite to version 1.19.
When a user logs out, all UI context is now cleared.
Logging out while on the Settings page for an Arctic catalog no longer results in an error.
All scripts are now visible when users scroll to the end of the scripts list in the SQL Runner. Also, the displayed number of scripts is now accurate up to 1000.
In the SQL Runner functions panel, the filter categories are now listed in alphabetical order.
In the SQL Runner, the copy button is now disabled while queries are running.
Using the tab character in object names no longer causes inconsistent column spacing.
On the Job Overview page for a canceled query, clicking the View Profile tab no longer results in an error.
The Job Overview page no longer reports incorrect state information for reflections.
A new script is no longer created when you open the SQL Runner by clicking a dataset name and then click the back button to return to the previous screen.
When users are on the Job Details page, the browser tab name now correctly displays Job Details - Dremio
.
For queries with a large number of results, truncation messages now display the correct number of rows of results.
When deleting a script, users now receive only a single confirmation dialog.
Table results now clear correctly when users save a run or previewed query as a script.
When editing a query, users can now see the previewed results of a transformation on the previously selected dataset.
The APPROX_COUNT_DISTINCT function now properly calculates the approximate count distinct rather than the exact count distinct.
Fixed an issue where queries that contain correlated subqueries in the join condition could return duplicate rows.
Queries that involve array columns that contain string values no longer fail.
Fixed a performance issue that affected queries that contain many GET calls for large arrays.
A balanced UnionAll subtree now prevents stack issues when inserting a large number of values.
In some cases, the HASH_JOIN operator could request more memory at the beginning of its work than anticipated. When this happens, instead of allowing the query to fail, Dremio now satisfies the operator's request and takes note of the elevated memory requirement.
Users now receive a more informative error message for ALTER TABLE queries that attempt to set a masking policy that refers to a non-existent function.
November 27, 2023
You can connect your Azure account to Dremio Cloud when getting started or adding a project or cloud to your organization for the following supported regions: East US, Central US, and West Europe. Learn more about the Azure prerequisites and how you can get started.
The COPY INTO command now supports Parquet files.
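A minimal sketch of loading Parquet files with COPY INTO; the table name, storage location, and exact option spelling are illustrative assumptions:

```sql
COPY INTO sales.orders
  FROM '@my_s3_source/landing/orders/'
  FILE_FORMAT 'parquet';
```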
You no longer need the MONITOR privilege to run Arctic optimization jobs.
November 16, 2023
You can see a view definition or an Arctic table definition if you have the SELECT privilege, although editing a view definition requires further privileges.
You can now see syntax errors in your SQL query as you enter the query into the SQL editor. Each error is automatically detected with a red wavy underline and contains information about the type of error. For more information, see Syntax Error Highlighting.
The details panel can be collapsed so it no longer overlaps the SQL Runner page or Datasets page, making it easier to access and to use for switching between details for different objects.
Dremio now supports the SQL commands SHOW CREATE VIEW to see a view definition and SHOW CREATE TABLE to see a table definition. For more information, see SHOW CREATE VIEW and SHOW CREATE TABLE.
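For example (the object names are illustrative):

```sql
-- Show the DDL that defines an existing view:
SHOW CREATE VIEW marketing.top_customers;

-- Show the DDL for a table:
SHOW CREATE TABLE marketing.orders;
```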
The following SQL functions are now supported: ATAN2, BITWISE_AND, BITWISE_NOT, BITWISE_OR, BITWISE_XOR, DATETYPE, HASH64, PARSE_URL, PMOD, STRING_BINARY, and TIMESTAMPTYPE.
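A brief, hedged sketch of a few of these functions; the argument shapes assume conventional semantics (for example, Hive-style PARSE_URL) and may differ in detail:

```sql
SELECT BITWISE_AND(12, 10);                               -- bitwise AND of two integers
SELECT PMOD(-7, 3);                                       -- positive modulus
SELECT PARSE_URL('https://example.com/a?x=1', 'HOST');    -- extract the host portion
```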
Folders are no longer deleted from the main branch when using the delete folder option.
When using hash joins, queries no longer fail with an unexpected restart of an executor.
The default job results cleanup path no longer results in disk space issues and unexpected restarts on some cluster nodes.
In the new source dialog for Arctic sources, the following configuration options have been moved from the Storage tab to the Advanced Options tab: Disable check for expired metadata while querying and Enable source to be used with other sources even though Disable Cross Source is configured.
When hovering over a very long label for a dataset in the details panel, the label name is no longer cut off in the tooltip.
When generated labels are a subset of existing labels for a dataset, the Append button is disabled inside the dialog.
Previously, if a user dropped a branch in which reflections were created, the reflections defined by the datasets on that branch would not be deleted in the next reflection refresh cycle. Those reflections would become orphaned and never get cleaned up. This issue is now fixed.
For Hive and Glue sources, filters are now successfully pushed down to the Iceberg Manifest Scan.
The parsing of CSV files has become more strict. Quoted values are now expected to be terminated properly with the quote symbol before reaching the end of the file; otherwise, an UnmatchedQuoteAtEOFException will be thrown.
Extra columns in a CSV file (compared to the target table schema) no longer cause issues during a COPY INTO ON_ERROR ('continue') job.
Query profile now shows the correct resolved table/key count when a SQL context is set in a query or view.
Users can now browse tables in catalogs whose names include an underscore.
Billing and usage views now more accurately reflect Azure-specific engine characteristics.
Role endpoints that are PUBLIC now return limited information. These endpoints are called by the UI in the context of searching for a role or getting role information.
The visual profile is no longer prevented from working in some cases due to strict security measures.
Operations to add a row-access policy no longer fail because the UDF couldn't be resolved.
If a query used in a reflection contains a UDF, reflection refreshes no longer fail with a plan serialization error.
To increase coordinator stability, the plan cache size has been decreased from 10,000 queries to 1,000 queries, and the cache entry duration from 10 days to 8 hours.
For datasets created by Dremio, the CREATE TABLE, REFRESH REFLECTION, OPTIMIZE TABLE, and INSERT INTO SQL commands will now have dictionary encoding enabled. If the page data lends itself to dictionary encoding, the corresponding page data will be dictionary encoded.
Error handling is improved when users create a view with a full query starting with CREATE VIEW.
The reflection recommender now provides accurate reflection recommendations for user queries that include COUNT(DISTINCT) and/or APPROX_COUNT_DISTINCT.
Handling of inferred partition columns is improved. Specifically, FOR PARTITIONS (...) now works properly for inferred partition columns.
October 31, 2023
Removed an errant dependency check that was preventing some engines from starting or scaling replicas.
Fixed an issue with AWS regional STS endpoint support for Glue sources that assume an AWS role. To enable AWS regional STS endpoint support, set the value of the property fs.s3a.assumed.role.sts.endpoint to the STS endpoint hostname for the region that you are using. For example, the value might be sts.us-east-1.amazonaws.com.
Metadata on AWS Glue sources was not being refreshed according to the schedule defined on the source. In some cases, new data was only visible after ALTER TABLE <table> REFRESH METADATA was run.
Due to metadata caching, it may take up to five minutes to reflect revoked privileges on objects in a Sonar project, including on Arctic catalogs.
Users with the organization-level MANAGE GRANTS privilege who have not been assigned the ADMIN role are not able to assign privileges to users or roles unless they have been explicitly assigned the CREATE USER or CREATE ROLE privilege.
October 23, 2023
VACUUM CATALOG, which removes expired snapshots and orphaned metadata files for Iceberg tables, is now supported in Dremio Cloud. For Arctic catalogs, you can configure automatic table cleanup and set the cutoff policy in catalog settings. Dremio uses the cutoff policy to determine which snapshots and associated files to expire and delete. For Arctic catalog sources, you can manually run VACUUM CATALOG on demand. For more information, see Enabling Table Cleanup and Setting the Cutoff Policy and VACUUM CATALOG.
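For example, to run cleanup on demand against an Arctic catalog source (the source name is illustrative):

```sql
-- Expire snapshots and remove orphaned metadata files for all
-- Iceberg tables in the catalog, per the configured cutoff policy:
VACUUM CATALOG my_arctic_source;
```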
You can now use Dremio Cloud's Generative AI capabilities to create wikis and labels for datasets. For more information, see Generative AI.
In the advanced editor for reflections, Dremio now can recommend partition columns.
Reflection and query plan caches are now cleared when they are disabled to ensure that queries do not use a deprecated reflection.
Arctic catalog settings, details pages, and API responses now recommend new URL patterns with Nessie API version v2 preselected.
Improved validation of the S3 root path when adding an Arctic source.
Plans for queries containing CONVERT_FROM(JSON) can now be cached.
Text-to-SQL events are now available in system history tables.
Updated the operation used to refresh Delta Lake table metadata to improve performance.
The sys.organization.usage system table now returns usage data for 365 days instead of 90 days.
For new projects created after October 23rd, users must be assigned the USAGE privilege on the project before they can access or execute queries against any resource within the project's scope. For projects that existed before October 23rd, users who are members of the PUBLIC role automatically have the USAGE privilege on the project. For more information, see Project Visibility and Access.
Tooltips have been added to disabled copy buttons when you are viewing a page over HTTP instead of HTTPS.
Partition recommendations for reflections based on a single partitioned table are now available.
Incremental refresh query plans have been optimized to avoid Iceberg metadata scans when the snapshotID has not changed since the last refresh.
If a source owner is removed from Dremio, another user with permission to the source can now promote datasets and change the source configuration in place of the owner that was removed.
Dremio Cloud now supports the ARRAY_TO_STRING SQL function, which returns a string of the values provided in the input array. For more information, see ARRAY_TO_STRING.
Dremio Cloud now supports the SET_UNION SQL function, which returns a single array that includes all of the elements from the input arrays, without duplicates. For more information, see SET_UNION.
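A short illustration of both functions (the array-literal syntax is an assumption):

```sql
-- Join array elements into a single delimited string:
SELECT ARRAY_TO_STRING(ARRAY['a', 'b', 'c'], ', ');

-- Merge two arrays into one array without duplicates:
SELECT SET_UNION(ARRAY[1, 2], ARRAY[2, 3]);
```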
In some cases, the billing API was returning incorrect data for account balances.
Dremio was unable to read and query AWS Glue table partitions if partition column names or partition values contained spaces or other special characters.
When adding a project to an existing cloud, the CFT flow was ignoring the selected AWS region and directing to US_WEST_2 instead.
Fixed an issue that was causing an exception during filter pushdown into a Parquet scan.
Nessie sources with names that included special characters were not loading properly in the Dremio console.
In some cases, incremental reflection refresh by partition was resulting in truncated data when the base dataset and the reflection used truncate Iceberg transform.
The metadata card was not showing up if you hovered over a dataset with a forward-slash in its name.
When viewing details for a versioned dataset, the History tab was not displaying any information.
The Columns section in the dataset details panel was not updating if you selected a different dataset without first closing the details panel.
In some cases, unnecessary warnings about metadata changes were being displayed when editing Arctic source properties.
Fixed some minor scrolling and table display issues on the Project Settings > Engines page.
If a dataset name was the same as one of the tabs in the dataset details view (data, details, reflections, history), clicking to edit the dataset or clicking the Go to Table button would take you directly to the tab with the same name as the dataset.
For some browsers, an interruption in connectivity can cause a failure in updating the status of long-running queries.
Default raw reflections could not be substituted into a query that used UNION with mixed types, which was causing longer than normal planning times.
In some cases, running ALTER TABLE <table_path> FORGET METADATA against a view could result in the view being deleted instead of the command failing with an error.
Fixed an issue that was preventing users from creating aggregation reflections without dimensions via SQL, even though such reflections could be created in the Dremio console.
Fixed the following issues with acceleration information in job profiles when the plan cache was used: acceleration information was missing for a prepared query, plan cache usage was missing for a prepared query, acceleration information was missing when the query was not accelerated but reflections were considered, and canonicalized user query alternatives were missing. Additionally, matching hints were missing for reflections that were only considered.
If a date pattern only contained the year and month, the parsed date was returned as the last day of the previous month instead of the first day of the specified month.
In some scenarios, when Query dataset on click was enabled, clicking on a dataset was opening the dataset in the SQL Runner with an empty query instead of a default SELECT statement on the dataset.
October 12, 2023
Dremio Cloud now supports access to cross-account S3 and Glue data sources in VPCs that utilize private subnets. To enable this access, the following connection properties must be added under Source Settings > Advanced Options:
For S3 sources:
- fs.s3a.assumed.role.sts.endpoint = sts.<aws-region>.amazonaws.com
- fs.s3a.endpoint = s3.<aws-region>.amazonaws.com
For Glue sources:
- aws.region = <aws-region>
Prior to adding any sources, an S3 gateway VPC endpoint and an STS interface VPC endpoint must be created in the VPC.
October 11, 2023
Users who were not assigned to the ADMIN role were unable to run queries against tables and views that did not have an owner. Owners were missing from tables and views created prior to the August 17 update of Dremio Cloud. For some tables and views, an error scenario could have caused the owner to be missing.
October 6, 2023
Users could drop a table or view from an Arctic catalog if they had USAGE and COMMIT privileges on the catalog and SELECT privileges on the table or view. With this update, only users with USAGE and COMMIT privileges on the catalog and OWNERSHIP privileges on a table or view, or users in the ADMIN role, can drop a table or view.
In the Dremio console, it appeared as though a user without OWNERSHIP privileges on an Arctic catalog could delete the catalog, even though they could not.
In some cases, queries that used CONVERT_FROM in a filter condition were failing.
September 21, 2023
Dremio Arctic and all of its related features are no longer in preview mode.
In the Dremio console, ownership in Sonar and Arctic is now listed separately from other privileges, at the top of the Privileges page, and the procedure for transferring ownership is streamlined. For more information, see Transferring Organization Ownership and Transferring Ownership.
Added keyboard shortcuts in the SQL Runner for showing or hiding the Text-to-SQL panel and for triggering Text-to-SQL. For more information, see Keyboard Shortcuts.
The Record Count column has been moved next to the Current Footprint column in the Project Settings > Reflections table.
Added support for the il-central-1: Israel (Tel Aviv) region to the AWS Glue source.
Dremio Cloud provides more helpful information in the error message if an invalid tag or branch name is supplied.
You can now leave the Database Name field blank in a PostgreSQL source.
Reduced the number of S3 lookups required for Arctic DML and DDL operations to improve performance in query planning.
Reduced the amount of heap memory used by the query plan cache.
Updated the Source API to prevent sending secret values in clear text.
Setting query_label() as an engine rule was resulting in an exception.
In some cases, default raw reflection matching was not working as expected for users not assigned to the ADMIN role.
After searching for and selecting a username or role in catalog privileges, the search string was not automatically being cleared.
The Tableau and Power BI buttons were visible in the SQL Runner for unsaved queries.
Saving a query as a view without having run the query was resulting in an error if no engine replicas were active.
Fixed an issue with COL_LIKE() when the input and pattern contained the % character.
LIKE was not being highlighted as a reserved keyword in the SQL editor.
Fixed an issue that was causing the use of GRANT ALL on a project to fail with an "invalid project privilege" error.
OPTIMIZE and VACUUM queries on tables with reflections were being evaluated for reflection matching, causing an error.
Fixed an issue with filter pushdowns that was causing some preview queries to fail even though the same query was successful when using Run.
September 12, 2023
Fixed an issue that was causing the RST_STREAM closed stream error when processing large result sets via JDBC or ODBC.
September 8, 2023
Some queries were failing with a Failure getting source error.
Queries against views and tables in an Arctic catalog were not showing up on the Jobs page. This issue only affected organizations that use the default Arctic catalog instead of spaces.
September 6, 2023
The following regions are now supported in the AWS Glue source:
- ap-south-2: Asia Pacific (Hyderabad)
- ap-southeast-3: Asia Pacific (Jakarta)
- ap-southeast-4: Asia Pacific (Melbourne)
- eu-south-2: EU (Spain)
- eu-central-2: EU (Zurich)
- me-central-1: Middle East (UAE) DX-69347
Added a new table function, SYS.RECOMMEND_REFLECTIONS, that recommends aggregation reflections to accelerate existing SQL queries. For more information, see Reflection Recommendations.
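A hedged sketch of invoking the table function; the parameter shape (an array of job IDs) is an assumption, and the job ID is a placeholder:

```sql
SELECT *
FROM TABLE(SYS.RECOMMEND_REFLECTIONS(ARRAY['<job-id>']));
```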
This update adds support for the following SQL functions:
- APPROX_PERCENTILE DX-62151
- ARRAY_AVG DX-65324
- ARRAY_CAT DX-67718
- ARRAY_COMPACT DX-67718
- ARRAY_GENERATE_RANGE DX-67718
- ARRAY_MAX DX-65324
- ARRAY_MIN DX-65324
- ARRAY_POSITION DX-67718
- ARRAY_REMOVE DX-65324
- ARRAY_REMOVE_AT DX-67718
- ARRAY_SIZE DX-67718
- ARRAY_SUM DX-65324
- NORMALIZE_STRING DX-68631
The status of some failed queries was being reported as RUNNING instead of FAILED on the Jobs Overview page.
Updated com.google.guava:guava to 32.1.1-jre to address CVE-2023-2976 in Dremio's internal Iceberg fork.
Updated validation settings to ensure that only privileged users could view Acceleration Settings on the Project Settings > Reflections page.
Fixed an issue that was causing inconsistent query results when ARRAY_CONTAINS was used with nullability checks.
Plans for queries containing CONVERT_FROM could not be cached.
To address a CONCURRENT_MODIFICATION error seen in concurrent metadata refresh queries on Parquet tables: if the query is submitted by the scheduler, failures are ignored; if the query is submitted by users, the failed query is retried until it succeeds.
August 28, 2023
When viewing catalog or folder contents on the Datasets page, the "More" menu (...) for tables and views now contains a Delete option, allowing users with appropriate privileges to delete a table or view.
For some failed queries, status in the Job profile was being reported as RUNNING instead of FAILED.
When creating a new project, the Arctic catalog name was not being validated prior to launching the CloudFormation template.
The Open Results link on the Jobs overview page was not working as expected for queries that were run from edit mode on the Dataset page.
When saving a view, the items in the "Save View As" dialog were not sorted in the same way as on the Dataset page.
For organizations created prior to August 17, 2023, granting or revoking table or view privileges for the first time via SQL was successful, but an error was produced on the Organization Settings > Privileges page.
If a dataset name or the name of a parent folder contained a space or ampersand (&) character, clicking on the dataset would populate the SQL Runner with a truncated SELECT statement.
August 17, 2023
As of this update, each Sonar project in new Dremio Cloud organizations will come deployed with an Arctic catalog, which will support data management capabilities (folders, tables, etc.) for the project. This primary Arctic catalog replaces your home space.
You can now use Role-Based Access Control (RBAC) privileges to control which roles and users can read, write, and manage tables and views in Arctic catalogs.
Dremio Sonar now supports the same SQL syntax as Spark when working with Arctic/Nessie sources.
Dremio automatically optimizes incremental reflection file size to improve reflection performance.
In this update, you must explicitly create folders (namespaces) before creating tables or views in them.
The Usage page under Organization Settings now includes usage data for Arctic catalogs.
Dremio will avoid a full data scan for simple aggregations on partition columns, reading the manifest metadata instead, which improves performance for queries on very large tables.
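For example, if a table is partitioned on region (the names are illustrative), a simple aggregation of this shape can now be answered from Iceberg manifest metadata rather than a full data scan:

```sql
-- Row counts per partition value can be served from manifest metadata:
SELECT region, COUNT(*) AS row_count
FROM sales
GROUP BY region;
```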
After DML operations against unpartitioned Iceberg tables, Dremio now compacts the data files written by the DML operation to improve future read performance.
Updated the Snowflake connector to fix intermittent issues when adding Snowflake as a source.
If you optimized a table in Dremio Arctic (Optimize Once) and then viewed the dataset settings for another table, the Optimize Once button remained disabled unless you refreshed the page.
Logged in users were getting redirected to the login page instead of the create password page when clicking on an invite that had not yet been accepted.
When running a job multiple times, the status and job link for the last attempt are now displayed as expected.
Increased concurrency limits to avoid errors when concurrent inserts into the same table were being sent from different streams.
Fixed an issue that could cause a memory leak when querying an Iceberg table with positional deletes.
At times, the DAY() function was returning either an integer or a timestamp, depending on how the query was written.
Fixed an issue that was causing an error when running OPTIMIZE TABLE on a table with reflections.
In some cases, the IF EXISTS option for DROP BRANCH and DROP TAG was being ignored.
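For reference, a hedged example of the IF EXISTS form (the branch and catalog names are illustrative):

```sql
-- With the fix, this is a no-op rather than an error
-- when the branch does not exist:
DROP BRANCH IF EXISTS dev_experiments IN my_arctic_catalog;
```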
Top-level CASE statements intended to return a Boolean were not being rewritten correctly, resulting in an error for some SQL Server queries.
Some SQL Server queries with nested CASE statements were failing with invalid SQL comparison syntax.
Fixed an issue with the LEFT() SQL function on Oracle sources for queries with dates.
Some date subtraction queries were not getting pushed down for Oracle sources.
July 27, 2023
July 24, 2023
This update provides performance improvements in the Jobs listing page, and any user with sufficient privileges can now view reflection jobs in the table.
The details panel displaying Wiki content is now available inside a folder on the main branch in an Arctic source.
This update adds support for a new connector that allows querying data from Apache Druid. For more information, see Apache Druid.
You can now drag and drop a table from your home space into the Text-to-SQL panel in the SQL Runner.
We have made some improvements in the efficiency of the queries suggested when using the Text-to-SQL feature.