3.3 Release Notes
Azure Storage OAuth authentication
Dremio supports Azure Storage OAuth 2.0 authentication. See Azure Storage for more information.
Single Sign On
Dremio provides SSO with Microsoft's Azure Active Directory as an identity provider. See Configuring SSO for more information.
Personal Access Tokens (EE only)
Dremio allows personal access tokens (PATs). In addition to logging into the Dremio UI, PATs can be used to address the login requirements for REST APIs or ODBC/JDBC when using SSO/LDAP. See Personal Access Tokens for more information.
Dremio allows you to blacklist a node so that it is excluded from query execution in your Dremio cluster. See Monitoring Nodes for more information.
VDS Update Behavior
Dremio implements an automatic widening behavior when creating VDS tables from their underlying PDS. This behavior is different from typical database view behavior. In other words, when an underlying PDS is structurally changed and the VDS was created with SELECT * FROM table, the corresponding VDS is also changed to reflect the new structure.
For example, if a new column is added to the underlying PDS table after the VDS has been created, then the representative VDS table automatically refreshes to include the new column.
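The widening behavior can be pictured with a toy model. This is only a sketch, not Dremio's implementation: the PDS and VDS classes are hypothetical, and the assumption that a VDS with an explicit column list keeps its original columns is ours, not a statement from these notes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PDS:
    """Hypothetical stand-in for a physical dataset: just a list of columns."""
    columns: List[str]


@dataclass
class VDS:
    """Hypothetical stand-in for a virtual dataset defined over one PDS."""
    source: PDS
    explicit_columns: Optional[List[str]] = None  # None models SELECT *

    def schema(self) -> List[str]:
        # A SELECT * definition is re-expanded against the current PDS
        # schema, which is the widening behavior described above.
        if self.explicit_columns is None:
            return list(self.source.columns)
        # Assumption: an explicit column list is left unchanged.
        return list(self.explicit_columns)


pds = PDS(columns=["id", "name"])
star_vds = VDS(source=pds)
fixed_vds = VDS(source=pds, explicit_columns=["id", "name"])

pds.columns.append("created_at")  # structural change to the underlying PDS

print(star_vds.schema())   # includes the new column
print(fixed_vds.schema())  # still the original two columns
```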
Amazon S3-compatible Minio
The Amazon S3-compatible Minio plugin is supported. See Amazon S3 for more information.
Dremio UI Reflections Filtering
The Dremio UI now provides additional filtering, sorting, and searching capability for reflections via Admin > Acceleration > Reflections.
The Reflections page is available as a tab on the main Explore page. There is no need to click the cog icon to open Dataset Settings and then use Reflections as a modal page.
In addition, sorting of reflections is added for name, dataset, and footprint columns.
Dremio supports some partition pruning when reading a subset of rows using the LIMIT operator. This feature is dependent on whether the data source supports partition pruning.
Impersonation for Query Users (EE only)
Certain data sources allow impersonation to use the query user in addition to the VDS-delegated user as the impersonated username. This feature is applicable for HDFS, Hive, or MapR-FS data sources. See HDFS, Hive, and MapR-FS for more information.
Kubernetes Helm Charts
With this release, Azure storage configuration for uploads and accelerator data in K8S helm charts was added.
Gandiva is enabled by default.
Dremio now fully supports the decimal data type for Parquet and Hive (Parquet/ORC) sources. Text/JSON sources can cast strings/integers to decimals for decimal precision and scale.
Dremio now supports submitting queries with single semi-colon as a terminator (only one query at a time).
Dremio UI Chat Button
The Chat button in the Dremio UI is deprecated and will not be available in the next release.
- The upgrade process may take a prolonged amount of time depending on the length of the refresh cycle for reflections.
Creating, querying, and dropping tables in scratch space, backed by PDFS, leaks directory handles.
Resolved by closing directory handle after listing, in PDFS.
For Hive, false results are occasionally cached when user permissions for a source dataset are checked.
Resolved by modifying the caching behavior. When user access is checked, the results are cached if the user has access to the source dataset. If the user does not have access, then the results are not cached; on the next request, user permissions for the source dataset are checked again.
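The caching rule described above (cache grants, never cache denials) can be sketched as follows. The class name, the check_source callback, and the source_checks counter are all hypothetical and exist only for illustration; this is not Dremio's code.

```python
from typing import Callable, Set, Tuple


class PositiveOnlyAccessCache:
    """Caches only successful access checks; denials are always re-checked."""

    def __init__(self, check_source: Callable[[str, str], bool]) -> None:
        self._check_source = check_source  # hypothetical call to the source
        self._granted: Set[Tuple[str, str]] = set()
        self.source_checks = 0  # counts round trips, for illustration only

    def has_access(self, user: str, dataset: str) -> bool:
        key = (user, dataset)
        if key in self._granted:
            return True  # cached positive result
        self.source_checks += 1
        allowed = self._check_source(user, dataset)
        if allowed:
            self._granted.add(key)  # cache grants only
        return allowed


# Usage: a denial is not cached, so a later grant is picked up immediately.
grants: Set[Tuple[str, str]] = set()
cache = PositiveOnlyAccessCache(lambda user, dataset: (user, dataset) in grants)

first = cache.has_access("alice", "sales")   # denied, not cached
grants.add(("alice", "sales"))
second = cache.has_access("alice", "sales")  # checked again, now allowed
third = cache.has_access("alice", "sales")   # served from the cache
```

The trade-off in this design is that a revoked permission stays cached until the entry is evicted, while a newly granted permission takes effect on the next request.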
On upgrading to 3.2, reflections on datasets with more than 800 leaf fields will fail to materialize with a NullPointerException.
Dremio supports datasets with at most 800 leaf fields. If reflections were created on datasets exceeding this limit before upgrading to 3.2, materializing those reflections fails after the upgrade with an "extended metadata (read definition) is not available" message. Administrators are advised to drop all such reflections.
For S3 and HDFS data sources, the ColumnCountException message doesn't display the dataset name for files.
Resolved by removing the dataset name from the ColumnCountException message since the dataset name for S3 and HDFS isn't available.
When VDS reflections are refreshed, the refresh sometimes fails with the following error:
AssertionError: Cannot add expression of different type to set.
Resolved by updating VDS metadata flags.
Need an option to enable/disable Parquet date auto corrections.
Resolved by introducing a new configuration parameter to enable or disable auto correction of dates in Parquet files. By default, it is enabled.
Hive Parquet decimal values display incorrectly when the schema changes the decimal type.
Resolved by enhancing query detection changes.
Unable to expand the buffer when querying Parquet files.
Resolved by fixing a buffer allocation issue in Apache Arrow.
Cannot read Dremio CTAS-generated Parquet files.
Fixed by updating the Python library for Apache Arrow. With this bug fix, all the Parquet files generated by Dremio 3.2 are readable by PyArrow. Files generated by older versions of Dremio still cannot be read by PyArrow.
A double scrollbar appears on the Jobs page when the screen size is small.
Resolved by refactoring the Jobs page layout.
Unavailable sources display as green instead of red in dataset Browse panel.
Resolved by improving the state/status lookup.
For unreachable sources, source icons in the left navigation display red.
Fixed icon color for unavailable sources in the left navigation pane.
Data is never loaded when previewing results for a failed job or running a query which fails.
Resolved by displaying the actual error message instead of a spinner.
Remove Format menu does not work in the community edition.
Resolved by fixing the Remove Format menu for the OSS version.
In MapR-FS, VDS-based Access Delegation alignment in the UI is not displaying correctly.
Resolved by correcting the VDS access delegation display.
Moving a dataset to a different space makes it unavailable in search.
Resolved by improving the handling of datasets when they are moved or renamed.
Periodically, the Dremio UI displays a blank screen.
Resolved by improving validation implementation.
Under certain circumstances, browsers return an unexpected error associated with an undefined query context.
Resolved by improving UI implementation associated with an empty query context.
Cannot create or modify new rules after using Ctrl-Del in the conditions editor.
Resolved by properly handling the keyboard shortcut.
File uploading stalls with very large files.
Resolved by limiting the upload file size to 500MB.
Supported SSL cipher suites updated.
Resolved by updating the supported cipher suites to align with the recommended cipher suite list from the OWASP TLS Cipher String Cheat Sheet.
In the RPM package, the Dremio Admin commands do not display error messages, which gives a false indication of success.
Resolved by adding the logback-admin.xml file to the RPM package.
Upgrading to Dremio 3.2 on the MapR package breaks the S3 source and prevents it from being removed.
Resolved by allowing safe deletion and refresh for missing plugins.
When starting Dremio, an invalid WARN message is logged.
Resolved by logging a message only if the Dremio version is older than storeVersion.
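The condition in this fix can be sketched as a plain version comparison. This is a minimal illustration under the assumption that both versions are dotted numeric strings such as 3.3.0; the function names are hypothetical, and the actual comparison logic in Dremio is not described in these notes.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.3.0' into (3, 3, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def should_warn(dremio_version: str, store_version: str) -> bool:
    """Warn only when the running version is older than storeVersion."""
    return parse_version(dremio_version) < parse_version(store_version)


print(should_warn("3.2.0", "3.3.0"))  # running version is older
print(should_warn("3.3.0", "3.3.0"))  # same version
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating 3.10.0 as older than 3.9.0.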
Job Summary for Profiles displays incorrect planning time.
Resolved by improving the handling of command pool wait times.
The access.log is not working as expected.
Resolved by fixing the Jetty access log collector class.
Outer JOIN is not producing the expected result for large datasets.
Resolved by improving the filter push down to the Parquet scan operator.
Field trimmer doesn't update collation field correctly in some cases.
Resolved by updating the Calcite field trimmer.
For column-based incremental reflections, a SQL query with the equal (=) filter produces incorrect results.
Resolved by improving the handling of column-based incremental updates.
Planner can generate incorrect BroadcastExchange when join type is right.
Resolved by enhancing the condition check for BroadcastExchange.
Make pclean logical disabled by default.
For Hive, a class not found exception occurs with the Hive Parquet DeprecatedParquetInputFormat input format.
Resolved by adding support to the Hive source for Parquet datasets using the parquet.hive.DeprecatedParquetInputFormat serialization format, commonly found in the Cloudera Hadoop distribution.
For Amazon S3, excessive log messages are produced by ORC libraries when reading ORC files.
Resolved by changing the usage of ORC split property values for non-HDFS file systems. Hive tables or partitions that do not use HDFS to store data will have the Hadoop/Hive configuration property hive.orc.splits.include.fileid set to false for Hadoop library calls. This change is required because ORC split file IDs are only available in HDFS.
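The rule can be illustrated with a small sketch. It assumes that "uses HDFS" is decided from the URI scheme of the table or partition location, which is our assumption rather than something these notes state; the function name is hypothetical.

```python
from typing import Dict
from urllib.parse import urlparse


def orc_split_overrides(location: str) -> Dict[str, str]:
    """Return property overrides for a Hive table or partition location."""
    scheme = urlparse(location).scheme
    if scheme == "hdfs":
        return {}  # file IDs are available in HDFS, so nothing is overridden
    # Assumption: any other scheme counts as a non-HDFS file system.
    return {"hive.orc.splits.include.fileid": "false"}


print(orc_split_overrides("s3a://bucket/warehouse/orders"))
print(orc_split_overrides("hdfs://namenode:8020/warehouse/orders"))
```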
The REFRESH METADATA SQL query does not work with Azure Storage.
Resolved by fixing the Azure Storage plug-in PDS METADATA REFRESH trigger.
For the RDBMS plugins, if the date_trunc() function is used in the query, it cannot be pushed down.
Resolved by adding support for the date_trunc() function in the RDBMS plugins.
3.3.2 Release Notes
Enhancements in 3.3.2
Metadata Query Limit
As of Dremio 3.3.2, a limit can be set on the number of tables returned for "get tables" metadata requests from client applications.
- Users can set the maximum number of tables returned with the MaxMetadataCount property. For JDBC, set the value as a connection property. For ODBC, set the value as an advanced property.
- Administrators can define the default maximum with a support key.
The connection property overrides the support key. By default, the limit is set to 0 (disabled).
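The precedence between the two settings can be sketched as below. The function names are hypothetical, and treating a connection-level value of 0 as "disabled" (rather than as "not set") is an assumption; only the override order and the default of 0 come from the notes above.

```python
from typing import List, Optional


def effective_limit(connection_value: Optional[int],
                    support_key_value: int = 0) -> int:
    """The connection property (MaxMetadataCount) overrides the support key."""
    if connection_value is not None:
        return connection_value
    return support_key_value


def apply_limit(tables: List[str], limit: int) -> List[str]:
    """A limit of 0 means disabled, so every table is returned."""
    if limit <= 0:
        return list(tables)
    return list(tables[:limit])


tables = ["orders", "customers", "items", "regions"]
print(apply_limit(tables, effective_limit(None)))      # default 0: no limit
print(apply_limit(tables, effective_limit(None, 3)))   # support key only
print(apply_limit(tables, effective_limit(2, 3)))      # connection wins
```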
As of Dremio 3.3.2, the following enhancements or behavioral changes are applicable:
- Expressions, left JOINs, and inequality JOINs are now supported.
- For all relational databases, the to_date function is now pushed down when used anywhere in a query.
Fixed Issues in 3.3.2
On occasion, the log/archive/queries.json file may be overwritten with random contents.
Resolved by fixing the archiving of tracker.json logs so that it no longer overwrites the archived queries.json logs.
For Teradata sources, queries are making unnecessary calls to retrieve metadata.
Resolved by improving the metadata retrieval process.
For Teradata data sources, previews for VDSs/queries with UNIONs fail.
Resolved by correcting the Teradata SQL LIMIT with UNION functionality.
For JDBC, JOINs do not pass down expressions in ON clauses.
Resolved by adding support for push down expressions in ON clauses for JDBC.
For SQL queries, left JOINs with BETWEEN and AND operators are not working.
Resolved by improving the handling of left/right swaps in rule.
For ADLS data sources, timeouts may occur if caching is disabled.
Resolved by improving socket/thread usage when caching is disabled.
For ADLS data sources, "too many open files" errors may occur when reading Parquet files.
Resolved by improving socket/thread usage when caching is disabled.
Submitting multiple CREATE VDS queries in Dremio 3.2 can cause the coordinator to become unusable.
For SQL queries, provided transitive join is enabled, filter push down occurs on both tables when the join key is computed.
Resolved by detecting when the filter and projections are on equivalent expressions. To enable this behavior, planner.experimental.transitivejoin must be set to on. Default: off
For SQL queries, aggregate JOINs on reflections do not work with timestamp fields.
Resolved by improving pushdown rules.
Enqueued jobs do not show their queue.
Jobs UI now shows the queue name of enqueued jobs.
For SQL queries, column name conflict resolution does not occur at every level.
When joining columns through the UI, if columns in two separate tables had the same name but different casing (e.g. DEPARTMENT_ID and department_id), the columns were not automatically renamed despite their names being equivalent.
Resolved: JOINs through the UI now detect and automatically resolve case-insensitive column name conflicts. The original names are preserved with an "_X" (where X is an integer) appended to the name. For example, when joining tables through the UI with columns DEPARTMENT_ID and department_id, department_id will become department_id_0. If there were more department_id columns, they would be renamed department_id_1, department_id_2, and so on.
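The renaming scheme can be sketched as follows. This is a minimal illustration of case-insensitive conflict resolution with an "_X" suffix, not Dremio's implementation; the function name is hypothetical, and skipping a suffix that would itself collide with an existing column is an added assumption.

```python
from typing import Dict, List, Set


def resolve_conflicts(columns: List[str]) -> List[str]:
    """Rename later columns whose names collide case-insensitively."""
    seen: Set[str] = set()            # lowercased names already in use
    next_suffix: Dict[str, int] = {}  # next integer to try per base name
    result: List[str] = []
    for name in columns:
        key = name.lower()
        candidate = name
        # Keep the original spelling and append _0, _1, ... until unique.
        while candidate.lower() in seen:
            suffix = next_suffix.get(key, 0)
            candidate = f"{name}_{suffix}"
            next_suffix[key] = suffix + 1
        seen.add(candidate.lower())
        result.append(candidate)
    return result


print(resolve_conflicts(["DEPARTMENT_ID", "department_id"]))
print(resolve_conflicts(
    ["DEPARTMENT_ID", "department_id", "department_id", "department_id"]))
```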