4.0 Release Notes

What's New

Cloud Columnar Cache (C3)

Dremio provides a local (per executor node) cache for Parquet files. Cloud columnar caching is implemented for the following data sources:

  • Amazon S3
  • ADLS (Gen 1)
  • Azure Storage (ADLS Gen 2) - v2 only

See Configuring Cloud Cache for more information about cloud caching. See Amazon S3, ADLS, and Azure Storage for specific data source configuration information.

Multi-Cluster Isolation

Dremio provides the ability to isolate workloads by defining multiple separate clusters of nodes and routing workloads to specific clusters through WLM queues. See Workload Management for more information.

AWS Security

Configurable IAM Role-based Access

Dremio supports configurable IAM role-based access to S3 buckets. In addition to access key/secret authentication, S3 sources can now use customizable IAM roles from EC2 instance metadata. See Amazon S3 for data source configuration information.

Note that full S3 bucket access is not required for IAM roles.

AWS Secrets Manager

Dremio supports AWS Secrets Manager for the following data sources:

  • Redshift
  • Oracle
  • PostgreSQL

This feature can be configured on the General tab when adding or modifying one of these data sources.

AWS KMS Encryption

Hive 3.1

Dremio now supports Hive 3.1. Additionally, Hive ACID tables now use v2 of the specification.

Enhancements

Dremio UI Copy Result Sets

Dremio allows you to copy the result sets from the display window to the clipboard. This is accomplished via a button on the results table that copies query results to the clipboard. The maximum number of records copied is 5,000.

Amazon S3

Whitelisting Buckets

Dremio provides functionality for whitelisting S3 buckets. See Amazon S3 for data source configuration information.

Metadata Query Limit

A limit can be set on the number of tables returned for a "get tables" metadata request from client applications. By default, the limit is set to 0 (disabled).

  • Users can set the maximum number of tables returned with the MaxMetadataCount property. For JDBC, set the value as a connection property. For ODBC, set the value as an advanced property.
  • Administrators can define the default maximum with the client.max_metadata_count support key. If a connection property is specified, it overrides the support key set on the server.
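The precedence rules above can be modeled in a few lines. The following Python sketch is illustrative only: effective_metadata_limit and apply_metadata_limit are hypothetical helpers, not part of Dremio or its drivers, and the sketch assumes that a connection value of 0 also overrides the server default and disables the limit.

```python
from typing import Iterable, List, Optional


def effective_metadata_limit(connection_value: Optional[int],
                             server_default: int = 0) -> Optional[int]:
    """Return the maximum number of tables to return, or None for no limit.

    connection_value models MaxMetadataCount set on the JDBC/ODBC connection;
    server_default models the client.max_metadata_count support key.
    """
    # A connection property, when specified, overrides the server support key.
    limit = server_default if connection_value is None else connection_value
    # A value of 0 means the limit is disabled.
    return None if limit == 0 else limit


def apply_metadata_limit(tables: Iterable[str],
                         connection_value: Optional[int] = None,
                         server_default: int = 0) -> List[str]:
    """Truncate a "get tables" result according to the effective limit."""
    names = list(tables)
    limit = effective_metadata_limit(connection_value, server_default)
    return names if limit is None else names[:limit]


tables = [f"t{i}" for i in range(10)]
print(len(apply_metadata_limit(tables)))                                        # 10
print(len(apply_metadata_limit(tables, server_default=3)))                      # 3
print(len(apply_metadata_limit(tables, connection_value=5, server_default=3)))  # 5
```

With no connection property the server default applies; once MaxMetadataCount is set on the connection, it wins regardless of the server value.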

SQL Query

The following enhancements or behavioral changes are applicable:

  • For all relational databases, the to_date function is now pushed down when used anywhere in a query.
  • For Elasticsearch, the scaled_float type is supported.

Decimal Support

Dremio supports decimal-to-decimal mappings for relational database sources and MongoDB. Existing relational database and MongoDB data sources will now map decimal to decimal. See Data Types, along with the data types listed for each data source, for more information.

Functionality Changes

The following functionality changes are applicable:

  • Dremio no longer supports Avro and SequenceFile outside of Hive.
  • Dremio no longer supports IBM DB2 and HBase. DB2 and HBase were deprecated in 3.1.3 and removed in 4.0.
  • Dremio's Intercom chat/Ask Dremio is now disabled.

Deprecations

The following features are deprecated in 4.0 and will be removed in a future release.

  • PDFS - Dremio will require external storage to be configured as distributed storage in place of local PDFS.
  • MapR (Community Edition)
  • Voting and Automatic Thresholds for Reflections
  • Elastic - Lucene search queries using the CONTAINS syntax; Painless scripts with aggregate pushdown; Versions 5 and 6
  • MongoDB - Aggregation Pipeline Pushdowns
  • Single Node with single process installations - Single node installations will require separate coordinator and executor processes.
  • SQL Functionality:
    • CONVERT_FROM - Will be replaced with a function that enforces a fixed schema
    • Mixed Types - Mixed types within a column will be removed, and casting to a common type will be enforced

Not Supported

  • Windows installation
  • MacOS installation

Upgrade Notes

For additional upgrade notes, see Installing and Upgrading.

Upgrade Process Time

The upgrade process may take a long time, depending on the length of the refresh cycle for reflections and on the use of decimals in relational sources.

Reflections Out-of-Sync

For RDBMS data sources, upgrading from Dremio 3.3 to 4.0 causes external reflections to become out of sync. This is expected behavior for external reflections on RDBMS sources. Work around this behavior by dropping and recreating your external reflections.

Decimal Upgrade Behavior

When you upgrade, decimal columns in RDBMS and MongoDB sources now show as decimal in Dremio. Reflections with decimal data types will be invalid until refreshed. See RDBMS Decimal Support for more information about decimals.

Amazon S3 Distributed Store

The following exception may occur when using Amazon S3 as a distributed store and an EC2 Metadata Authentication mechanism.

Exception

java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.SharedInstanceProfileCredentialsProvider not found

See Configuring Distributed Storage for more information.

Dremio 2.0.3 or lower

Upgrading from Dremio 2.0.3 or lower to Dremio 4.0 is not supported.

  1. Upgrade to Dremio 3.x first.
  2. Then, upgrade to 4.0.

Fixed Issues

Unable to access Hive tables backed by Azure Storage.
Resolved by adding ADLS and Azure properties to the Hive configuration.

For MongoDB, an incorrect pushdown occurs when ISODate is used with FILTER.
Resolved by relaxing restrictions around allowed pushdowns for MongoDB.

For NAS data sources, adding the source fails when a forward slash is at the end of the path.
Resolved to take into account a trailing forward slash in the path.

For ADLS data sources, timeouts may occur if caching is disabled.
Resolved by improving socket/thread usage when caching is disabled.

Intermittent permission errors may occur when multiple ADLS sources are configured.
Resolved by fixing a file system caching issue.

Dremio Wiki has an XSS security bug.
Resolved the XSS security issue by upgrading some internal modules.

Data is never loaded when previewing results for a failed job or running a query which fails.
Resolved by displaying the actual error message instead of a spinner.

The REFRESH METADATA SQL query does not work with Azure Storage.
Resolved by fixing the Azure Storage plug-in PDS METADATA REFRESH trigger.

For the RDBMS plugins, if the date_trunc() function is used in the query it cannot be pushed down.
Resolved by adding support for the date_trunc() function in the RDBMS plugins.

Supported SSL cipher suites updated.
Resolved by updating the supported cipher suites to align with the recommended cipher suite list from the OWASP TLS Cipher String Cheat Sheet.

Need to monitor heap usage on executor nodes to prevent outages.
Resolved by improving internal management of queries and heap usage along with improved error and exception messaging.

Direct memory usage from ORC reader can sometimes cause issues.
Fixed Netty direct memory usage from the ORC reader.

ODBC driver cannot handle dots in column names.
Resolved by relaxing dot validation on column names in Dremio.

Raw reflections with negative values for the partitioned column can lead to incorrect values.
Resolved by ensuring that the data from these partitions is also read as part of the query.

For MongoDB, timestamp filters with strings are not pushed down.
Resolved by coercing strings to timestamp when pushing down to MongoDB.

Aggregate queries on text file with new lines within a quoted field behave incorrectly.
Resolved by correcting count queries with .csv files.

ORC pushdowns are ignoring floating point literals.
Resolved by adding decimal case to ORC literal filters.

For RDBMS sources, when an unsupported column type was encountered, subsequent columns would be fetched incorrectly.
Resolved by correcting detection of unsupported columns.

For SQL Server, new data sources do not have the advanced option "Verify server certificate" enabled by default.
Resolved by enabling "Verify server certificate" by default.

Reflection refreshes on ACID tables with only delta files return empty results.
Resolved so that queries that are re-run with new deltas before a metadata refresh return consistent results.

sys.queries always returns empty results.
Resolved by removing the sys.queries from the list of valid system tables.

On occasion, the log/archive/queries.json file may be overwritten with random contents.
Fixed archiving of tracker.json logs so that tracker.json archiving no longer overwrites archived queries.json logs.

For Teradata sources, queries are making unnecessary calls to retrieve metadata.
Resolved by improving the metadata retrieval process.

For Teradata data sources, previews for VDSs/queries with UNIONs fail.
Resolved by correcting the Teradata SQL LIMIT with UNION functionality.

For ADLS data sources, "too many open files" errors may occur when reading Parquet files.
Resolved by improving socket/thread usage when caching is disabled.

For SQL queries, provided transitive join is enabled, filter push down occurs on both tables when the join key is computed.
Resolved by detecting when the filter and projections are on equivalent expressions. To enable this behavior, set planner.experimental.transitivejoin to on (default: off).
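The idea behind transitive filter pushdown can be illustrated with a simplified Python sketch. It is not Dremio's planner code: infer_transitive_filters is a hypothetical helper that treats join keys and filter expressions as opaque strings and handles only equality filters, copying a filter on one side of an equi-join to every expression joined to it (using union-find to group equivalent expressions).

```python
from typing import Dict, Iterable, Set, Tuple

Filter = Tuple[str, object]  # (expression, literal) for an equality filter


def _find(parent: Dict[str, str], node: str) -> str:
    """Return the representative of node's equivalence class (union-find)."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]  # path halving
        node = parent[node]
    return node


def infer_transitive_filters(join_keys: Iterable[Tuple[str, str]],
                             filters: Iterable[Filter]) -> Set[Filter]:
    """Copy each equality filter to every expression equated with it by a join.

    Expressions are opaque strings, so a computed key such as "UPPER(a.code)"
    is handled like any other join key.
    """
    parent: Dict[str, str] = {}
    for left, right in join_keys:
        parent[_find(parent, left)] = _find(parent, right)

    # Group every known expression by its class representative.
    classes: Dict[str, Set[str]] = {}
    for expr in list(parent):
        classes.setdefault(_find(parent, expr), set()).add(expr)

    inferred: Set[Filter] = set()
    for expr, value in filters:
        for equivalent in classes.get(_find(parent, expr), {expr}):
            inferred.add((equivalent, value))
    return inferred


print(sorted(infer_transitive_filters(
    [("UPPER(a.code)", "b.code")],
    [("UPPER(a.code)", "X1")],
)))
# [('UPPER(a.code)', 'X1'), ('b.code', 'X1')]
```

Because the filter now exists on both sides of the join, each table scan can apply it before the join runs.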

For SQL queries, aggregate JOINs on reflections do not work with timestamp fields.
Resolved by improving pushdown rules.

Enqueued jobs do not show their queue.
Jobs UI now shows the queue name of enqueued jobs.

For SQL queries, column name conflict resolution does not occur at every level.
When joining columns through the UI, if columns in two separate tables had the same name but different casing (e.g. DEPARTMENT_ID and department_id), the columns were not automatically renamed despite their names being equivalent.

Resolved so that JOINs through the UI detect and automatically resolve case-insensitive column name conflicts. The original names are preserved with an "_X" suffix appended (where X is an integer). For example, when joining tables with columns DEPARTMENT_ID and department_id through the UI, department_id becomes department_id_0. If there were more department_id columns,
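A minimal Python sketch of the renaming rule follows. resolve_join_columns is a hypothetical helper, not Dremio's implementation. Only the "_0" case is documented above; numbering further duplicates as "_1", "_2", and so on, and skipping suffixes that are already taken, are assumptions made for illustration.

```python
from typing import Dict, List, Set


def resolve_join_columns(left: List[str], right: List[str]) -> List[str]:
    """Combine two column lists, renaming case-insensitive name conflicts.

    A right-hand column that collides with an earlier name (ignoring case)
    keeps its original spelling and gets an "_N" suffix, with N starting at 0.
    Incrementing N for later duplicates is an assumption.
    """
    taken: Set[str] = {name.lower() for name in left}
    next_suffix: Dict[str, int] = {}
    result = list(left)
    for name in right:
        key = name.lower()
        if key not in taken:
            taken.add(key)
            result.append(name)
            continue
        n = next_suffix.get(key, 0)
        # Skip suffixes that would themselves collide with an existing name.
        while f"{key}_{n}" in taken:
            n += 1
        next_suffix[key] = n + 1
        renamed = f"{name}_{n}"
        taken.add(renamed.lower())
        result.append(renamed)
    return result


print(resolve_join_columns(["DEPARTMENT_ID", "name"], ["department_id", "budget"]))
# ['DEPARTMENT_ID', 'name', 'department_id_0', 'budget']
```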

4.0.1 Release Notes

Dremio Community 4.0 Docker Image Issue

We discovered an issue with the Docker image for Community Edition version 4.0 that is used in Kubernetes, Azure AKS and AWS EKS deployments.

Community Edition 4.0 Docker images accidentally incorporated elements of the Enterprise Edition, which can prevent Dremio from issuing queries and can potentially corrupt the configuration. Dremio Community version 4.0.1 was just released to correct the issue.

Users deploying via YARN, AWS CloudFormation, Azure ARM or Linux RPM and tar installations were not affected.

Community users who upgraded from 3.x to 4.0 using Docker, Kubernetes, AKS or EKS are recommended to:

  1. Upgrade to Dremio Community version 4.0.1
  2. Restore the configuration from a backup created prior to upgrading to 4.0.

[info] Note

It is highly recommended to restore from backup after upgrading to 4.0.1. If no backup is available, the upgrade will work provided no new sources were added.

Community users who created a new system with Dremio Community 4.0 between Sept 12 and Sept 18 using Docker, Kubernetes, AKS or EKS are recommended to:

  1. Delete the new system created with Dremio Community 4.0
  2. Re-install a new system using Dremio Community 4.0.1

[info] Note

New systems created with Dremio Community 4.0 using Docker, Kubernetes, AKS or EKS will continue to function, but will not be able to upgrade to 4.0.1 or later versions.

We apologize for the inconvenience and appreciate your continued support.

4.0.2 Release Notes

Enhancements in 4.0.2

Cloud Cache for HDFS

Dremio now provides cloud columnar caching for HDFS. See HDFS for more information.

See Cloud Cache and Configuring Cloud Cache for more information about cloud caching.

Async Reading for HDFS

The HDFS data source now supports asynchronous reading.

Admin repair-acls Command

Dremio added an Admin command, repair-acls, that is used to help repair ACLs. This command performs a dry run and prints entities that are missing ACLs. See Repair ACLs for more information.

Downloading Result Sets

Downloading job results is much faster because Dremio no longer reruns the original query; instead, it downloads the results directly from the configured distributed storage. See Configuring Distributed Storage for more information.

This enhancement applies only to non-default distributed storage configurations (that is, not PDFS). Dremio allows you to download result sets in one of the following formats:

  • JSON
  • CSV
  • Parquet

See Data Curation for more information.

Job Results Systems Table

A new systems table, sys.job_results, has been added that allows you to query job results using the sys.job_results.<job id> path. See Job Details for more information.
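As a sketch of how a client might build such a query, the Python helper below uses the sys.job_results table name. job_results_sql is hypothetical, and it assumes that job IDs contain characters such as hyphens and therefore must be double-quoted as SQL identifiers; the job ID in the usage line is a placeholder, not a real job.

```python
def job_results_sql(job_id: str, limit: int = 100) -> str:
    """Build a query over the stored results of one job (hypothetical helper)."""
    if not job_id:
        raise ValueError("job_id must not be empty")
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Quote the job ID as an identifier, doubling any embedded double quotes.
    quoted = '"' + job_id.replace('"', '""') + '"'
    return f"SELECT * FROM sys.job_results.{quoted} LIMIT {limit}"


print(job_results_sql("1f2e3d4c-0000-1111-2222-333344445555", limit=10))
# SELECT * FROM sys.job_results."1f2e3d4c-0000-1111-2222-333344445555" LIMIT 10
```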

Partitions Information on Physical Datasets

Dremio now shows partitions information on columns for a physical dataset. See Dataset Concepts for more information.

RDBMS

  • Dremio now allows the RDBMS connector to push down date/string comparisons.
  • Dremio now supports partial acceleration for queries against relational sources.

Relational Planning

Dremio introduces a new planning mode, called Relational Planning, which enhances the JDBC pushdown phase in the Dremio query planner. Previously, queries against relational sources could be accelerated only when every dataset was covered by a reflection. With this enhancement, partial substitution on relational sources is now available. In addition, queries against relational sources are optimized in the Dremio query planner before they are pushed down.

Fixed Issues in 4.0.2

On RDBMS environments, queries sometimes do not consider reflections.
Resolved by reporting errors when reflections are not being considered.

For Hive, when querying ORC files, sometimes ORC runs out of heap space.
Resolved by improving usage of Hadoop buffer.

On Oracle, an EQUALS filter comparing a DATE to a string sometimes fails.
Resolved by improving some varchar and datetime comparisons for ARP.

For MongoDB, refreshing metadata on collections causes the corresponding raw reflection not to be chosen for substitution.
Resolved the issue by relaxing some internal conditions.

Command-clicking the View Details link for a job does not open a new tab.
Resolved so that command-click and right-click work as expected for normal links.

For Teradata, accessing a dataset corresponding to a view would cause an exception.
Resolved by falling back to catalog metadata when normal metadata is not available.

A query failed with PLAN ERROR: Unable to convert the value of null and type VARBINARY.
Resolved by improving the handling of null varbinary literals in RexToExpr.

For MongoDB, Dremio can throw the following error when scales of the values change: Failure while attempting to read metadata for [TABLE NAME].
Resolved by automatically adjusting decimal scale during schema learning in the MongoDB source connector.

4.0.3 Release Notes

Fixed Issues in 4.0.3

Intermittent access errors on Hive 2 sources with 'hadoop.yarn.security.DockerCredentialTokenIdentifier not a subtype' error.
Resolved by fixing Hadoop access errors.

4.0.4 Release Notes

Enhancements in 4.0.4

Nested Loop Join

Dremio now pushes filters into the Nested Loop Join operator.

Fixed Issues in 4.0.4

Dremio failed to start when setting com.dremio logging to TRACE
Resolved by supporting TRACE level logging

Copy and paste results truncated numeric data beyond 16 digits of precision
Resolved by supporting full numeric precision in copy and paste results

The export-profiles command in the dremio-admin tool failed to run
Resolved by fixing export-profiles command in dremio-admin

The Dremio Hub ARP Connector did not ingest metadata when table types are fixed-width in older JDBC drivers
Resolved by supporting fixed-width table types used by older JDBC drivers

Vulnerabilities reported against Jetty 9.4.15 security scan: CVE-2019-10241, CVE-2019-10247
Resolved by fixing security issues to pass the Jetty 9.4.15 security scan

4.0.5 Release Notes

Enhancements in 4.0.5

Session Expiration System Option

A new system option, token.expiration.min, was introduced to configure the timeout period for session tokens; the default is 30 hours. After changing this system option, restart Dremio.
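The ".min" suffix suggests that the option takes a value in minutes; that unit is an assumption here, not something this entry states. Under that assumption, the 30-hour default corresponds to 30 × 60 = 1800, and a hypothetical helper for converting a desired lifetime might look like this:

```python
def token_expiration_minutes(hours: float) -> int:
    """Convert a desired session-token lifetime in hours to whole minutes."""
    minutes = round(hours * 60)
    if minutes <= 0:
        raise ValueError("the session timeout must be at least one minute")
    return minutes


print(token_expiration_minutes(30))  # 1800, matching the 30-hour default
print(token_expiration_minutes(8))   # 480
```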

Fixed Issues in 4.0.5

Queries on Teradata tables with column titles failed to run
Added support for columns with titles within Teradata by ensuring the column name is used instead

Reflections in some cases failed to match
Fixed an issue where in rare cases a valid reflection might not be considered

AWS Redshift identified an issue in Redshift JDBC drivers that led to unexpected server restarts
Resolved by upgrading the Redshift JDBC driver in Dremio to the latest version: 1.2.36.1060

A change in KMS delegation tokens caused executors to not start with Kerberos and Ranger KMS enabled
Resolved by supporting the delegation token described in https://issues.apache.org/jira/browse/HADOOP-14445

S3 sources using the AWS Security Token Service failed after upgrading to 4.0
Resolved by fixing an issue with the AWS Security Token Service

ADLS and S3 sources failed with READ_DATA or TimeoutException error when async is enabled
Fixed asynchronous read issues when using ADLS and S3

The REST API to upload files could be used to expose files on the local filesystem
Resolved by validating the uploaded file's location

Dremio version information was shown on the login page
Resolved by removing version information from login page

VDSs without proper ACLs provided access to all users
Resolved by ensuring that if no ACLs for a dataset are available the default is no access for users

Long metadata operations such as deleting a source can cause queries to fail or the UI to become unresponsive
Resolved by reducing contention during long metadata operations.

4.1 Release Notes

What's New

Multiple AWS Clusters

In AWS deployments, Dremio supports the ability to provision multiple separate execution clusters from a single Dremio coordinator node, dynamically schedule execution clusters to run independently at different times, and automatically start and stop them based on workload requirements at runtime. See Multiple AWS Clusters for more details.

4.1.1 Release Notes

Enhancements in 4.1.1

Mathematical Functions

Added the option to provide a seed value to the RANDOM() function, which enables users to generate consistent results within queries that use RANDOM() for testing and verification purposes. See Mathematical Functions for additional details.
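The benefit of a seed can be shown with an analogy in Python's standard random module. This is not Dremio code, and Python's generator will not produce the same numbers as RANDOM() with the same seed; the hypothetical draw helper only demonstrates that a fixed seed makes a pseudo-random sequence reproducible, while an unseeded generator does not.

```python
import random
from typing import List, Optional


def draw(seed: Optional[int] = None, n: int = 5) -> List[float]:
    """Return n pseudo-random floats in [0, 1) from a generator seeded with seed."""
    rng = random.Random(seed)  # seed=None seeds from system entropy
    return [rng.random() for _ in range(n)]


assert draw(seed=42) == draw(seed=42)  # same seed, same sequence
print(draw())                          # unseeded: differs between runs
```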

Fixed Issues in 4.1.1

Running the ALTER PDS command with different case for the table name removed the table's metadata
Resolved by supporting mixed case in table names in the ALTER PDS command.

Selecting all columns in a JDBC query but in a different order caused the query to fail
If all columns for a table were selected in a JDBC query the request was processed as SELECT * even if the columns were in a different order, which failed schema consistency checks. Resolved by explicitly using column names when provided.

C3 not enabled by default for Reflections stored on Cloud Storage
Reflection Distributed Storage enables C3 by default when using Cloud Storage.

Substitution with many reflections on leaf nodes may take a long time to complete in planning
Resolved by enhancing reflection substitution logic performance by reducing unneeded candidates.

Queries with reflections can timeout during query planning
Resolved by improving the implication check used by filters during acceleration planning

Aggregate reflection substitution gives wrong results when querying with DATE_ADD
Resolved by fixing reflection substitution logic with the DATE_ADD function.

REVERSE function on Hive tables with ORC files fails with IndexOutOfBoundsException
Resolved by fixing REVERSE function processing in Hive tables

Operations to view reflection status, either through sys.reflections or the Admin UI can hang
Resolved by always retrieving reflection status from the metadata store

Improved memory default settings for servers with larger memory capacity
Default memory settings for coordinator and executor nodes, applied when no values are specified, were updated; refer to Configuring Memory for the updated default settings.

CLOB/BLOB Data Types in Oracle not supported
Added support for the CLOB and BLOB Data Types in Oracle and support for unknown types in other relational sources.

Incorrect Partitions selected when filtering based on Decimal values
Resolved by fixing an issue with partition selection with Decimal values which could result in missing valid partitions.

Query fails with java.lang.IndexOutOfBoundsException
Resolved by improving spilling in the HashAgg Operator

4.1.2 Release Notes

Fixed Issues in 4.1.2

Attempts to read data from Azure Data Lake Storage fail with "DATA_READ ERROR: java.util.concurrent.TimeoutException"
Resolved by correcting an issue with the Azure SDK and asynchronous reads

No error shown if invalid credentials are provided for an Azure Storage source with a storageV2 type
Resolved by ensuring credential errors are reported correctly with storageV2 types

Executor nodes fail to start in YARN environments with "ERROR - java.lang.RuntimeException: Unable to initialize Initializer LdapBootstrapAdminUserInitializer"
Resolved by correcting startup exception error related to LDAP and YARN

4.1.3 Release Notes

Fixed Issues in 4.1.3

Unable to connect to Azure sources when using the Azure Active Directory authentication method
Resolved by correcting an authentication issue in date format handling

4.1.4 Release Notes

Enhancements in 4.1.4

Backup is now considerably faster
Backups are now written in binary format instead of JSON, which is considerably faster, and backups are now multithreaded to speed up the overall process. Additionally, a new option was added to include profiles in the backup. See Backups for details.

Fixed Issues in 4.1.4

IN clause optimization not happening for NOT IN queries
Resolved by preserving the IN/NOT IN form during the optimization phase rather than being converted to an OR

Upgraded the Jackson library version to 2.10.2

Memory leak in spoolingBatchBuffer
Resolved by fixing the leak in an internal data structure

Unexpected error occurred when filtering in the UI when the preview result set has zero rows
Resolved by not displaying the chart if the table doesn’t have any rows

Reflection matching failing due to java.lang.ArrayIndexOutOfBoundsException
Resolved by fixing a bug in an internal data structure

Unable to use WANdisco with Hive and Dremio 4.0
Resolved by adding the ability to add custom dependencies to the Hive plugin bundles. Dremio loads Hive related classes in separate classloaders in Dremio 4.0 and later, so dependencies for Hive now need to be placed in the following locations:

  • CE - Hive 2 - dremio\plugins\connectors\hive2.d
  • CE - Hive 3 - dremio\plugins\connectors\hive3.d
  • EE - Hive 2 - dremio\plugins\connectors\hive2-ee.d
  • EE - Hive 3 - dremio\plugins\connectors\hive3-ee.d
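The directory choice above is a simple lookup on edition and Hive version. The Python sketch below assumes a Linux installation, where the same directories use forward slashes, and uses /opt/dremio purely as a hypothetical install location; hive_dependency_dir is an illustrative helper, not a Dremio tool. Custom JARs (for example, a WANdisco client) would then be placed in the returned directory.

```python
from pathlib import PurePosixPath

# Directory names from the list above, keyed by (edition, Hive major version).
HIVE_DEPENDENCY_DIRS = {
    ("CE", 2): "hive2.d",
    ("CE", 3): "hive3.d",
    ("EE", 2): "hive2-ee.d",
    ("EE", 3): "hive3-ee.d",
}


def hive_dependency_dir(dremio_home: str, edition: str, hive_major: int) -> PurePosixPath:
    """Return the directory where extra Hive plugin dependencies belong."""
    try:
        name = HIVE_DEPENDENCY_DIRS[(edition.upper(), hive_major)]
    except KeyError:
        raise ValueError(f"unsupported combination: {edition} Hive {hive_major}") from None
    return PurePosixPath(dremio_home) / "plugins" / "connectors" / name


print(hive_dependency_dir("/opt/dremio", "EE", 3))
# /opt/dremio/plugins/connectors/hive3-ee.d
```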

Advanced reflection view on a dataset doesn't show the reflections in Safari
Resolved to ensure that the view works with the Safari browser

Unable to add Elasticsearch 6.7 on Dremio 4.1
Resolved by adding some of the Elastic sources needed to support the latest version of Jersey

Tableau closes sometimes with "RuntimeAssert: We have a Disconnect exception, but DataSourceException::Name was not set!"
Resolved by upgrading Avatica to 1.12 to support the Connection.isValid implementation

Unable to push down TRUNCATE on decimal columns
Resolved by supporting the pushdown of TRUNCATE on decimal columns

For SQL Server, queries fail with the error "Expression type uniqueidentifier is invalid for COLLATE clause"
Resolved by not adding a COLLATE clause in SQL queries for columns of type uniqueidentifier.

4.1.6 Release Notes

Known Issues in 4.1.6

Certificate errors when SSL/TLS is enabled with the Web UI
If certificates are used to enable SSL/TLS with the Web UI, please hold off on upgrading to Dremio 4.1.6 until this issue is resolved

Fixed Issues in 4.1.6

Orphaned YARN containers cause stability issues
Improved the watchdog process that monitors YARN containers to ensure orphaned processes are removed. YARN will automatically start another container with Dremio.

REGEXP_LIKE() is not pushed down
Added pushdown support for REGEXP_LIKE with Oracle, PostgreSQL and Teradata

If nodes remain blacklisted for several hours or days, queries begin to fail with an error: IllegalStateException: Failed to find receiver for sender (71)
Resolved by sending the current list of executors and excluding blacklisted nodes.

OrcRawRecordMerger uses excessive heap memory when there are too many delta files with an ORC ACID table
This is caused by too many Hive ORC delta files. Improved the error message to recommend running Hive compaction on the table to correct the issue.

Results vary depending on where LIKE operator is being used in WHERE clause
Resolved by preserving all non-simple conditions in SimpleFilterFinder

Locally attached dremio-admin backup does not function on OSX
Resolved by adding appropriate Security Context for dremio-admin backup in local-attach mode

BlackDuckScan identified CVEs related to Apache Thrift 0.9.3
Resolved by upgrading the version of libthrift

Some queries take several seconds to plan but less than a second to execute
Resolved by improving the performance of Pre-Logical Filter Pushdown

TRUNC on Redshift is not pushed down
Added pushdown support for TRUNC with Redshift

Decimal values with out of range precision or scale are not shown in Dremio
Resolved by truncating values if they are beyond the supported range and displaying within Dremio.

S3 Gov Cloud connection not working
Added support for S3 STS authentication without compat mode

Unable to run CTAS on S3; fails with the error "Creating buckets is not supported"
Resolved by including the configured basePath in the path resolution

Removed in 4.1.6

  • MapR (Community Edition)

4.1.7 Release Notes

Fixed Issues in 4.1.7

Certificate errors when SSL/TLS is enabled with the Web UI
Resolved keystore issues with multiple certificates or wildcard characters

High planning time for queries caused by too many partition values
Resolved by fixing long partition pruning times with multiple large partition chunks

4.1.8 Release Notes

Fixed Issues in 4.1.8

RDBMS Queries with multiple joined tables with duplicate column names can result in ambiguous column references
Resolved by ensuring distinct columns within SQL of pushdown queries.

Some file errors don't expose the path associated with the error
Added the file path for these errors to the query profile.

Attempting to query Parquet files with conflicting complex nested types would sometimes fail
Resolved by better handling of schema changes

Failures sometimes occurred when working with certain correlated queries on Postgres
Changed conditions to avoid failure

Difficulty determining the state of the system when queries are cancelled due to resource exhaustion
Additional context information is now shown when such failures occur.

Errors when trying to filter or flatten nested data that included decimals in nested fields
Addressed by handling nested decimal fields within more processing operations.

Slow metadata operations when connecting RDBMS systems that fail to declare scale and/or precision for Decimal columns
Updated handling to coerce queries and data results to common scale/precisions according to Dremio decimal rules.

Memory leak associated with adding and removing Azure sources a large number of times
Resolved by ensuring all resources are released when Azure sources are deleted.

Hive external table pointing to ADLS does not work on 4.0.x
Resolved by fixing the classloader when accessing Hive ADLS tables

