3.2 Release Notes

What's New

Azure Storage Data Sources

Dremio now supports Azure Storage as a data source. Azure Storage provides foundational support for Azure Data Lake Storage Gen2. See Azure Storage and the Azure Storage section in Distributed Storage for more information.

Cloud Orchestration Templates

The following deployment templates are provided with Dremio's 3.2 release:

  • Azure Template
    Dremio now provides a new deployment template for Azure. See Azure Template for more information.

  • Azure AKS Deployment
    Dremio now provides the deployment information for Dremio on Azure Kubernetes Service (AKS). See Azure AKS for more information.

  • Amazon EKS Deployment
    Dremio now provides the deployment information for Dremio on Amazon Elastic Container Service for Kubernetes (EKS). See Amazon EKS for more information.

Predictive Pipelining

Implemented a predictive pipelining technology that uses knowledge of columnar file formats and analytic workload patterns to coalesce reads of nearby columns and avoid small reads, improving file IO and system resource utilization. This results in minimal wait times and higher throughput, reducing IO wait times by up to 80% and improving query response times by 2-5x. Implemented for ADLS and AWS S3.
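
The coalescing idea can be illustrated with a minimal Python sketch. This is not Dremio's implementation: the function name coalesce_ranges and the max_gap threshold are hypothetical, chosen only to show how reads of nearby column chunks might be merged into fewer, larger requests.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) of one column chunk in a file


def coalesce_ranges(ranges: List[Range], max_gap: int) -> List[Range]:
    """Merge byte ranges whose gap is at most max_gap bytes (hypothetical threshold)."""
    merged: List[Range] = []
    for offset, length in sorted(ranges):
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            if offset - prev_end <= max_gap:
                # Extend the previous read to cover this chunk and the gap before it.
                new_end = max(prev_end, offset + length)
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, length))
    return merged


# Three small reads, two of them close together, become two requests.
print(coalesce_ranges([(0, 100), (120, 80), (10_000, 50)], max_gap=64))
```

A larger max_gap trades some wasted bytes for fewer round trips, which tends to matter most on object stores where each request carries noticeable latency.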

Enhancements

Hive Complex Types

Dremio now supports reading complex data types (LIST, STRUCT, MAP, and UNION) from non-transactional and compacted transactional Hive ORC files. See Hive Datatypes for more information.

Hive Metadata Optimization

Optimizations were implemented for fetching, storing, and processing Hive metadata to support Hive tables with a significantly larger number of partitions and splits.

Incremental Refresh

Dremio now supports incremental refresh for datasets with fields of the BigInt, Int, Timestamp, Date, Varchar, Float, Double, and Decimal data types. See Refreshing Data Reflections for more information.

Increased Query Concurrency

Improvements in job submission and query planning were implemented that significantly increase concurrency, reduce planning time, and improve resiliency of the coordinator.

LDAP Group Lists

LDAP functionality is implemented that establishes user-group relationships for group entries that list the users belonging to that group. See Using LDAP for more information.

Health Checks for YARN Deployments

For YARN deployments, a health check is implemented to watch Dremio processes and kill executor processes that do not shut down cleanly.

Count Operations

Support was added for multi-column count operations across RDBMS sources. See Aggregate Functions for more information.

Username Impersonation

For HDFS, Hive, and MapR-FS data sources, Dremio allows you to modify the case of the username (lowercase, uppercase, or As Is). See the Dremio Advanced Options section in the HDFS, Hive, or MapR-FS data sources for more information.

Support Higher Cardinality Aggregation Operations over Variable Length Fields

Memory utilization for aggregation operations on variable length fields was optimized to be able to support a higher cardinality.

Partition Pruning

Dremio now supports some partition pruning when reading a subset of rows using the LIMIT operator. This feature is dependent on whether the data source supports partition pruning.
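
One way such pruning could work is sketched below in Python. The release note does not describe the mechanism, so this is only an illustration under the assumption that per-partition row counts are available from metadata; the function name and the partition names are hypothetical.

```python
from typing import Dict, List


def partitions_for_limit(row_counts: Dict[str, int], limit: int) -> List[str]:
    """Pick partitions in order until their known row counts cover the LIMIT."""
    selected: List[str] = []
    remaining = limit
    for name, rows in row_counts.items():
        if remaining <= 0:
            break  # enough rows are already covered; skip the rest
        if rows == 0:
            continue  # empty partitions never contribute rows
        selected.append(name)
        remaining -= rows
    return selected


counts = {"dt=2019-01-01": 500, "dt=2019-01-02": 700, "dt=2019-01-03": 900}
print(partitions_for_limit(counts, limit=1000))
```

With a LIMIT of 1000, only the first two partitions are scanned in this sketch; the third is never opened.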

Functionality Changes

Azure Data Lake Storage Gen1 Data Sources

As of Dremio 3.2, the Azure Data Lake Storage data source name has changed to Azure Data Lake Storage Gen1, and the data source's Advanced Options now provide an asynchronous access option (enabled by default).

In addition, the distributed storage properties (in dremio.conf and core-site.xml) have changed. See the ADLS Gen1 section in Distributed Storage for more information.

Amazon S3 Data Sources

As of Dremio 3.2, the distributed storage properties (in dremio.conf and core-site.xml) have changed. See the Amazon S3 section in Distributed Storage for more information.

In addition, the data source's Advanced Options now provide an asynchronous access option (enabled by default).

Upgrading

Upgrading from Dremio 1.x

Upgrading from Dremio 1.x to 3.2 is not supported.

To upgrade to Dremio 3.2:

  1. Upgrade from Dremio 1.x to 3.1 first.
  2. Upgrade from Dremio 3.1 to 3.2.

Deprecated/Not Supported

The following are either deprecated or not supported.

MapR

As of Dremio 3.2, Dremio's MapR distribution does not support the following:

  • MapR version 5.x
  • Amazon S3 data sources
  • ADLS data sources

MongoDB

As of Dremio 3.2, complex pushdown operations are not supported against MongoDB. The MongoDB connector supports pushdowns of projections and filters; however, aggregations, flattening, limits, and sorts are not supported.

Fixed Issues

Executor

Timestamp columns with milliseconds are not recognized by Dremio.
Resolved by fixing milliseconds in the ORC copier.

Unable to expand the buffer when querying Parquet files.
Resolved by fixing a buffer allocation issue in Apache Arrow.

Hive ORC non-vectorized reader shows incorrect statistics for Hive ORC tables.
Fixed Hive readers to show setup time, processing time, and wait time statistics correctly.

For Hive ORC, the "Input bytes" fields always show as 0 in Jobs page after reading Hive ORC tables.
Fixed the Hive readers so that the Input Bytes statistic shows the correct value.

Heap allocations during planning and plan propagation are excessive.
Resolved by implementing indices on executor endpoints to optimize execution plan generation.

Statistics reported for a query that spills to disk during aggregation may overflow.
Resolved by using 64-bit values to report statistics for aggregation operators in the query.
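
The effect of counter width can be shown with a small Python sketch. It only illustrates fixed-width wraparound in general, using ctypes to emulate 32-bit and 64-bit signed counters; the helper names are hypothetical and do not correspond to Dremio code.

```python
import ctypes


def add_int32(total: int, delta: int) -> int:
    """Add with 32-bit signed wraparound, as a fixed-width counter would."""
    return ctypes.c_int32(total + delta).value


def add_int64(total: int, delta: int) -> int:
    """Add with 64-bit signed wraparound."""
    return ctypes.c_int64(total + delta).value


spilled_bytes = 2**31 - 1  # largest value a signed 32-bit counter can hold
print(add_int32(spilled_bytes, 1))  # wraps to a negative number
print(add_int64(spilled_bytes, 1))  # stays positive
```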

The "FragmentExecutor - Failure while handling OOB message" error occurs under concurrency.
Resolved by ignoring the out-of-band (OOB) message when the pipeline is not set up.

Previously added Hive tables need a manual forced metadata refresh to show or hide supported and unsupported columns.
Resolved by updating the table schema correctly during the automatic periodic metadata refresh.

Parquet readers do not handle partial stats correctly.
Fixed parquet readers.

Unable to read some boolean columns in Parquet files.
Resolved by fixing the Parquet file reader to access and read page headers based on the header type.

UNION data type is not supported for Hive.
Resolved by updating the Hive ORC vectorized reader to support UNION data type. See Hive Data Types for more information.

The MAP datatype is not supported for Hive.
Resolved by updating the Hive ORC vectorized reader to support the MAP datatype. See Hive Data Types for more information.

Gandiva operator metrics are not updated after an expression split.
Fixed operator metrics for projects and filters.

The initcap() function incorrectly lowercases characters.
Fixed the lower case issue.

Fragment message processing and spilling could be improved.
Resolved by improving out of band message handling for external sort operators.

The CONVERT_FROM query with JSON string does not handle null values in arrays.
Resolved by improving how the config parameter is passed to JsonReader.

Dremio fails to read Parquet files that have zero rows.
Fixed issue associated with the ability to read zero-row Parquet files.

In some cases, queries do not re-attempt after running out of memory.
Resolved by improving logic for triggering query re-attempts.

Results were inconsistent when reading Parquet decimal partition columns.
Resolved by padding the size of decimal values to match expected length.
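
The release note does not say how the padding is done. As a hedged illustration only: Parquet stores a binary decimal as the big-endian two's complement of its unscaled value, so a shorter value can be widened by sign extension. The helper name pad_decimal below is hypothetical.

```python
def pad_decimal(raw: bytes, width: int) -> bytes:
    """Sign-extend a big-endian two's complement value to exactly `width` bytes."""
    if len(raw) > width:
        raise ValueError("value does not fit in the expected width")
    # Negative values have the top bit set and are padded with 0xFF, others with 0x00.
    fill = b"\xff" if raw and raw[0] & 0x80 else b"\x00"
    return fill * (width - len(raw)) + raw


short = (-1234).to_bytes(2, "big", signed=True)
padded = pad_decimal(short, 16)
print(len(padded), int.from_bytes(padded, "big", signed=True))
```

Padding with zeros regardless of sign would turn negative values into large positive ones, which is one way inconsistent results could arise.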

A high cardinality GROUP BY with a min/max query on a varchar column might cause executors to fail.
Resolved by moving min/max accumulators for variable length types off heap.

For Hive ORC, structs and lists are not supported.
Updated Hive ORC vectorized reader to support structs and lists.

With a very high number of concurrent queries, per-core CPU utilization may be uneven.
Resolved by optionally pinning execution threads to cores.

Parquet readers fail when the input file does not contain rowgroups.
Resolved by improving reads for Parquet files that have zero rows.

Queries take too long to start.
If there are a large number of splits or nodes, the time it takes for messages to be serialized and sent to each node to start a query could be excessive. Resolved by normalizing the messages sent to start a query.

Metadata and Administration

Certain metadata patterns involving large objects can reduce system performance.
Automatically store larger dataset definitions directly in the filesystem, bypassing the RocksDB key value store.

Schedules for cleanup jobs are not configurable.
Resolved by providing user configuration to enable a schedule for job cleanup tasks.

An excessive number of active queries can cause system crashes.
Resolved by limiting the total number of live queries on a single coordinator.

Query Planning

The planning terminator causes the planning timeout to reset constantly.
Resolved by sharing the planning terminator across instances within the same query.

Queries with join and aggregation may take a long time to plan when accelerated.
Fixed by enhancing substitution planning.

Queries that end up referring to a huge number of splits hold a large number of system resources, causing resource contention at the coordinator node.
Resolved by failing queries that select too many splits. Users can work around this limit by placing a more selective filter in the query, such that the query selects fewer partitions, and thus fewer splits.

Failed to get execution plan for empty parquet file while dragging and uploading the file to Dremio.
Resolved by fixing scan of empty parquet files.

For star schema, queries are not getting substituted as expected when using reflection on VDS with two sources.
Resolved by fixing an issue with matching and substitution in the query planner for accelerating the query.

A null pointer exception occurs when attempting to get an underlying expression of an alias.
Resolved by fixing the query planner issue that caused the null pointer exception.

Substitution errors display in the Log and Profile UI sections for queries that were accelerated successfully.
Resolved by fixing an issue in the query planner that incorrectly reported errors during query acceleration.

Long physical planning times occur for deeply nested JOINs.
Resolved by optimizations in the query planner to reduce the time taken to plan for complex VDS hierarchies.

Substitution is broken with incremental reflection-based VDS.
Resolved substitution issue.

Flatten query not matching incremental reflection.
Resolved substitution issue.

The JOIN row count and cost could be improved.
Resolved by taking into account the JOIN condition for scoring and row count.

A Tableau-generated query fails with an unsupported operation error.
Resolved by fixing issue in join planning and optimizations.

Queries that use the date_trunc function are not successfully accelerated.
Provided handling for the date_trunc function in the query planner during matching and substitution.

In some cases, columns are incorrectly mapped during join and project optimization which can cause queries to fail during planning.
Resolved by fixing a bug in the DremioAggregateJoinTranspose query optimizer rule to handle column mapping correctly.

In some cases, incremental reflections on a VDS fail when the queries have aggregate functions.
Resolved by fixing an issue during query planning that makes column naming consistent for aggregation functions in the query.

The VALUES_READER on Preview during a FLATTEN operation causes incorrect results.
Resolved by fixing a parallelization error.

TPCH query7 with reflections gets cancelled because planning time exceeded 60 seconds.
Resolved by improving logical planning for the Planner Phase.

Sources

A buffer overflow issue might occur and fail the query when an underlying CSV file is badly formatted.
Resolved by improved handling of badly formatted source CSV files.

For MySQL, the "Enable Legacy Dialect" checkbox is in the wrong place.
Resolved by fixing the location of the checkbox in the UI.

In Teradata, large constants are not pushed down correctly.
Resolved by handling push down of numeric type constants correctly.

For Teradata, SQL LIMIT/OFFSET queries are not supported.
Resolved by adding support for LIMIT/OFFSET queries.

JDBC drivers should be updated to versions that use JDBC API v4.2.
Resolved by reviewing and updating the JDBC drivers packaged with Dremio to versions that support the JDBC API v4.2.

For RDBMS sources, query failures occur on multi-column COUNT and COUNT(DISTINCT...
Resolved by adding support for COUNT queries across multiple columns. See Aggregate Functions for more information.

For MySQL, IS DISTINCT/IS NOT DISTINCT queries are not supported.
Added support for IS DISTINCT/IS NOT DISTINCT operators on MySQL.

For PowerBI, Dremio does not support TLS.
Added TLS support for PowerBI.

For some RDBMS queries, extraneous sort and union exchange operators appear in the explain plan.
Fixed an issue in the query planner so that extraneous Sort nodes are no longer added to the query plan for queries against relational sources.

Connections to MS SQL Server do not support TLS.
Resolved by adding support for connections from Dremio to MS SQL Server over TLS.

For RDBMS sources, queries for row count can be slow for large tables.
Resolved by using faster queries to get row count.

Experience

Previews are slow and keep users from using the Dremio UI while preview is in-progress.
The preview, save, and transform operations on datasets are now non-blocking. The user can proceed with further dataset edits while these operations are in-progress.

Dremio's server log files can become large in size.
Resolved by adding both a size-based and a time-based file rotation policy to the log configuration.

When NFS is used as distributed storage, delayed synchronization between executor and coordinator nodes prevents results from displaying in the Dremio UI.
Resolved by forcing NFS cache synchronization when the distributed storage is NFS.

Promoting a folder with an existing catalog id gives a 400 error and the folder cannot be re-promoted.
Resolved by improving promotion handling.

Dremio's admin command does not log its messages to a file. It only displays them on the screen.
Resolved by logging dremio-admin command messages to a log file.

Impersonation does not work with Ranger based authentication.
Resolved by enforcing the override on the DOAS property for Ranger based authentication.

Search results in the Dremio UI are not ordered correctly (by relevance).
Fixed so that the search results are sorted by relevance.

The Dremio UI Sources section disappears when refreshing page in Firefox.
Fixed issue with calculating space required to display the sources section in the Firefox browser.

Preview and Save/Save buttons should not have to wait for data to load.
Resolved by retrieving and displaying result sets asynchronously.

International users need an alternative date format.
Resolved by providing a date format option through the UI.

In YARN deployments, executor processes in containers sometimes do not exit cleanly and remain active.
Resolved by implementing a watchdog to watch Dremio processes and an HTTP health check to kill executor processes that do not shut down cleanly.

The UI validation for reflection refresh settings for "never refresh" is incorrect.
Fixed UI validation to check (compare) values only when both checkboxes are unchecked.

The UI New Query page layout could be improved.
Resolved by improving the page layout for the New Query page: opening the Datasets tab by default, displaying a placeholder in the table area, making the editor cover more lines if the user has not resized the editor, and splitting the editor and datasets areas evenly.

Dremio UI doesn't allow a user to cancel a job.
Resolved by allowing a user to click cancel on a job when it is enqueued.

The Dremio Wiki sidebar needs to be resizable.
Made the Wiki sidebar resizable.

Data fetching is broken for historical versions, if a dataset was saved as another dataset.
Data fetching for historical versions is fixed.

Cleanup of Job Results does not also cleanup job profiles.
Improved cleanup tasks for Profiles.

Log messages in Dremio's server log should use UTC instead of Java server time.
Changed log messages to be time stamped using UTC time instead of Java server time.

Keyboard shortcuts for Run and Preview are not available.
Added shortcuts to the run/preview dropdown labels.

Admin User search should return a user if there is a substring match.
Fixed substring match for the administrator username using the search criteria.

Cannot change the context in the SQL Editor.
Resolved by improving the Resource tree to expand sources.

Recommended joins include datasets that are no longer connected or are removed.
Resolved by changing Join recommendations to exclude datasets that are no longer present.

Messages warning about the existence of dependent datasets when changing a dataset are not displayed consistently.
Resolved by displaying a "dependent datasets" warning when renaming/moving/deleting a dataset.

Dataset table in the UI shows incorrect data for very big numbers.
Resolved by encoding numbers as strings in the response displayed in the UI to work around numeric value limitations in JavaScript.
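
The underlying limitation is that a JavaScript Number is a 64-bit float, so integers beyond 2^53 - 1 cannot all be represented exactly. The Python sketch below illustrates the workaround in general terms; encode_cell is a hypothetical helper, not part of Dremio's API.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a JavaScript Number represents exactly


def encode_cell(value):
    """Return ints outside the JavaScript-safe range as strings; leave other values alone."""
    if isinstance(value, int) and not isinstance(value, bool) and abs(value) > MAX_SAFE_INTEGER:
        return str(value)
    return value


row = [42, 9007199254740993]
# As 64-bit floats, the large value and its neighbor are indistinguishable.
print(float(row[1]) == float(row[1] - 1))
print(json.dumps([encode_cell(v) for v in row]))
```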

When there are a large number of columns, it's hard to find a particular column or group of columns.
Resolved by adding field filtering for columns in the Dremio dataset UI.

3.2.1 Release Notes

Fixed Issues in 3.2.1

Cannot read Dremio CTAS-generated Parquet files.
Fixed by updating the Python library for Apache Arrow. With this bug fix, all the Parquet files generated by Dremio 3.2 are readable by PyArrow. Files generated by older versions of Dremio still cannot be read by PyArrow.

3.2.2 Release Notes

Enhancements for 3.2.2

Impersonation for Query Users (EE only)

With Dremio 3.2.2, certain data sources allow impersonation to use the query user in addition to the VDS-delegated user as the impersonated username. This feature is applicable for HDFS, Hive, or MapR-FS data sources. See HDFS, Hive, and MapR-FS for more information.

S3 Compatible Endpoints

As of Dremio 3.2.2, Amazon S3-compatible products such as Minio are supported. See Amazon S3 for more information.

Fixed Issues in 3.2.2

Creating, querying, and dropping tables in scratch space, backed by PDFS, leaks directory handles.
Resolved by closing directory handle after listing, in PDFS.

Datasets with more than 800 leaf fields cause errors.
Resolved by adhering to max leaf count in accelerator connector.

Periodically, the Dremio UI displays a blank screen.
Resolved by improving validation implementation.

Under certain circumstances, browsers return an unexpected error associated with an undefined query context.
Resolved by improving UI implementation associated with an empty query context.

For Azure using the Azure Template, the medium sized cluster fails to deploy.
Resolved by fixing space issues and updating layout and labels.

In the RPM package, the Dremio Admin commands do not display error messages, which falsely indicates success.
Resolved by adding the logback-admin.xml file to the RPM package.

Upgrading to Dremio 3.2 on the MapR package breaks the S3 source and prevents it from being removed.
Resolved by allowing safe deletion and refresh for missing plugins.

For Amazon S3, excessive log messages are produced by ORC libraries when reading ORC files.
Resolved by changing the usage of ORC split property values for non-HDFS file systems. Hive tables or partitions which do not use HDFS to store data will have the Hadoop/Hive configuration property hive.orc.splits.include.fileid set to false for Hadoop library calls. This change is required because ORC split file IDs are only available in HDFS.

For Hive, a class not found exception occurs with the Hive parquet DeprecatedParquetInputFormat input format.
Resolved by adding support in the Hive source for Parquet datasets that use the parquet.hive.DeprecatedParquetInputFormat serialization format, commonly found in the Cloudera Hadoop distribution.

3.2.3 Release Notes

What's New in 3.2.3

Cloud Orchestration Templates

The Amazon AWS documentation has been updated with new and streamlined information on setting up and deploying Dremio on Amazon AWS. See Amazon AWS Template.

Fixed Issues in 3.2.3

A double scrollbar appears on the Jobs page when the screen size is small.
Resolved by re-factoring the Jobs page layout.

Remove Format menu does not work in the community edition.
Resolved by fixing the Remove Format menu for the OSS version.

In MapR-FS, VDS-based Access Delegation alignment in the UI is not displaying correctly.
Resolved by correcting the VDS access delegation display.

Need an option to enable/disable Parquet date auto corrections.
Resolved by introducing a new configuration parameter, store.parquet.auto.correct.dates, to enable/disable auto correction of dates on parquet files. By default, it is enabled.

For S3 and HDFS data sources, the ColumnCountException field doesn't display the dataset name for files.
Resolved by removing the dataset name from the ColumnCountException message since the dataset name for S3 and HDFS isn't available.

3.2.4 Release Notes

Functionality Changes in 3.2.4

As of Dremio 3.2.4, Hive MAP datatype functionality has been removed.

Fixed Issues in 3.2.4

Planner can generate incorrect BroadcastExchange when join type is right.
Resolved by enhancing the condition check for BroadcastExchange.

On upgrading to 3.2, reflections on datasets with more than 800 leaf fields will fail to materialize with a NullPointerException.
Dremio supports datasets with at most 800 leaf fields. If reflections were created on datasets exceeding this limit before upgrading to 3.2, materializing those reflections after the upgrade fails with an "extended metadata (read definition) is not available" message. Administrators should drop all such reflections.

3.2.5 Release Notes (EE only)

Fixed Issues in 3.2.5

Hive queries on ORC files are failing with an out-of-bounds exception.
Resolved by handling the case when an external ORC table has more columns than the corresponding original ORC file.

Unable to execute nested queries on Parquet files having complex types using async readers.
Resolved by fixing the Parquet async reader's logic for projections that involve a subset of fields in complex types during nested query execution.

For column-based incremental reflections, a SQL query with the equal (=) filter produces incorrect results.
Resolved the handling of column-based incremental updates.

When clicking on the Spaces link (in the left navigation) on the Home page to view Spaces and Source lists, the displayed values in the "Created" column are the current date.
Resolved by providing the created date for source list and placeholder for spaces list.

3.2.6 Release Notes (EE only)

Fixed Issues in 3.2.6

Gandiva-based executions may need optimization.
Resolved by enabling OR-IN optimization in Gandiva.

For Hive, false results are occasionally cached when user permissions for a source dataset are checked.
Resolved by modifying the caching behavior. When user access is checked, the results are cached if the user has access to the source dataset. If the user does not have access, then the results are not cached; on the next request, user permissions for the source dataset are checked again.
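
The described behavior amounts to caching only positive results. A minimal Python sketch of that pattern follows; the AccessCache class and the backend callback are hypothetical names used for illustration and are not Dremio's implementation.

```python
from typing import Callable, Set, Tuple


class AccessCache:
    """Caches granted permission checks only; denials are always re-checked."""

    def __init__(self, check: Callable[[str, str], bool]) -> None:
        self._check = check  # authoritative permission lookup (assumed to be slow)
        self._granted: Set[Tuple[str, str]] = set()

    def has_access(self, user: str, dataset: str) -> bool:
        key = (user, dataset)
        if key in self._granted:
            return True
        allowed = self._check(user, dataset)
        if allowed:
            self._granted.add(key)  # remember grants; never remember denials
        return allowed


calls = []
grants = set()


def backend(user: str, dataset: str) -> bool:
    calls.append((user, dataset))
    return (user, dataset) in grants


cache = AccessCache(backend)
first = cache.has_access("alice", "sales")   # denied, not cached
grants.add(("alice", "sales"))
second = cache.has_access("alice", "sales")  # re-checked, now granted and cached
third = cache.has_access("alice", "sales")   # served from the cache
print(first, second, third, len(calls))
```

Because denials are never stored, a user who is granted access later is not blocked by a stale negative entry.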

3.2.7 Release Notes (EE only)

What's New in 3.2.7

Azure Storage on Azure Government

With Dremio 3.2.7, Azure Storage data source on the Azure Government cloud platform is supported. See Azure Storage for details.

Functionality Changes in 3.2.7

For MongoDB, Dremio now exposes ISO_DATE fields as timestamp (rather than as date).

Fixed Issues in 3.2.7

For MongoDB, timestamp filters with strings are not pushed down.
Resolved by coercing strings to timestamp when pushing down to MongoDB.

Aggregate queries on text file with new lines within a quoted field behave incorrectly.
Resolved by correcting count queries with .csv files.
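
The class of bug can be reproduced outside Dremio with a short Python sketch: counting physical lines overcounts rows when a quoted field contains a newline, while a quote-aware parser does not. The sample data is invented for illustration.

```python
import csv
import io

# Two data rows; the first has a newline inside a quoted field.
data = 'id,comment\n1,"first line\nsecond line"\n2,"plain"\n'

# Naive count: every physical line after the header is treated as a row.
naive_rows = len(data.splitlines()) - 1

# Quote-aware count: the csv module keeps the quoted newline inside one record.
parsed_rows = sum(1 for _ in csv.reader(io.StringIO(data, newline=""))) - 1

print(naive_rows, parsed_rows)
```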

3.2.8 Release Notes (EE only)

Fixed Issues in 3.2.8

The REFRESH METADATA SQL query does not work with Azure Storage.
Resolved by fixing the Azure Storage plug-in PDS METADATA REFRESH trigger.

Field trimmer doesn't update collation field correctly in some cases.
Resolved by updating the Calcite field trimmer.

Dremio Wiki has an XSS security bug.
Resolved the XSS security issue by upgrading some internal modules.

3.2.9 Release Notes (EE only)

Fixed Issues in 3.2.9

For Teradata sources, queries are making unnecessary calls to retrieve metadata.
Resolved by improving the metadata retrieval process.

For Teradata data sources, previews for VDSs/queries with UNIONs fail.
Resolved by correcting the Teradata SQL LIMIT with UNION functionality.

