3.1 Release Notes

What's New

Workload Management (EE only)

[info] Enterprise Edition only

The Workload Management feature introduces user-defined job queues, each associated with its own resource constraints, along with flexible rules for assigning user jobs to those queues.

Workload Management is available in the Dremio UI on the Admin console, where the Queues and Rules sections let you manage your queues and rules. See Workload Management for more information.
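
As an illustrative sketch, a queue assignment rule is a condition evaluated against each incoming query; the group name below is hypothetical, and the exact set of supported condition functions (such as query_type() or is_member()) is described in the Workload Management documentation:

    -- Hypothetical queue assignment rule condition: route UI-issued
    -- queries from members of an "analysts" group to a designated queue.
    query_type() IN ('UI Run', 'UI Preview') AND is_member('analysts')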

Hash Aggregation Spilling

Hash aggregation spilling is now available. This feature supports memory-intensive hash aggregation queries that process large datasets. See Hash Aggregation Spilling for more information.

[info] Hash aggregation spilling is disabled by default.

To request access to this feature, please send email to preview@dremio.com.

Enhancements

Dataset Preview

Previewing a dataset now generates local SQL table references instead of global references, improving performance on slow metadata systems.

Gandiva

The Gandiva feature is now supported on Mac OS X version 10.11 and higher.

Enhanced Connector Framework

With this release, Dremio adds improved relational connectors for Oracle, Redshift, and MySQL data sources. This framework provides enhanced performance and extensive push-down capabilities for each of these relational sources.

Upgrading

External Reflections

When upgrading from Dremio version 3.0.x to 3.1.0, external reflections become out of sync. Work around this issue by dropping and recreating your external reflections.
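
As a sketch of this workaround (all dataset and reflection names below are hypothetical; consult the Dremio SQL reference for the exact syntax), assuming an external reflection named ext_sales on the virtual dataset myspace.sales that is backed by the table mysource.sales_agg:

    -- Drop the out-of-sync external reflection...
    ALTER DATASET myspace.sales DROP REFLECTION ext_sales
    -- ...and recreate it against the same backing table.
    ALTER DATASET myspace.sales CREATE EXTERNAL REFLECTION ext_sales USING mysource.sales_agg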

Fixed Issues

Executor

Attempting an aggregation whose measure is a minimum or a maximum of a VARCHAR or VARBINARY column causes Dremio executors to run out of heap memory.
These operations are now supported only for low-cardinality datasets.
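
A query of the affected shape, with hypothetical source, table, and column names:

    -- MIN/MAX over string columns; previously this could exhaust executor
    -- heap memory on high-cardinality data.
    SELECT category, MIN(product_name), MAX(product_name)
    FROM mysource.products
    GROUP BY category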

When a YARN-provisioned executor runs out of heap memory, the executor may continue in a diminished capacity, which can result in it being left in a provisioning or decommissioning state for a long time.
Resolved so that if the executor runs out of heap memory, the process is killed, allowing YARN to start a new process in its place.

ParquetDatasetAccessor.getBatchSchema() and ParquetFormatPlugin.PreviewReader are very inefficient and can result in unnecessary memory usage when reading files with many columns.
Improved efficiency of preview of Parquet files by reducing the unnecessary read parallelism.

When using Parquet, the ROUND operation may produce incorrect results.
Fixed by adjusting the ROUND method for floating-point numerics.
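
An illustrative query of the affected shape (source and column names hypothetical); binary floating-point values cannot represent all decimal fractions exactly, so rounding such columns read from Parquet could previously return slightly incorrect results:

    SELECT ROUND(price, 2) AS rounded_price
    FROM s3source.sales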

The JVM may crash if Dremio runs out of direct memory while building raw reflections.
Fixed by correcting exception handling and the data-flush implementation.

A memory leak may occur when a file is not found.
Fixed the issue by releasing all the buffers.

Under tight memory conditions, fragments may fail while allocating child allocators.
Resolved by improving child allocation in the executor build.

For the Gandiva feature, performance is sub-optimal.
Resolved by splitting IF expressions and boolean operators between Java and Gandiva.

SpoolingRawBatchBuffer opens a new input stream for each spilled batch it reads from disk. This can result in a large number of unused RA threads.
Resolved by reusing input streams and simplifying usage.

Core Services

When upgrading from Dremio version 3.0.x to 3.1.0, external reflections become out of sync.
Work around this issue by dropping and recreating your external reflections.

Partial reindexing fails or takes a long time.
On startup after a crash, as part of recovery, Dremio reindexes data that was not committed to the internal index. In most cases, this partial reindexing failed or took unusually long due to a bug in computing how much data to reindex; this bug has been fixed.

When restarting the master-coordinator node, the materialization cache no longer refreshes on other coordinators.
Resolved so that even when a non-master coordinator fails to connect to the master, it keeps trying to refresh the materialization cache every 30 seconds.

For Workload Management, the query is mislabeled as UI Run when it should be UI Download.
Resolved by implementing a UI_DOWNLOAD workload type.

For Workload Management, cancellation messages for queries were not displayed.
Fixed by adding a new field in the user interface, under the Jobs tab, for cancellation notices.

For Workload Management, there is no message in the job status when a query is cancelled.
Fixed by adding a cancellation message to job status and profile.

For S3, adding reflections with a bucket name but without a folder name caused an error.
Resolved by supporting reflection storage at the root of an S3 bucket, without folders.

For a moved and renamed dataset, the version history may become out of sync.
Resolved by clearing the cache and forcing information reload after a dataset is renamed.

The error message for missing policies on an S3 bucket is too cryptic.
Resolved by raising the log level from warning to error and showing a better error message for missing policies on an S3 bucket.

Excessive refresh and load materialization jobs occur with raw reflections.
Failure to deserialize materialization plans will no longer cause a reflection to refresh indefinitely.

Query

Some phases of heuristic planning could loop forever. If a loop is created, a runaway query may occur.
Resolved by implementing a termination flag.

An experimental planning rule to transpose projections past filters was enabled by default, but has been reported to be causing excessive planning time.
The rule is now disabled by default.

A query was cancelled because planning time exceeded 60 seconds.
Fixed the timing issue.

A planning error occurs when an accelerated query involves inequality joins.
Resolved by improving JOIN rule functionality.

Raw reflections for a VDS do not match the query.
Resolved by implementing direct VDS matching capability.

VDS expansion sometimes provides inconsistent matching replacements which could result in incorrect results.
Resolved by improving VDS substitutions and replacement management.

A substitution error occurs with star schema reflection.
Fixed the internal implementation of mapping.

Preview queries on datasets can become time-consuming when a dataset is large.
Resolved by supporting a limit value for previews.

An experimental planning rule may cause excessive planning time.
The rule is now disabled by default.

The "create or replace vds" SQL command fails.
Fixed the bug.
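
A minimal example of the affected command, with hypothetical names:

    CREATE OR REPLACE VDS myspace.sales_summary AS
    SELECT region, SUM(amount) AS total_amount
    FROM mysource.sales
    GROUP BY region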

Plug-ins

The CTAS DROP operation on Amazon S3 is not optimal.
Resolved the issue so that performance is improved.

SQL pushdown queries sometimes fail on Postgres.
Resolved by correcting SQL translations for IN with multiple columns.

In the RDBMS plug-in, the performance for retrieving the initial table metadata needs improvement.
Resolved by adding a source-specific query to list tables, improving the time to add new sources and retrieve schemas.

The log message for an unknown type was presented as an error rather than a warning.
Fixed by correctly logging as a warning without the stack trace and with contextual information.

For Oracle, Dremio cannot access Oracle synonyms.
Resolved by providing synonym support for Oracle. When the option is enabled, synonym tables are included and the Oracle connection is configured to return column information.

When viewing objects in an Oracle instance, synonyms are listed as queryable objects; however, Dremio is unable to get the columns for the object and fails.
Resolved by adding a switch that lets you hide synonyms and making synonyms properly queryable when they are not hidden.

The direct memory used by the non-vectorized Hive ORC reader was held even after it was no longer needed.
Resolved by using a scoped Dremio allocator in the zero copy path on the non-vectorized Hive ORC reader.

Running queries against Hive tables with a large number of partitions can cause hanging.
Resolved by lazily creating the Hive job configuration, which reduces memory utilization when running queries against Hive tables with a large number of partitions.

Per CVE-2017-5644, Apache POI is vulnerable to denial-of-service attacks.
Resolved by upgrading Apache POI to 3.17.

Tableau Desktop 2018.3.2 fails to open the Dremio-generated .tds file.
Because Tableau Desktop 2018.3.2 expects the "datasource" element to have a "version" attribute, resolved this issue by setting the version to an empty string.

A confusing error message occurs when issuing a CREATE TABLE AS SELECT query into an S3 source.
Fixed by implementing better error reporting when failing to create tables in S3.
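
For reference, a CTAS statement of the kind affected, with hypothetical source, bucket, folder, and table names (the S3 source must be configured as writable):

    CREATE TABLE s3source.mybucket.myfolder.sales_copy AS
    SELECT * FROM myspace.sales_summary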

Miscellaneous

Per CVE-2014-0107, Apache Xalan is vulnerable to code injection through specially crafted files.
Resolved by upgrading Apache Xalan from 2.7.1 to 2.7.2.

Dremio logs are being written to the systemd journal (journalctl) instead of the Dremio log file.
Fixed so that the logs are written to the server.log file.

The SQL editor context does not work with folders that have periods in them (for example, S3 bucket names).
Resolved handling of folder names with periods.

During Preview, an execution error shows an empty preview rather than an error message.
Fixed by adding an error message.

Previews needed to be limited when using the REST API.
Resolved by allowing the Preview APIs to return the schema without preview data.

The UI screen is flickering in the Admin > Dataset area.
Fixed flickering scrollbars.

Automatic previews for datasets increase data load when querying large datasets.
Resolved by decoupling dataset metadata load from data load. Data load no longer blocks a user from altering the query.

In the user interface, the alignment is broken after the introduction of the Wiki feature.
Fixed by modifying the UI layout.

Tag displays don't show context.
Resolved by showing tags that are associated with a dataset.

In the user interface, the Data Graph option isn't available for physical datasets.
Resolved so that the Data Graph option is available for physical and virtual datasets.

3.1.1 Release Notes

What's New in 3.1.1

Licensing

Dremio now requires acceptance of a Licensing Agreement when installing Dremio Enterprise Edition.

Authentication

In addition to LDAP simple bind (user, password) authentication, Dremio now supports the following LDAP authentication modes:

  • Anonymous (no user, no password)
  • Unauthenticated Simple Bind (user, no password)

[info] Important
When authenticating to Dremio, empty passwords for users are not allowed.

Fixed Issues in 3.1.1

When creating reflections or tables from sources that have nested structures, values in nested fields are not written correctly.
Resolved by improving the Parquet writer to handle UNIONs better.

When searching for file system-based sources, a file that was promoted to a dataset and then unpromoted may still appear in dataset search results.
Resolved by implementing a check to verify that only datasets are returned.

A query planning rule may lead to infinite planning issues.
Resolved by removing the rule.

A dependent reflection is not using the parent reflection's latest materialization.
Resolved so that the dependent reflection is refreshed to use the latest materialization.

Some phases of heuristic planning could loop forever.
Resolved by improving the JOIN rule.

With Oracle, duplicated column names are not renamed in the result set which can cause either a memory leak or null results.
Resolved by renaming the duplicated columns in the result set.

3.1.3 Release Notes

Enhancements in 3.1.3

HDFS and Hive Impersonation

For HDFS and Hive impersonation, an impersonated username can now be lowercase or uppercase. A new advanced option, Impersonation User Delegation, can be configured either when adding a new HDFS or Hive source or when editing an existing source. The default is As is.

See HDFS or Hive for more information.

Parquet File Reader

Dremio now supports off-heap memory buffers for reading Parquet files from Azure Data Lake Store (ADLS).

Metadata Tables

Datasets are now limited to a maximum width of 800 columns. Datasets that already exceed the limit will not be queryable after their metadata is refreshed. Please contact support@dremio.com for further details and questions.

Hash Aggregation Spilling Default

In Dremio 3.1.3, the Hash Aggregation Spilling feature is enabled by default. See Hash Aggregation Spilling for more information.

Deprecations in 3.1.3

IBM DB2

IBM DB2 as a connector has been deprecated in this release. Existing DB2 source connections will continue to function in this release, but new DB2 connections will not be possible. In the next maintenance release, Dremio will officially remove DB2 from the product, causing any existing DB2 connections to fail. Dremio may choose to develop a newer DB2 connector in the future.

HBase

HBase as a connector has been deprecated in this release. Existing HBase source connections will continue to function, but new HBase connections will not be possible. Dremio will make a community version of the HBase connector available in the future, which you will be able to download and configure to add new HBase source connections.

Fixed Issues in 3.1.3

Dremio rejects operations when a table has a large number of columns.
Resolved by improving the handling of metadata tables.

When source names are mixed-case (not all lowercase), the orphan split cleaning method identifies datasets referring to the lower-case source name as orphans and removes all their splits.
Fixed the issue by normalizing the source name for comparison.

3.1.6 Release Notes

Fixed Issues in 3.1.6

Dremio does not start on system reboot due to a permissions issue.
Fixed the issue by changing the permissions of the directories that the dremio user needs access to on reboot.

Dremio is unable to read Parquet v2 data files.
Resolved by improving the compression logic and data size calculations.

When using Dremio UI drag and drop, Oracle pushes down incorrect SQL when previewing a join.
Resolved by improving alias usage with JOINs on top of LIMIT/OFFSETs.

Logs are receiving too much trace information.
Resolved by grouping errors and logging once per metadata refresh, rather than once per error, while also moving trace information from INFO to DEBUG.

Dremio does not provide a warning for Hive deprecation.
Resolved by logging a Hive deprecation warning.

The hash aggregate operation uses a sub-optimal amount of heap memory.
Resolved by improving the usage of heap memory by the hash aggregate operation.

For Excel files, some data does not display correctly when previewing the file or when promoting the file to a dataset.
Resolved by improving inline string detection and supporting columns with all Null values.

For very large varchar values (greater than 32k), the vectorized hash aggregate operator may hang.
Resolved by improving hash aggregate operations with very large varchar values.

For MongoDB, filters are not pushed down when casting a string field to BIGINT.
Resolved by pushing down implicit numeric casts in filters.
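
An illustrative query (source, collection, and field names hypothetical) whose filter now pushes down:

    -- price is stored as a string in MongoDB; comparing it to a numeric
    -- literal causes an implicit cast to BIGINT, which is now pushed down
    -- to MongoDB instead of being applied after the scan.
    SELECT * FROM mongo.store.orders
    WHERE price > 100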

The open file limit for Dremio processes defaults to 1024, which limits the number of concurrent queries you can execute.
Increase the maximum number of file descriptors to at least 65536; depending on the size of the sort, Dremio may open a large number of file descriptors.

3.1.7 Release Notes

Fixed Issues in 3.1.7

When a user cancels a query, it moves to a cancelled state but still waits for all fragments to finish. As a result, the profile may either not render properly or not be available.
Resolved by rendering the profiles for cancelled queries.

When files change in a folder (for example, they are deleted) and metadata refresh has not yet run, queries fail with a File/Folder does not exist error message.
Resolved by re-attempting the query (metadata change) when Parquet files/folders are missing from the source. This fix is implemented for S3 and ADLS sources; the fix for Hive sources is scheduled for a future release.

If an exception is thrown while handling node unregistration, queries may hang.
Fixed by improving the internal handling of notifications.

Very large column sizes in source CSV files can cause system unresponsiveness.
Resolved by limiting the column size of source CSV files to 32,000 bytes.

A soft-limit warning message is shown when dremio stop is executed.
Fixed the issue by displaying the warning message only when dremio start is executed.

For large workloads, locality is not being calculated correctly.
Fixed the locality calculation to improve performance.

In Hive Parquet, unsupported datatypes (struct, map, and list) are displayed.
Fixed to display only supported datatypes.

Performance is impacted by a change in batch size.
Resolved by adjusting the hash bucket count in some operators.

In YARN deployments, executor processes in containers sometimes do not exit cleanly and remain active.
Resolved by implementing a watchdog to monitor Dremio processes and an HTTP health check to kill executor processes that do not shut down cleanly.

Dremio's object volume and heap usage can be sub-optimal.
Reduced the volume of objects and heap usage by consolidating buffers.

3.1.8 Release Notes

Enhancements in 3.1.8

LDAP Group Lists

As of Dremio version 3.1.8, LDAP functionality is implemented that establishes user-group relationships for group entries that list the users belonging to that group. See Using LDAP for more information.

Tableau ODBC Connection

As of Dremio version 3.1.8, a new advanced property, export.tableau.extra-odbc-connection-properties, is implemented. This property allows you to set the ODBC connection string when exporting a Dremio dataset to Tableau TDS format when SSL is enabled. The default is an empty string.
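
Support keys such as this one can be set from the Dremio SQL console; the value below is a hypothetical ODBC connection-string fragment, since the exact properties depend on your ODBC driver configuration:

    ALTER SYSTEM SET "export.tableau.extra-odbc-connection-properties" = 'SSL=1'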

Fixed Issues in 3.1.8

The Dremio Hive ORC reader neither preserves the decimal point nor sets the scale properly for an external Hive table when converting from decimal to string or from decimal to the ORC file format.
Resolved by enforcing precision and scale for decimals after Hive ORC read operations.

The Hive Table ORC reader throws an exception when a datatype mismatch occurs with Hive external tables.
Resolved by ensuring that the Hive ORC reader resolves data types and column differences between the table schema and files in the table.

For Hive sources, when Parquet files change in a folder (for example, they are deleted) and metadata refresh has not yet run, queries fail with a File/Folder does not exist error message.
Resolved by re-attempting the query (metadata change) when Parquet files/folders are missing.

The Dremio thread statistic collector is using too much memory.
Fixed by collecting and carrying forward stats only for slicing threads (not inactive threads).

The Dremio clean orphan command removes non-dataset Wiki content.
Fixed so that Wiki and tag content added to any namespace entity are preserved.

In Dremio, large planning sessions use too much heap memory.
Resolved by updating Dremio planners to limit the number of generated nodes.

Dremio source/space/folder listings in the UI Main page are too slow.
Resolved by improving the time it takes to list spaces/sources/folders when there is a large number of items owned by many users.

When a storage plugin is replaced, the permission cache isn't cleared. This can impact impersonation since impersonation changes won't take effect unless Dremio is restarted.
Resolved by invalidating the permissions cache when a storage replacement succeeds.

In some scenarios, when creating Dremio virtual datasets and associated reflections, long planning times and heavy heap usage may occur.
Improved internal deduplication and heap memory usage.

Project traits are incorrectly propagated when creating new project nodes.
Fixed a planner issue that prevented queries from running when the query contained multiple JOIN and ORDER BY clauses.

Hive ORC tables throw an exception when the data type of a partitioned table is altered.
Resolved by ensuring that the Hive ORC reader resolves data types and column differences between the table schema and files in the table.

When exporting a dataset to Tableau TDS format, the ODBC connection string is set by Dremio and cannot be changed/modified by the user.
Resolved by implementing a new advanced property, export.tableau.extra-odbc-connection-properties, which takes an ODBC connection string.

3.1.9 Release Notes

Fixed Issues in 3.1.9

Wait time in some file system operations is recorded incorrectly.
Wait time in these operations is now recorded correctly as wait-time stats for the query.

The acceleration profile is missing information for unmatched reflections.
Fixed the issue so that the missing information is displayed.

MongoDB connections fail.
Resolved by updating the version compatibility methodology.

Thread scheduler fails to schedule work on inactive CPUs.
Resolved by improving task scheduling.

