3.1 Release Notes
Workload Management (EE only)
[info] Enterprise Edition only
Workload Management lets you manage workloads through user-defined job queues. Each queue has its own resource constraints, and flexible assignment rules determine which queue each user job is assigned to.
In the Dremio UI, Workload Management is available in the Admin console, where the Queues and Rules sections let you manage your queues and rules. See Workload Management for more information.
Hash Aggregation Spilling
Hash aggregation spilling is now available. This feature lets memory-intensive hash aggregation queries that process large datasets spill data to disk when they exceed available memory. See Hash Aggregation Spilling for more information.
[info] Hash aggregation spilling is disabled by default.
To request access to this feature, please send an email to email@example.com.
Previewing a dataset now generates local SQL table references instead of global references, improving performance on slow metadata systems.
The Gandiva feature is now supported on Mac OS X version 10.11 and higher.
Enhanced Connector Framework
With this release, Dremio adds improved relational connectors for Oracle, Redshift, and MySQL data sources. This framework provides enhanced performance and extensive push-down capabilities for each of these relational sources.
Known Issues in 3.1
When upgrading from Dremio version 3.0.x to 3.1.0, external reflections become out of sync. Work around this issue by dropping and recreating your external reflections.
Aggregations whose measure is the minimum or maximum of a VARCHAR or VARBINARY column cause Dremio executors to run out of heap memory. These operations are supported only for low-cardinality datasets.
Fixed Issues in 3.1
When a YARN-provisioned executor runs out of heap memory, it may continue running in a diminished capacity and remain in a provisioning or decommissioning state for a long time.
Resolved so that an executor that runs out of heap memory is killed, which allows YARN to start a new process in its place.
ParquetDatasetAccessor.getBatchSchema() and ParquetFormatPlugin.PreviewReader are very inefficient and can use unnecessary memory when reading files with many columns.
Improved the efficiency of Parquet file previews by reducing unnecessary read parallelism.
When using Parquet, the ROUND function may produce incorrect results.
Fixed by adjusting the ROUND method for floating-point numerics.
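The notes do not say exactly how ROUND was adjusted. As a general illustration of why floating-point rounding needs care, the Python sketch below contrasts built-in `round`, which rounds the nearest binary double using round-half-to-even, with decimal round-half-up applied to the value's shortest decimal form. The helper `round_half_up` is hypothetical and is not Dremio's implementation.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: float, digits: int = 0) -> float:
    """Round half away from zero, using the shortest decimal form of `value`.

    Hypothetical helper for illustration only; not Dremio's ROUND implementation.
    """
    # str() yields the shortest repr that round-trips (2.675 -> "2.675"),
    # so we do not round the slightly smaller stored double 2.67499999...
    exact = Decimal(str(value))
    quantum = Decimal(1).scaleb(-digits)  # 10 ** -digits, e.g. 1E-2 for two digits
    return float(exact.quantize(quantum, rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    print(round(2.675, 2), round_half_up(2.675, 2))  # binary vs. decimal rounding
    print(round(2.5), round_half_up(2.5))            # half-to-even vs. half-up
```

Going through `Decimal(str(value))` trades a little speed for results that match the decimal literal a user typed, which is usually what SQL users expect from ROUND.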
The JVM may crash if Dremio runs out of direct memory while processing raw reflections.
Fixed by correcting exception handling and the data flush implementation.
A memory leak may occur when a file is not found.
Fixed the issue by releasing all the buffers.
Under tight memory conditions, fragments may fail while creating child allocators.
Resolved by improving child allocation in the executor build.
Gandiva performance is suboptimal.
Resolved by splitting IF expressions and boolean operators between Java and Gandiva.
SpoolingRawBatchBuffer opens a new input stream for each spilled batch it reads from disk.
This can result in a large number of unused RA threads.
Resolved by reusing input streams and simplifying usage.
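The idea behind the fix, opening the spill file once and reading successive batches from the same stream instead of opening a new stream per batch, can be sketched in Python. The length-prefixed file layout and the `SpillReader` class are hypothetical and do not reflect Dremio's actual spill format or the `SpoolingRawBatchBuffer` API.

```python
import os
import struct
import tempfile
from typing import BinaryIO, Iterator, List, Optional

_HEADER = struct.Struct(">I")  # 4-byte big-endian length prefix for each batch


def write_batches(path: str, batches: List[bytes]) -> None:
    """Spill each batch to disk as a length prefix followed by its payload."""
    with open(path, "wb") as out:
        for batch in batches:
            out.write(_HEADER.pack(len(batch)))
            out.write(batch)


class SpillReader:
    """Reads spilled batches sequentially through a single, reused input stream."""

    def __init__(self, path: str) -> None:
        # Opened once for the whole file instead of once per batch.
        self._stream: BinaryIO = open(path, "rb")

    def next_batch(self) -> Optional[bytes]:
        header = self._stream.read(_HEADER.size)
        if len(header) < _HEADER.size:
            return None  # no more batches in the spill file
        (length,) = _HEADER.unpack(header)
        return self._stream.read(length)

    def __iter__(self) -> Iterator[bytes]:
        batch = self.next_batch()
        while batch is not None:
            yield batch
            batch = self.next_batch()

    def close(self) -> None:
        self._stream.close()


if __name__ == "__main__":
    spill_path = os.path.join(tempfile.mkdtemp(), "spill.bin")
    write_batches(spill_path, [b"batch-1", b"batch-2", b"batch-3"])
    reader = SpillReader(spill_path)
    try:
        for batch in reader:
            print(batch)
    finally:
        reader.close()
```

Because batches are read in order, one sequential stream per spill file is enough, and it avoids the per-open cost (and any per-stream background threads) that the original behavior incurred.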
Partial reindexing fails or takes a long time.
On startup after a crash, as part of recovery, Dremio reindexes data that was not committed to the internal index. In most cases, this partial reindexing failed or took unusually long due to a bug in computing how much data to reindex.
After the master coordinator node restarts, the materialization cache no longer refreshes on the other coordinators.
Resolved so that a non-master coordinator keeps trying to refresh the materialization cache every 30 seconds, even when it fails to connect to the master.
In Workload Management, a query is mislabeled as UI Run when it should be labeled UI Download.
Resolved by implementing a UI_DOWNLOAD workload type.
In Workload Management, cancellation messages for queries are not displayed.
Fixed by adding a new field for cancellation notices under the Jobs tab in the user interface.
In Workload Management, the job status shows no message when a query is cancelled.
Fixed by adding a cancellation message to the job status and profile.
For S3, configuring reflection storage with a bucket name but no folder name causes an error.
Resolved by supporting reflection storage in S3 buckets without a folder.
For a moved and renamed dataset, the version history may become out of sync.
Resolved by clearing the cache and forcing information reload after a dataset is renamed.
The error message for missing policies on an S3 bucket is too cryptic.
Resolved by logging at error level instead of warning level and showing a clearer error message for missing policies on an S3 bucket.
Excessive refresh and load materialization jobs occur with raw reflections.
Resolved so that a failure to deserialize materialization plans no longer causes a reflection to refresh indefinitely.
Some phases of heuristic planning could loop forever.
If a loop is created, a runaway query may occur.
Resolved by implementing a termination flag.
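The notes do not describe how the termination flag works. A common safeguard for rule-based rewriting, sketched here in Python with hypothetical names (`apply_rules`, `max_iterations`), is to stop when no rule changes the plan, when the rules cycle back to a plan already seen, or when an iteration budget runs out, and to report whether planning was cut short.

```python
from typing import Callable, Hashable, Optional, Sequence, Tuple

# A rule returns a rewritten plan, or None when it does not apply.
Rule = Callable[[Hashable], Optional[Hashable]]


def apply_rules(plan: Hashable, rules: Sequence[Rule],
                max_iterations: int = 1000) -> Tuple[Hashable, bool]:
    """Apply rewrite rules until a fixpoint, a repeated plan, or the budget is hit.

    Returns (plan, terminated_early); terminated_early acts as the termination
    flag and is True when planning was cut off instead of reaching a fixpoint.
    """
    seen = {plan}
    for _ in range(max_iterations):
        for rule in rules:
            rewritten = rule(plan)
            if rewritten is not None and rewritten != plan:
                plan = rewritten
                break
        else:
            return plan, False  # no rule changed the plan: fixpoint reached
        if plan in seen:
            return plan, True   # the rules cycled back to an earlier plan
        seen.add(plan)
    return plan, True           # iteration budget exhausted
```

For example, two rules that rewrite `"A"` to `"B"` and `"B"` back to `"A"` would loop forever without these checks; here the call returns after the cycle is detected.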
An experimental planning rule to transpose projections past filters was enabled by default, but has been reported to cause excessive planning time.
The rule is now disabled by default.
A query was cancelled because planning time exceeded 60 seconds.
Fixed a timing issue.
A planning error occurs when an accelerated query involves inequality joins.
Resolved by improving JOIN rule functionality.
Raw reflections on virtual datasets (VDS) do not match the query.
Resolved by implementing direct VDS matching capability.
VDS expansion sometimes produces inconsistent matching replacements, which can lead to incorrect results.
Resolved by improving VDS substitutions and replacement management.
A substitution error occurs with star schema reflection.
Fixed the internal implementation of the mapping.
Preview queries on datasets can become time-consuming when a dataset is large.
Resolved by supporting a limit value for previews.
An experimental planning rule may cause excessive planning time.
The rule is now disabled by default.
The "create or replace vds" SQL command fails.
The CTAS DROP operation on Amazon S3 is not optimal.
Resolved by improving the performance of this operation.
SQL pushdown queries sometimes fail on Postgres.
Resolved by correcting SQL translations for IN with multiple columns.
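A multi-column `IN` predicate such as `(a, b) IN ((1, 2), (3, 4))` can be rewritten as a disjunction of equality conjunctions. The Python sketch below shows that rewrite; the function name is hypothetical, the values are assumed to be already-rendered SQL literals, and the notes do not say this is the exact translation Dremio emits for Postgres.

```python
from typing import List, Sequence


def expand_row_in(columns: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Rewrite (c1, c2, ...) IN ((v1, v2, ...), ...) as OR-ed groups of AND-ed equalities.

    Values must already be valid SQL literals; quoting and escaping are out of scope.
    """
    if not rows:
        raise ValueError("IN list must not be empty")
    disjuncts: List[str] = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"row {tuple(row)} does not match columns {tuple(columns)}")
        equalities = " AND ".join(f"{col} = {val}" for col, val in zip(columns, row))
        disjuncts.append(f"({equalities})")
    return " OR ".join(disjuncts)


if __name__ == "__main__":
    print(expand_row_in(["a", "b"], [("1", "2"), ("3", "'x'")]))
```

Each tuple in the `IN` list must match on every column, so every row becomes one parenthesized `AND` group and the groups are combined with `OR`.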
In the RDBMS plug-in, retrieving the initial table metadata is slow.
Resolved by adding a source-specific query to list tables, which reduces the time needed to add new sources and retrieve schemas.
The log message for an unknown type was presented as an error rather than a warning.
Fixed by correctly logging as a warning without the stack trace and with contextual information.
Dremio cannot access Oracle synonyms.
Resolved by adding synonym support for Oracle. When the option is enabled, synonym tables are included and the Oracle connection is configured to return column information.
When viewing objects in an Oracle instance, synonyms are listed as queryable objects; however, Dremio cannot get the columns for these objects, and the query fails.
Resolved by adding a switch that lets you hide synonyms and by making synonyms properly queryable when they are not hidden.
The direct memory used by the non-vectorized Hive ORC reader was held even after its usage was done.
Resolved by using a scoped Dremio allocator in the zero copy path on the non-vectorized Hive ORC reader.
Queries against Hive tables with a large number of partitions can hang.
Resolved by creating the Hive job configuration lazily, which reduces memory usage when running queries against Hive tables with a large number of partitions.
Per CVE-2017-5644, Apache POI is vulnerable to a denial-of-service attack.
Resolved by upgrading Apache POI to 3.17.
Tableau Desktop 2018.3.2 fails to open the Dremio-generated .tds file.
Because Tableau Desktop 2018.3.2 expects the "datasource" element to have a "version" attribute, resolved this issue by setting the version to an empty string.
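The fix only requires that the `version` attribute be present, even if its value is empty. The Python sketch below uses the standard `xml.etree.ElementTree` module to build such a `datasource` element and to check for the attribute; the function names are hypothetical, and the connection details of a real Dremio-generated .tds file are deliberately omitted.

```python
import xml.etree.ElementTree as ET


def datasource_element() -> ET.Element:
    """Build a bare <datasource> element carrying an empty version attribute."""
    return ET.Element("datasource", {"version": ""})


def has_version_attribute(tds_text: str) -> bool:
    """Return True if the root <datasource> element has a version attribute at all."""
    root = ET.fromstring(tds_text)
    # An empty string still counts: the attribute only needs to be present.
    return root.tag == "datasource" and "version" in root.attrib


if __name__ == "__main__":
    xml_text = ET.tostring(datasource_element(), encoding="unicode")
    print(xml_text, has_version_attribute(xml_text))
```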
A confusing error message occurs when issuing a CREATE TABLE AS SELECT query into an S3 source.
Fixed by implementing better error reporting when creating tables in S3 fails.
Per CVE-2014-0107, Apache Xalan is vulnerable to code injection through specially crafted files.
Resolved by upgrading Apache Xalan from 2.7.1 to 2.7.2.
Dremio logs are written to the wrong destination (journalctl).
Fixed so that the logs are written to the server.log file.
The SQL editor context does not work with folders that have periods in them (for example, S3 bucket names).
Resolved handling of folder names with periods.
During Preview, an execution error shows an empty preview rather than an error message.
Fixed by adding an error message.
Previews cannot be limited when using the REST API.
Resolved by allowing Preview APIs to return schema without preview data.
The UI screen is flickering in the Admin > Dataset area.
Fixed flickering scrollbars.
Automatic dataset previews increase the data load when querying large datasets.
Resolved by decoupling dataset metadata loading from data loading. Data loading no longer blocks users from altering the query.
In the user interface, the alignment is broken after the Wiki feature.
Fixed by modifying the UI layout.
Tag displays don't show context.
Resolved by showing tags that are associated with a dataset.
In the user interface, the Data Graph option isn't available for physical datasets.
Resolved so that the Data Graph option is available for physical and virtual datasets.
3.1.1 Release Notes
What's New in 3.1.1
Installing Dremio Enterprise Edition now requires accepting a license agreement.
In addition to LDAP simple bind (user, password) authentication, Dremio now supports the following LDAP authentication modes:
- Anonymous (no user, no password)
- Unauthenticated Simple Bind (user, no password)
When authenticating to Dremio, empty passwords for users are not allowed.
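The three LDAP bind modes differ only in which credentials are sent. Following the definitions in RFC 4513, section 5.1, the Python sketch below classifies a bind by its distinguished name (DN) and password, and separately shows the rule that users may not log in to Dremio with an empty password. The function names are hypothetical; this is not Dremio's configuration format or code.

```python
from typing import Optional


def bind_mode(dn: Optional[str], password: Optional[str]) -> str:
    """Classify an LDAP simple bind per RFC 4513, section 5.1."""
    has_dn, has_password = bool(dn), bool(password)
    if not has_dn and not has_password:
        return "anonymous"        # no user, no password
    if has_dn and not has_password:
        return "unauthenticated"  # user, no password
    if has_dn and has_password:
        return "simple"           # user and password
    raise ValueError("a password without a user name is not a valid simple bind")


def check_user_login(username: str, password: str) -> None:
    """Reject empty passwords for users logging in to Dremio (hypothetical check)."""
    if not password:
        raise PermissionError(f"empty password not allowed for user {username!r}")
```

Keeping the two checks separate mirrors the notes: the bind modes describe how Dremio connects to the LDAP server, while the empty-password rule applies to users authenticating to Dremio.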
Fixed Issues in 3.1.1
When creating reflections or tables from sources that have nested structures, values in nested fields are not written correctly.
Resolved by improving how the Parquet writer handles UNIONs.
When searching file system based sources, a file that was promoted to a dataset and then unpromoted may still appear in dataset search results.
Resolved by adding a check so that only datasets are returned.
A query planning rule may lead to infinite planning issues.
Resolved by removing the rule.
A dependent reflection is not using the parent reflection's latest materialization.
Resolved so that the dependent reflection is refreshed to use the latest materialization.
Some phases of heuristic planning could loop forever.
Resolved by improving the JOIN rule.
With Oracle, duplicate column names in the result set are not renamed, which can cause either a memory leak or null results.
Resolved by renaming the duplicate columns in the result set.
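The notes do not specify the renaming scheme. A simple approach, sketched in Python with a hypothetical `_N` suffix convention, appends a counter to each repeated name and skips candidates that are already taken, so every column in the result set ends up with a unique name.

```python
from typing import Iterable, List, Set


def dedupe_column_names(names: Iterable[str]) -> List[str]:
    """Rename repeated column names by appending _1, _2, ... (hypothetical scheme)."""
    taken: Set[str] = set()
    result: List[str] = []
    for name in names:
        candidate, n = name, 0
        # Keep incrementing until the candidate collides with no earlier column.
        while candidate in taken:
            n += 1
            candidate = f"{name}_{n}"
        taken.add(candidate)
        result.append(candidate)
    return result


if __name__ == "__main__":
    print(dedupe_column_names(["ID", "NAME", "ID"]))
```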
3.1.3 Release Notes
Enhancements in 3.1.3
HDFS and Hive Impersonation
For HDFS and Hive impersonation, an impersonated username can now be lowercase or uppercase. A new advanced option, Impersonation User Delegation, can be configured either when adding a new HDFS or Hive source or when editing an existing source. Default: As is
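The notes name only the default value, As is. Assuming the other choices are Lowercase and Uppercase (the description implies them but does not name them), the option's effect on the impersonated username can be sketched in Python as follows; the enum and function names are hypothetical.

```python
from enum import Enum


class UserDelegation(Enum):
    AS_IS = "As is"          # default, per the release notes
    LOWERCASE = "Lowercase"  # assumed option name
    UPPERCASE = "Uppercase"  # assumed option name


def delegated_username(username: str,
                       mode: UserDelegation = UserDelegation.AS_IS) -> str:
    """Return the username to impersonate on HDFS or Hive under the given mode."""
    if mode is UserDelegation.LOWERCASE:
        return username.lower()
    if mode is UserDelegation.UPPERCASE:
        return username.upper()
    return username  # As is: pass the Dremio username through unchanged
```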
Parquet File Reader
Dremio now supports off-heap memory buffers for reading Parquet files from Azure Data Lake Store (ADLS).
Datasets are now limited to a maximum table width of 800 columns by default; the limit is configurable. Querying a dataset whose column count exceeds the configured limit displays the following error message: "Using datasets with more than X columns is currently disabled." Datasets that already exceed the limit will not be queryable after their metadata is refreshed.
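The width check amounts to comparing a dataset's column count with the configured limit. The Python sketch below uses a hypothetical `check_dataset_width` helper and assumes that the X in the documented message is the configured limit; the name of the actual configuration option is not given in the notes.

```python
DEFAULT_MAX_COLUMNS = 800  # default limit stated in the release notes


class DatasetTooWideError(Exception):
    """Raised when a dataset has more columns than the configured limit."""


def check_dataset_width(column_count: int,
                        max_columns: int = DEFAULT_MAX_COLUMNS) -> None:
    """Raise if a dataset exceeds the maximum table width (hypothetical helper)."""
    if column_count > max_columns:
        raise DatasetTooWideError(
            f"Using datasets with more than {max_columns} columns is currently disabled."
        )
```

A dataset with exactly the limit passes; only a column count strictly greater than the limit is rejected, matching the "more than X columns" wording.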
Hash Aggregation Spilling Default
In Dremio 3.1.3, the Hash Aggregation Spilling feature is enabled by default. See Hash Aggregation Spilling for more information.
Deprecations in 3.1.3
IBM DB2 as a connector has been deprecated in this release. Existing DB2 source connections will continue to function in this release, but you cannot add new DB2 connections. In the next maintenance release, Dremio will officially remove DB2 from the product, which will cause any existing DB2 connections to fail. Dremio may choose to develop a newer DB2 connector in the future.
HBase as a connector has been deprecated in this release. Existing HBase source connections will continue to function, but you cannot add new HBase connections. Dremio will make a community version of the HBase connector available in the future, which you will be able to download and configure to add new HBase source connections.
Fixed Issues in 3.1.3
Dremio rejects operations when a table has a large number of columns.
Resolved by improving the handling of metadata tables.
When source names are mixed-case (not lowercase), the orphan split cleaning method identifies datasets that refer to the lowercase source name as orphans and removes all their splits.
Fixed the issue by normalizing the source name for comparison.
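The essence of the fix is to compare source names case-insensitively. In the Python sketch below, splits are modeled as hypothetical `(source_name, dataset_path)` pairs, and a split counts as an orphan only when its normalized source name matches no known source; none of these names come from Dremio's code.

```python
from typing import Iterable, List, Tuple

Split = Tuple[str, str]  # (source name, dataset path); hypothetical model


def find_orphan_splits(splits: Iterable[Split], sources: Iterable[str]) -> List[Split]:
    """Return splits whose source does not exist, comparing names case-insensitively."""
    known = {name.lower() for name in sources}  # normalize once for comparison
    return [split for split in splits if split[0].lower() not in known]
```

Without the `lower()` normalization, a split recorded under `mysource` would not match a source named `MySource` and would be wrongly treated as an orphan, which is the failure described above.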