1.3.1 Release Notes
Bug Fixes
Issue querying HBase tables from Hive
Querying HBase tables from Hive in Dremio would fail in some cases. This is now fixed. Users need to copy hbase-site.xml from HBase to Dremio’s /conf location.
Improved error messages when working with Hive
Errors when reading data from Hive sources will now include more context.
Issue with data type changes when reading partitioned Hive tables
Changes in the data types for Hive tables would sometimes result in failed queries. Dremio now better handles different schemas across partitions.
1.3.0 Release Notes
Enhancements
Acceleration
Improved reflection profiles
Query profiles now include more detailed information about reflections, such as the names of reflections; which reflections were considered, matched, and chosen; details for the best-cost query plan; and the canonicalized user query.
Improved reflection matching logic when working with multiple tables
Matching performance and reflection coverage have been improved for queries across multiple datasets that have multiple reflections defined.
Execution
Improved memory profiling
Dremio now records more details about memory usage. Information on peak amount of memory across phases per node is now available.
Better thread scheduling when some cores are idle
Dremio now better handles scheduling threads when some of the cores are idle. This option is disabled by default in this release. To enable it, set the debug.task.on_idle_load_shed flag and then restart all the execution nodes.
Performance improvements working with NULLS in Arrow
This update reduces the amount of heap churn when interacting with validity vectors for all data types and provides better performance working with NULL values.
Ability to download Parquet in Dremio UI
Datasets can now be downloaded as Parquet files, which will preserve all type information. This option respects the 1,000,000 row system-wide download limit.
Support for byte-order-marks (BOM) for text files
BOMs are now recognized when reading text files.
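As an illustration of what recognizing a BOM involves, the sketch below compares the leading bytes of a file against the standard BOM signatures in Python's codecs module and strips the match before decoding. It is not Dremio's reader code; the strip_bom helper and the sample data are hypothetical.

```python
import codecs
from typing import Tuple

# Standard BOM signatures. The UTF-32 entries come first because the
# UTF-32 LE BOM starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def strip_bom(raw: bytes, default: str = "utf-8") -> Tuple[str, bytes]:
    """Return (encoding, payload) with any leading BOM removed.

    Hypothetical helper for illustration only.
    """
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return encoding, raw[len(bom):]
    # No BOM found: fall back to the caller's default encoding.
    return default, raw


# A CSV file that starts with a UTF-8 BOM.
encoding, payload = strip_bom(codecs.BOM_UTF8 + b"id,name\n1,a\n")
header = payload.decode(encoding).splitlines()[0]
```

Without the stripping step, the first column name would carry an invisible U+FEFF prefix and would not match the expected field name.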
Coordination and metadata
Tableau for Mac support
Adds support for Tableau on Mac with Dremio ODBC Connector. Requires Tableau 10.4 or higher and Dremio Connector 1.3.14 or higher installed on the machine.
Metadata store maintenance utility
The dremio-admin utility now has a clean action that can be used to compact the metadata store, delete orphan objects, delete jobs based on age, and reindex the data.
Web Application
Improvements to Job information
Job information will now automatically refresh. New queries will also give detailed information about which Data Reflections were used, and which were not used.
Safari Support (experimental)
Dremio now supports Safari, starting with Safari 11.
SQL editor improvements
The SQL Editor now shows line numbers and handles insertion of fields, datasets, and functions better (including “snippets” with tokenized arguments).
REST API for Sources
Dremio now has a public REST API for managing sources.
Bug Fixes
Acceleration
Windows queries fail if any reflection is chosen
Fixed issue with acceleration when using certain window function patterns.
Reflection field list incorrectly shows fields as having mixed type
Fixed various bugs affecting dataset schema information when working with reflections.
Reflections on datasets from RDBMS sources are immediately marked as expired
Fixed issue where reflections on datasets from RDBMS sources are marked as expired right after creation.
MaterializationTask fails to get the TTL of JDBC queries
Fixed bugs that were preventing reflections on JDBC datasets from being properly refreshed.
Left outer join queries not getting accelerated
Fixed issue where left outer join queries were not getting accelerated with certain query patterns.
Partial raw materializations are not matched when doing a join that requires only available columns
Updated acceleration logic to leverage raw reflections in a larger set of scenarios.
Substitution fails to flatten the array and gives wrong results
Fixed various bugs when using queries with the flatten function against datasets with reflections.
Handle “in-progress” Materialization tasks on startup
If the cluster is restarted while reflection materialization tasks are running, Dremio now marks those materializations as failed. This prevents issues with reflection maintenance after cluster restarts.
Coordination and Metadata
Use of binary collation with SQL Server
Pushdowns with string comparisons in SQL Server are now using a binary collation, consistent with Dremio’s own collation.
String data from SQL Server is trimmed
String comparisons in SQL Server ignore trailing spaces. For consistent behavior, Dremio now trims trailing spaces from string data fetched from SQL Server, so that comparisons against data from other systems give the same results.
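The effect of this change can be sketched as follows. The helper name is hypothetical and the code only illustrates the behavior described above, not how Dremio implements it: trailing spaces are removed, while leading spaces and NULLs are left untouched.

```python
def normalize_sqlserver_string(value):
    """Drop trailing spaces only (hypothetical helper for illustration)."""
    if value is None:
        return None  # NULL stays NULL
    return value.rstrip(" ")


padded = "abc   "       # e.g. a value read from a fixed-width character column
other_system = "abc"    # the same logical value from a source that does not pad

# A byte-for-byte comparison treats the padding as significant...
matches_before = padded == other_system
# ...but after trimming, the two values compare equal.
matches_after = normalize_sqlserver_string(padded) == other_system
```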
Edit original SQL fails after 2 or more transforms applied on virtual dataset
This should now work as expected.
Get error on Exclude when selecting “1970-01-01 00:00:00.000” date & time
Users are now able to select a time within the 100 ms boundary of the Unix epoch.
SPLIT_PART() throws an ‘IndexOutOfBoundsException’
SPLIT_PART() function can now handle multiple parts.
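For readers unfamiliar with the function, the sketch below models SPLIT_PART in Python. It assumes PostgreSQL-style semantics (a 1-based index, and an empty string when the index is past the last part); this note does not state how Dremio handles out-of-range indexes, so treat that branch as an assumption.

```python
def split_part(text: str, delimiter: str, index: int) -> str:
    """Return the index-th (1-based) piece of text split on delimiter.

    Illustrative model only; the out-of-range result is an assumption
    borrowed from PostgreSQL, not confirmed Dremio behavior.
    """
    if index < 1:
        raise ValueError("index must be 1 or greater")
    parts = text.split(delimiter)
    # Guard the lookup so an index past the end cannot raise IndexError.
    if index > len(parts):
        return ""
    return parts[index - 1]


first = split_part("a,b,c", ",", 1)
last = split_part("a,b,c", ",", 3)
missing = split_part("a,b,c", ",", 4)
```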
Issue with different metadata refresh intervals
Although Dremio has two settings for the refresh rate of names vs. dataset definitions, the name-only refresh was not working as expected for some sources, and Dremio would always update the full dataset definitions. The individual settings are now observed for all sources. Moreover, when a source is added, Dremio only needs to find the dataset names before the UI allows the user to continue. The full set of metadata is refreshed in the background.
JDBC date/time issue
In certain scenarios, date/time values returned to JDBC clients could be off by one. This issue is now fixed.
Execution
Proxy settings for S3 are ignored
Attempting to set up an S3 source through a proxy would fail in Dremio. This behavior is now fixed – Dremio will correctly propagate all the proxy settings to the S3 client.
Avoid repeated object creation in reading/writing column data
The in-memory data structures in Arrow provide a read-only and a write-only view of memory through accessor and mutator interfaces, respectively. In our heap analysis, we found that the number of mutator objects was close to 66 million. The cause was that every request for a mutator or accessor created a new object on the heap. The fix resolves the problem.
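The note does not say how the fix works. One common way to avoid this kind of churn is to create the accessor and mutator once per vector and hand back the same instances on every call. The sketch below shows that pattern with hypothetical class names; it is not Arrow's or Dremio's actual code.

```python
class Mutator:
    """Write-only view over a shared buffer (hypothetical)."""

    def __init__(self, buffer):
        self._buffer = buffer

    def set(self, index, value):
        self._buffer[index] = value


class Accessor:
    """Read-only view over a shared buffer (hypothetical)."""

    def __init__(self, buffer):
        self._buffer = buffer

    def get(self, index):
        return self._buffer[index]


class IntVector:
    def __init__(self, capacity):
        self._buffer = [0] * capacity
        # Created once here and reused, instead of being allocated
        # again on every get_mutator() / get_accessor() call.
        self._mutator = Mutator(self._buffer)
        self._accessor = Accessor(self._buffer)

    def get_mutator(self):
        return self._mutator

    def get_accessor(self):
        return self._accessor


vector = IntVector(4)
for i in range(4):
    # Each iteration reuses the same Mutator object.
    vector.get_mutator().set(i, i * 10)
```

With this pattern, a loop that writes millions of values allocates no additional view objects, whatever the number of calls.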
Update default value of max width per node to be average number of cores across all executor nodes
Dremio has an external option “MAX_WIDTH_PER_NODE” to tune the degree of parallelism used during the execution of a query. The default value of this option used to be 70% of the number of cores on a particular node. The default is now based on the number of cores across all executor nodes in the Dremio cluster.
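The arithmetic behind the two defaults can be sketched as below. The function names and the sample cluster are hypothetical, the rounding rules are assumptions, and the new default follows the heading of this item (the average core count); whether a scaling factor is still applied to that average is not stated here.

```python
from typing import List


def old_default_width(cores_on_node: int) -> int:
    """Previous default: 70% of the cores on one node (rounding down is assumed)."""
    return max(1, cores_on_node * 7 // 10)


def new_default_width(executor_cores: List[int]) -> int:
    """New default per the heading: average cores across all executor nodes.

    Integer division is an assumption made for this sketch.
    """
    return max(1, sum(executor_cores) // len(executor_cores))


# Hypothetical cluster with unevenly sized executor nodes.
cluster = [8, 16, 16, 24]
old_widths = [old_default_width(c) for c in cluster]  # differs from node to node
new_width = new_default_width(cluster)                # one value for the cluster
```

The point of the change is visible in the shape of the results: the old rule gives each node its own value, while the new rule yields a single cluster-wide default.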
Null values in Complex data types were not correctly handled by WRITER operator
Dremio’s writer operator was not able to handle NULL values in complex/nested types. The fix resolves the problem.
Reduce heap usage in Parquet reader
The fix changes the code to use extremely lightweight (less heap overhead) and more efficient data structures in the critical path of Parquet reader code. Similar changes were also done for auxiliary structures we use in our implementation of hash join / hash agg operators.
Fix over-allocation of memory in our columnar data structures
In Dremio, all data is nullable, so we use an auxiliary structure to track whether each cell value in a column is NULL or non-NULL. The problem was that we were over-allocating memory for this auxiliary structure by a factor of 8. The fix resolves the problem.
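To make the 8x figure concrete: Arrow tracks validity with one bit per value, so a correctly sized bitmap needs one byte for every eight values. The sketch below contrasts that with a sizing of one byte per value, which would produce exactly an 8x over-allocation. That this was the actual cause is an assumption; the note above does not say so, and both function names are hypothetical.

```python
def validity_bytes(value_count: int) -> int:
    """Bytes needed for a bitmap holding one validity bit per value."""
    return (value_count + 7) // 8  # round up to a whole byte


def over_allocated_bytes(value_count: int) -> int:
    """Assumed buggy sizing: one byte per value instead of one bit."""
    return value_count


n = 4096
needed = validity_bytes(n)
wasteful = over_allocated_bytes(n)
ratio = wasteful // needed
```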