This topic provides general information and recommendations for Parquet files.
As of Dremio version 3.1.3, Dremio supports off-heap memory buffers for reading Parquet files from Azure Data Lake Store (ADLS).
As of Dremio version 3.2, Dremio provides enhanced cloud Parquet readers. The Parquet file readers were redesigned to deliver multiple improvements, including increased parallelism on columnar data, reduced latencies, and more efficient resource and memory usage.
Additionally, the enhanced reader improves the performance of reflections. It is implemented for ADLS and AWS S3.
Take the following limitations into consideration when generating and configuring Parquet files. Failure to adhere to them may cause errors when using Parquet files with Dremio.
When using other tools to generate Parquet files for consumption in Dremio, we recommend the following configuration:
Implement your row groups using the following:
Note: By default, Dremio uses 256 MB row groups for the Parquet files that it generates.
Implement your pages using the following:
|Statistics||Use a recent Parquet library to avoid bad statistics issues.|
|Dictionary Encoding||Do not use. By default, Dremio does not use dictionary encoding for the Parquet files that it generates.|
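Many Parquet writers size row groups in rows rather than bytes, so a byte target has to be converted before it can be applied. The following is a minimal sketch, not a Dremio API. It assumes you want to match Dremio's own 256 MB default, that 256 MB means 256 × 1024 × 1024 bytes, and that you can estimate the average row width for your data. The function names `rows_per_row_group` and `writer_options` are hypothetical.

```python
# Assumption: 256 MB is interpreted as 256 * 1024 * 1024 bytes, mirroring the
# default row-group size Dremio uses for the Parquet files it generates.
TARGET_ROW_GROUP_BYTES = 256 * 1024 * 1024


def rows_per_row_group(avg_row_bytes: int,
                       target_bytes: int = TARGET_ROW_GROUP_BYTES) -> int:
    """Convert a byte-sized row-group target into an approximate row count.

    avg_row_bytes is an estimate supplied by the caller; it is not measured here.
    """
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be a positive integer")
    # Always write at least one row per row group.
    return max(1, target_bytes // avg_row_bytes)


def writer_options(avg_row_bytes: int) -> dict:
    """Collect writer settings that follow the recommendations above.

    The key names are an assumption: they are chosen to line up with the
    keyword arguments of pyarrow.parquet.write_table; other writers use
    different names.
    """
    return {
        "row_group_size": rows_per_row_group(avg_row_bytes),  # counted in rows
        "use_dictionary": False,  # dictionary encoding: do not use
    }


if __name__ == "__main__":
    # Example: rows estimated at roughly 1 KiB each.
    print(writer_options(1024))
```

If pyarrow is the generating tool, the resulting dictionary could be passed along as `pq.write_table(table, path, **writer_options(1024))`. Treat the row count as an approximation, since the estimate is based on an average row width and the encoded size on disk will differ.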