Parquet File Best Practices
This topic provides general information and recommendations for Parquet files.
Reading Parquet Files
As of Dremio version 3.1.3, Dremio supports off-heap memory buffers for reading Parquet files from Azure Data Lake Store (ADLS).
As of Dremio version 3.2, Dremio provides enhanced cloud Parquet readers. The Parquet file readers were redesigned to deliver multiple improvements, including increased parallelism on columnar data, reduced latencies, and more efficient resource and memory usage. The enhanced readers also improve the performance of reflections. These improvements are implemented for ADLS and AWS S3.
When using other tools to generate Parquet files for consumption in Dremio, we recommend the following configuration:
- A single row group per file.
- Column stripes of 1 MB-25 MB for most datasets (ideally).
- Snappy compression.
- A target page size of ~100 KB.