Parquet File Best Practices

When using other tools to generate Parquet files for consumption in Dremio, we recommend the following configuration:

[info] Reading Parquet Files

As of Dremio version 3.1.3, Dremio supports off-heap memory buffers for reading Parquet files from Azure Data Lake Store (ADLS).

Row Groups
Implement your row groups using the following:
  • A single row group per file.
  • A target of 1MB-25MB column stripes for most datasets (ideally).
Note: By default, Dremio uses 256 MB row groups for the Parquet files that it generates.

Pages
Implement your pages using the following:
  • Snappy compression.
  • A target of ~100K page size.

Statistics
Use a recent Parquet library to avoid bad statistics issues.

Dictionary Encoding
Do not use. Dremio, by default, does not use dictionary encoding for the Parquet files that it generates.
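As an illustration, the recommendations above could be applied when writing files with PyArrow. This is a minimal sketch, not a Dremio-provided utility: it assumes PyArrow is the tool generating the files, and the helper names `dremio_write_options` and `write_for_dremio` are hypothetical. It also reads "~100K" as roughly 100 KiB, which is an assumption.

```python
def dremio_write_options(num_rows):
    """Build keyword arguments for pyarrow.parquet.write_table.

    Hypothetical helper: maps the recommendations above onto
    PyArrow's writer settings.
    """
    return {
        # Pages: Snappy compression.
        "compression": "snappy",
        # Pages: target page size of about 100K (assumed to mean ~100 KiB).
        "data_page_size": 100 * 1024,
        # Dictionary encoding: do not use.
        "use_dictionary": False,
        # Row groups: one row group per file, so the row-group limit
        # (counted in rows) is set to the full row count of the table.
        "row_group_size": max(1, num_rows),
        # Statistics: keep them on; a recent library writes correct ones.
        "write_statistics": True,
    }


def write_for_dremio(table, path):
    """Write a pyarrow.Table to `path` with the options above."""
    # Imported lazily so the options helper works without PyArrow installed.
    import pyarrow.parquet as pq

    pq.write_table(table, path, **dremio_write_options(table.num_rows))
```

Calling `write_for_dremio(table, "part-0.parquet")` on an existing `pyarrow.Table` would produce one file with a single row group. Keeping column stripes in the 1MB-25MB range is not handled here; that depends on how many rows are placed in each file.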
