Parquet File Best Practices

When using other tools to generate Parquet files for consumption in Dremio, we recommend the following configuration:

Row Groups

  • Use a single row group per file.
  • Dremio uses 256 MB row groups by default for Parquet files it generates.
  • For most datasets, target column stripes of 1 MB to 25 MB.
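The row group guidance above can be turned into a rough sizing calculation. The following is a minimal sketch, not a Dremio utility: the function name `plan_single_row_group` and the per-column byte estimates are hypothetical, and it assumes the 256 MB figure (Dremio's default for files it generates) is also used as the target for externally generated files. It also assumes sizes are estimated from average bytes per row, which will not match on-disk sizes after encoding and compression.

```python
MB = 1024 * 1024


def plan_single_row_group(bytes_per_row_by_column,
                          target_row_group_bytes=256 * MB,
                          stripe_range=(1 * MB, 25 * MB)):
    """Estimate rows per file for one row group and flag column stripes
    that fall outside the recommended range.

    bytes_per_row_by_column maps a column name to its estimated average
    size in bytes per row (an assumption supplied by the caller).
    """
    row_bytes = sum(bytes_per_row_by_column.values())
    if row_bytes <= 0:
        raise ValueError("estimated row size must be positive")

    # One row group per file, so rows per file == rows per row group.
    rows_per_file = max(1, target_row_group_bytes // row_bytes)

    # Estimated size of each column stripe within that single row group.
    stripes = {name: rows_per_file * size
               for name, size in bytes_per_row_by_column.items()}

    low, high = stripe_range
    out_of_range = {name: size for name, size in stripes.items()
                    if not low <= size <= high}
    return rows_per_file, stripes, out_of_range


# Hypothetical schema: two 8-byte columns and a wide payload column.
rows, stripes, flagged = plan_single_row_group(
    {"id": 8, "ts": 8, "payload": 112})
print(rows)             # 2097152
print(sorted(flagged))  # ['payload']
```

In this example the two narrow columns land at an estimated 16 MB each, while the wide `payload` column is far above 25 MB, which suggests choosing a smaller file or row group for that dataset.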

Pages

  • Target a page size of about 100 KB.
  • Use Snappy compression.

Statistics

  • Use a recent Parquet library to avoid bad statistics issues.

Dictionary Encoding

  • Dremio, by default, does not use dictionary encoding for Parquet files it generates.
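As an illustration, the recommendations on this page could be applied when writing files with PyArrow, one possible external tool. This is a sketch under stated assumptions rather than an official Dremio recipe: the helper names `dremio_style_writer_options` and `write_for_dremio` are hypothetical, "100K" is read as roughly 100 KiB, and disabling dictionary encoding simply mirrors Dremio's own default, which this page does not state as a requirement. The keyword arguments are those of `pyarrow.parquet.write_table`, where `row_group_size` is a row count, not a byte size.

```python
def dremio_style_writer_options(num_rows, use_dictionary=False):
    """Build keyword arguments for pyarrow.parquet.write_table.

    use_dictionary=False mirrors Dremio's default for files it generates;
    treating that as a setting for external files is an assumption.
    """
    return {
        # row_group_size is counted in rows; using the full row count
        # keeps the table in a single row group.
        "row_group_size": max(1, num_rows),
        # Approximate target page size in bytes (~100 KB).
        "data_page_size": 100 * 1024,
        "compression": "snappy",
        "use_dictionary": use_dictionary,
        "write_statistics": True,
    }


def write_for_dremio(table, path):
    """Write a pyarrow.Table to one Parquet file with the options above."""
    # Imported here so the options helper works without PyArrow installed.
    import pyarrow.parquet as pq

    pq.write_table(table, path,
                   **dremio_style_writer_options(table.num_rows))
```

A call such as `write_for_dremio(table, "events.parquet")` would then write one file containing one row group. Keeping the file near the intended row group size is left to the caller, since PyArrow sizes row groups by row count.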

