Version: current [26.x]

Parquet

This topic provides general information and recommendations for Parquet files.

Reading Parquet Files

Dremio's vectorized Parquet file reader improves parallelism on columnar data, reduces latencies, and enables more efficient resource and memory usage. The Parquet reader also improves Reflection performance.

Dremio supports offheap memory buffers for reading Parquet files.

Parquet Limitations

Take into consideration the following limitations when generating and configuring Parquet files. Failure to adhere to these restrictions may cause errors to trigger when using Parquet files with Dremio.

Maximum nested levels are restricted to 16. Multiple structs may be defined up to a total nesting level of 16. Exceeding this results in a failed query.
Maximum allowable elements in an array are restricted to 128. The maximum allowable number of elements in an array may not exceed this quantity. Additional elements beyond the allowed 128 results in a query failure.
Maximum footer size is restricted to 16MB. The footer consists of metadata. This includes information about the version of the format, the schema, extra key-value pairs, and metadata for columns in the file. When the footer exceeds this size, a query failure occurs.

Recommended Configuration

When using other tools to generate Parquet files for consumption in Dremio, we recommend the following configuration:

Type	Implementation
Row Groups	Implement your row groups using the following: A single row group per file, and a target of 1MB-25MB column stripes for most datasets (ideally). By default, Dremio uses 256 MB row groups for the Parquet files that it generates.
Pages	Implement your pages using the following: Snappy compression, and a target of ~100K page size. Use a recent Parquet library to avoid bad statistics issues.
Statistics	Use a recent Parquet library to avoid bad statistics issues.

Using Parquet for Apache Iceberg Tables

When writing to Iceberg tables (via Iceberg DML and OPTIMIZE), Dremio will honor the values defined in the write.target-file-size-bytes and write.parquet.row-group-size-bytes Iceberg table properties set when writing to underlying Parquet files.

If no values for write.target-file-size-bytes and write.parquet.row-group-size-bytes have been set on the table, then Dremio will use the Iceberg default values (512 MB and 128 MB, respectively) when writing to Parquet files.

Note that if you use Dremio to create an Iceberg table, then Dremio will set write.target-file-size-bytes to 128 MB, as this is optimal for Dremio's performance.

Reading Parquet Files​

Parquet Limitations​

Recommended Configuration​

Using Parquet for Apache Iceberg Tables​

Reading Parquet Files

Parquet Limitations

Recommended Configuration

Using Parquet for Apache Iceberg Tables