Automatic Optimization

As Apache Iceberg tables are written to and updated, data and metadata files accumulate, which can affect query performance. For example, small files produced by data ingestion jobs slow queries because the query engine must open and read more files. Tables that use merge-on-read for DML operations accumulate row-level deletes—stored as position delete files on v2 tables or deletion vectors on v3 tables—that add overhead to every read.

Dremio automates table maintenance for Iceberg tables in the Open Catalog. Automatic optimization runs on a dedicated engine configured by Dremio, so maintenance work does not impact your project query workloads.

When Dremio optimizes a table, it evaluates file sizes, partition layout, metadata organization, and accumulated row-level deletes to reduce I/O and metadata overhead. Optimization consists of five operations:

  • Data file compaction — combines small files and splits large ones to the target file size.
  • Row-level delete handling — applies accumulated position deletes (v2) or deletion vectors (v3) by rewriting the affected data files.
  • Clustering — reorders data within files based on clustering keys to improve query pruning.
  • Partition evolution — rewrites files to align with the current partition specification.
  • Manifest file rewriting — compacts or splits manifest files to keep query planning fast.

You can also run these operations manually with OPTIMIZE TABLE. Automatic optimization and manual optimization perform the same operations—the difference is that automatic optimization runs them continuously on a schedule managed by Dremio.
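A manual run is a single statement. As a sketch (the table name `sales.orders` is illustrative):

```sql
-- Run the optimization operations once on one table. The output
-- summary reports the files rewritten and whether more runs are needed.
OPTIMIZE TABLE sales.orders;
```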

Incremental Processing

Optimization processes tables in batches rather than all at once. Dremio automatically determines batch size based on available engine resources—both the total bytes to rewrite and the number of input files are capped to prevent resource exhaustion and out-of-memory errors.

For small-to-medium tables or tables that are already well-optimized, a single optimization run typically processes the entire table. For very large tables—especially those with millions of small files or terabytes of data to compact—multiple optimization runs are needed to fully optimize the table.

When running OPTIMIZE TABLE manually, the output summary indicates whether the table is fully optimized or whether additional runs are needed. Manifest file rewriting is included automatically in each run.

Dremio recommends relying on automatic optimization rather than manual optimization. Automatic optimization handles incremental processing continuously, keeping tables in an optimal state without manual intervention.

Data File Compaction

Iceberg tables that are frequently updated can accumulate data files of varying sizes. Small files—often produced by streaming ingestion or frequent INSERT operations—slow queries because the engine must open and read each file individually. Very large files can also be inefficient because they reduce parallelism and make partial reads expensive.

Dremio compacts small files into larger ones and splits oversized files to reach the target file size of 256 MB. This reduces the total number of files the engine must process during queries and keeps file sizes in the range where read performance is optimal.

You can configure the target file size per table using the write.target-file-size-bytes table property.
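For example, assuming standard Iceberg table-property syntax (table name and value are illustrative), the target can be raised to 512 MB:

```sql
-- Set the per-table compaction target to 512 MB (value is in bytes:
-- 512 * 1024 * 1024 = 536870912). Subsequent optimization runs
-- compact and split data files toward this size.
ALTER TABLE sales.orders
  SET TBLPROPERTIES ('write.target-file-size-bytes' = '536870912');
```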

Row-Level Delete Handling

When Iceberg tables use merge-on-read mode for DELETE, UPDATE, or MERGE operations, deletes are recorded separately from data files rather than rewriting those files immediately. This makes writes fast but adds read overhead because the engine must apply the recorded deletes during every query.
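Merge-on-read is controlled per operation by the standard Iceberg write-mode properties. A hedged sketch, assuming the ALTER TABLE ... SET TBLPROPERTIES syntax shown elsewhere in these docs and an illustrative table name:

```sql
-- Record row-level deletes separately instead of rewriting data files.
ALTER TABLE sales.orders
  SET TBLPROPERTIES (
    'write.delete.mode' = 'merge-on-read',
    'write.update.mode' = 'merge-on-read',
    'write.merge.mode'  = 'merge-on-read'
  );

-- This DELETE now produces position deletes (v2) or deletion vectors (v3)
-- instead of rewriting the affected data files.
DELETE FROM sales.orders WHERE order_status = 'CANCELLED';
```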

The format of these recorded deletes depends on the table's format version:

  • v2 tables use position delete files—separate Parquet files that identify deleted rows by file path and row position. During reads, Dremio must join these delete files with data files to filter out deleted rows. As delete files accumulate, this join becomes increasingly expensive.

  • v3 tables use deletion vectors—compact bitmaps stored in Puffin files with a direct 1:1 mapping to each data file. This eliminates the join overhead of v2 position deletes, but accumulated deletion vectors still add read overhead as the engine must apply them during query execution.

During optimization, Dremio applies accumulated position deletes or deletion vectors by rewriting the affected data files, producing clean files that contain only current rows. The original data files, delete files, and Puffin files remain on storage until snapshots are expired and unreferenced files are removed by VACUUM TABLE.
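The cleanup step can be sketched with VACUUM TABLE; the timestamp and retention count below are illustrative, and the exact option syntax should be checked against the VACUUM TABLE reference:

```sql
-- Expire snapshots older than the given timestamp (keeping at least the
-- five most recent), then remove data, delete, and Puffin files that no
-- remaining snapshot references.
VACUUM TABLE sales.orders
  EXPIRE SNAPSHOTS older_than '2025-01-01 00:00:00.000' retain_last 5;
```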

For details on how deletion vectors are applied outside of automatic optimization, and on configuring merge-on-read mode, see Apply Deletion Vectors.

Clustering

Tables defined with clustering keys require periodic re-clustering during optimization. As new data is written to a table through DML operations, the incoming rows are not sorted according to the clustering keys. Over time, this reduces the effectiveness of file-level pruning because Parquet column statistics no longer tightly bound the values in each file.

During optimization, Dremio reads the affected data files and rewrites them with records sorted by the clustering keys using Z-ordering. This restores tight value ranges within each file, allowing the query engine to skip files that don't match query filters.
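As a sketch of how a table opts into this behavior, assuming Dremio's CLUSTER BY clause and illustrative column names:

```sql
-- Define clustering keys at creation time. Optimization later re-sorts
-- file contents by the Z-order of (order_date, customer_id), restoring
-- tight per-file value ranges for pruning.
CREATE TABLE sales.orders (
  order_id    BIGINT,
  customer_id BIGINT,
  order_date  DATE,
  amount      DECIMAL(12, 2)
)
CLUSTER BY (order_date, customer_id);
```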

For details on what clustering is, how to choose clustering keys, and why clustering is generally preferred over partitioning, see Clustering.

Partition Evolution

When you change a table's partition specification, older data files may be partitioned according to a previous specification. Queries that filter on the current partition columns must still scan these older files because their layout doesn't match the current partitioning.

Dremio detects data files that don't align with the current partition specification and rewrites them to match. This ensures that partition pruning works effectively across all data in the table, regardless of when it was written.
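For example, coarsening a partition spec from day to month granularity looks like this (table and column names are illustrative):

```sql
-- Files written under the old day-granularity spec remain in place
-- until optimization rewrites them to match the new spec.
ALTER TABLE sales.orders DROP PARTITION FIELD day(order_ts);
ALTER TABLE sales.orders ADD PARTITION FIELD month(order_ts);
```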

For details on defining and modifying partition specifications, see Partition Evolution.

Manifest File Rewriting

Iceberg uses manifest files as an index over a table's data files. Each manifest tracks a set of data files along with their partition values and column statistics, enabling the query engine to prune unnecessary files during planning. As tables are updated, manifests can become suboptimal—too many small manifests slow planning, and oversized manifests reduce pruning efficiency.

Dremio rewrites manifest files based on size criteria, splitting oversized manifests and compacting small ones. The target size is controlled by the table's commit.manifest.target-size-bytes property, which defaults to 8 MB. For details on manifest rewriting behavior, see OPTIMIZE TABLE.
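Manifest rewriting can also be invoked on its own, and the target size adjusted per table; the value below is illustrative:

```sql
-- Rewrite only the manifests, leaving data files untouched.
OPTIMIZE TABLE sales.orders REWRITE MANIFESTS;

-- Raise the manifest target size to 16 MB (value is in bytes:
-- 16 * 1024 * 1024 = 16777216).
ALTER TABLE sales.orders
  SET TBLPROPERTIES ('commit.manifest.target-size-bytes' = '16777216');
```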