Skip to main content
Version: current [26.x]

OPTIMIZE TABLE

Rewrites data and manifest files to provide peak performance. For more information, see Optimize Tables.

To run OPTIMIZE TABLE, you must meet one of the following requirements:

  • Membership in the ADMIN role
  • Grants of SELECT and UPDATE privileges on the table
  • Table ownership
Syntax
OPTIMIZE TABLE <table_name>
[ REWRITE DATA [ USING BIN_PACK ]
[ ( { TARGET_FILE_SIZE_MB | MIN_FILE_SIZE_MB | MAX_FILE_SIZE_MB | MIN_INPUT_FILES } = <value> [, ... ] ) ]
[ FOR PARTITIONS <predicate> ] ]
[ REWRITE_MANIFESTS ]

Parameters

<table_name> String

The path of the Iceberg table that you want to optimize.


REWRITE DATA [ USING BIN_PACK ] Optional

For partitioned tables, this parameter compacts smaller files or splits larger ones to the optimal file size using the bin-pack strategy. This parameter is used by default to optimize a table, but you can optionally include it in your query to specify additional options to control the file sizes in your selected table. Additionally, since optimization uses only the bin-pack strategy, including the USING BIN_PACK clause is optional. Dremio also optimizes Iceberg v2 tables containing position delete files.


[ ( { TARGET_FILE_SIZE_MB | MIN_FILE_SIZE_MB | MAX_FILE_SIZE_MB | MIN_INPUT_FILES } = <value> [, ... ] ) ] Number   Optional

The options available with REWRITE DATA USING BIN_PACK:

Property NameDefault ValueDescription
TARGET_FILE_SIZE_MBwrite.target-file-size-bytes target table property, or 512 MB as defined by Apache Iceberg if the table property is not defined.Controls the target size of the files that are generated.
MIN_FILE_SIZE_MB0.24x TARGET_FILE_SIZE_MB (123 MB at the default TARGET_FILE_SIZE_MB)Files smaller than MIN_FILE_SIZE_MB qualify for compaction.
MAX_FILE_SIZE_MB1.8x TARGET_FILE_SIZE_MB (922 MB at the default TARGET_FILE_SIZE_MB)Files larger than MAX_FILE_SIZE_MB qualify for compaction.
MIN_INPUT_FILES5The minimum number of qualified files for a compaction operation. When a table is partitioned, MIN_INPUT_FILES specifies the minimum number of qualified files that must be present per partition.

The command analyzes file groups or partitions in parallel across available engine executors to identify optimization candidates and then performs optimization when at least MIN_INPUT_FILES candidates are present.


FOR PARTITIONS <predicate> Optional

Selects specific partitions for optimizing data files when using REWRITE DATA. Consider using this filter when:

  • Selected partitions are queried more often or are of higher importance.
  • Selected partitions are more active and are constantly being updated.
  • You want to avoid optimizing the entire table.

The predicate can be any combination of logical and mathematical conditions.

  • column_1=1: Optimizes all data files in the partition where column_1 equals 1.
  • column_2 like 'a%': Optimizes all data files in the partition where column_2 values start with the letter a.
  • date_column >= '2024-01-01': Optimizes the date_column partition with dates on or after January 1, 2024.

If the specified column is not partitioned, Dremio will display an error message. If a relevant partition cannot be found, Dremio will not optimize any files. When no partition filters are specified, all data files will be candidates for optimization. The command does not include sorting within partitions.


REWRITE_MANIFESTS Optional

Optimizes the size of Iceberg manifest files when optimizing clustered or partitioned data. When this clause is used, the command must not contain the REWRITE DATA clause.

Dremio's manifest file sizing:

  • Default target size: commit.manifest.target-size-bytes or 8 MB if the table property is not present.
  • Optimal range: 0.75x to 1.8x the target size
  • Files smaller than the minimum range: Will be compacted together
  • Files larger than the maximum range: Will be split into smaller files

Optimizing Clustered Data Tables

When working with clustered data tables, you may need to run multiple OPTIMIZE TABLE operations to achieve optimal data layout.

  • Start with OPTIMIZE TABLE to initiate reclustering of all records.
  • Use SELECT * FROM TABLE(clustering_information('table_name')) to check the resulting clustering depth.
  • Since Dremio optimization works incrementally to reduce the current clustering depth toward the target (default: 3), you may need to run OPTIMIZE TABLE multiple times, checking the clustering_information() function after each run to monitor progress.
  • Continue this process until the clustering depth falls below your target threshold or until successive runs show minimal improvement.

Data Table Parameters

To optimize the logical data layout of the table, OPTIMIZE TABLE uses the dremio.clustering table properties from the target table. The default values can be overridden by support keys of the same names.

Configuring Compaction

To optimize physical file management and compaction, OPTIMIZE TABLE also uses the dremio.iceberg.auto.clustering.compact.files.enabled support key to control file compaction. When true (the default), small files are combined with their closest cluster, and large files are rewritten into appropriate sizes while maintaining proper data organization according to the clustering keys.

Examples

Rewrite data files and manifests to Dremio's optimal size
OPTIMIZE TABLE demo.example_table
Rewrite partitioned data files meeting the minimum file count
OPTIMIZE TABLE demo.example_table
REWRITE DATA USING BIN_PACK (TARGET_FILE_SIZE_MB=512, MIN_INPUT_FILES=10)
Rewrite data files in the selected partition if the minimum file count is met
OPTIMIZE TABLE demo.example_table
REWRITE DATA USING BIN_PACK (MIN_INPUT_FILES=10)
FOR PARTITIONS state='NEBRASKA'
Rewrite only manifest files
OPTIMIZE TABLE demo.example_table
REWRITE_MANIFESTS