Skip to main content

OPTIMIZE TABLE

Rewrite data and manifest files to provide peak performance. Rewrite data and manifest files to provide peak performance. For more information, see Optimize Tables.

To run OPTIMIZE TABLE, you must meet one of the following requirements:

  • Membership in the ADMIN role
  • Possess grants of SELECT and UPDATE privileges on the table
  • Table ownership
Syntax
OPTIMIZE TABLE <table_name>
[ REWRITE DATA [ USING BIN_PACK ]
[ ( { TARGET_FILE_SIZE_MB | MIN_FILE_SIZE_MB | MAX_FILE_SIZE_MB | MIN_INPUT_FILES } = <value> [, ... ] ) ]
[ FOR PARTITIONS <predicate> ] ]
[ REWRITE_MANIFESTS ]

Clustered Data Tables

When working with clustered data tables, you may need to run multiple OPTIMIZE TABLE operations to achieve optimal data layout.

  • Start with OPTIMIZE TABLE to initiate reclustering of all records
  • Use SELECT * FROM TABLE(clustering_information('table_name')) to check the resulting clustering depth.
  • Since Dremio optimization works incrementally to reduce the current clustering depth toward the target (default: 3), you may need to OPTIMIZE TABLE multiple times, checking the clustering_information() function after each run to monitor progress.
  • Continue this process until the clustering depth falls below your target threshold, or until successive runs show minimal improvement.

Configure Compaction

To optimize physical file management and compaction, OPTIMIZE TABLE combines small files with their closest cluster and rewrites large files into appropriate sizes while maintaining proper data organization according to the clustering keys.

Partitioned Data Tables

When working with partitioned data, the REWRITE DATA clause accepts several parameters. Since file rewriting uses only the bin_pack strategy, the USING BIN_PACK clause is optional.

Configure Compaction

The options available with REWRITE DATA USING BIN_PACK:

Property NameDefault ValueDescription
TARGET_FILE_SIZE_MB512 MBControls the target size of the files that are generated.
MIN_FILE_SIZE_MB0.24x TARGET_FILE_SIZE_MBFiles smaller than MIN_FILE_SIZE_MB qualify for compaction.
MAX_FILE_SIZE_MB1.8x TARGET_FILE_SIZE_MBFiles larger than MAX_FILE_SIZE_MB qualify for compaction.
MIN_INPUT_FILES5The minimum number of qualified files for a compaction operation. When a table is partitioned, MIN_INPUT_FILES specifies the minimum number of qualified files that must be present per partition.

The command analyzes file groups or partitions in parallel across available engine executors to identify optimization candidates and then performs optimization when at least MIN_INPUT_FILES candidates are present.

Partition Filtering

The optional clause FOR PARTITIONS <predicate> selects specific partitions for optimizing data files when using REWRITE DATA. Consider using this filter when:

  • Selected partitions are queried more often or are of higher importance
  • Selected partitions are more active and are constantly being updated.
  • You want to avoid optimizing the entire table.

The predicate can be any combination of logical and mathematical conditions.

  • column_1=1: Optimize all data files in the partition where column_1 equals 1.
  • column_2 like 'a%': Optimize all data files in the partition where column_2 values start with the letter a.
  • date_column >= '2024-01-01': Optimize the data_column partition with dates on or after January 1, 2024.

If the specified column is not partitioned, Dremio will display an error message. If a relevant partition cannot be found, Dremio will not optimize any files. When no partition filters are specified, all data files will be candidates for optimization. The command does not include sorting within partitions.

Optimize Iceberg Manifests

The optional clause REWRITE MANIFESTS optimizes the size of Iceberg manifest files. When this clause is expressed, the command must not contain the REWRITE DATA clause.

Dremio's manifest file sizing:

  • Default target size: commit.manifest.target-size-bytes or 8 MB if the table property is not present.
  • Optimal range: 0.75x to 1.8x the target size
  • Files smaller than the minimum range: Will be compacted together
  • Files larger than the maximum range: Will be split into smaller files

Examples

Rewrite data files and manifests to Dremio's optimal size
OPTIMIZE TABLE demo.example_table
Rewrite partitioned data files meeting the minimum file count
OPTIMIZE TABLE demo.example_table
REWRITE DATA USING BIN_PACK (TARGET_FILE_SIZE_MB=512, MIN_INPUT_FILES=10)
Rewrite data files in the selected partition if the minimum file count is met
OPTIMIZE TABLE demo.example_table
REWRITE DATA USING BIN_PACK (MIN_INPUT_FILES=10)
FOR PARTITIONS state='NEBRASKA'
Rewrite only manifest files
OPTIMIZE TABLE demo.example_table
REWRITE MANIFESTS