OPTIMIZE TABLE
Rewrite data and manifest files to provide peak performance. Rewrite data and manifest files to provide peak performance. For more information, see Optimize Tables.
To run OPTIMIZE TABLE, you must meet one of the following requirements:
- Membership in the ADMIN role
- Possess grants of SELECT and UPDATE privileges on the table
- Table ownership
OPTIMIZE TABLE <table_name>
[ REWRITE DATA [ USING BIN_PACK ]
[ ( { TARGET_FILE_SIZE_MB | MIN_FILE_SIZE_MB | MAX_FILE_SIZE_MB | MIN_INPUT_FILES } = <value> [, ... ] ) ]
[ FOR PARTITIONS <predicate> ] ]
[ REWRITE_MANIFESTS ]
Clustered Data Tables
When working with clustered data tables, you may need to run multiple OPTIMIZE TABLE operations to achieve optimal data layout.
- Start with
OPTIMIZE TABLEto initiate reclustering of all records - Use
SELECT * FROM TABLE(clustering_information('table_name'))to check the resulting clustering depth. - Since Dremio optimization works incrementally to reduce the current clustering depth toward the target (default: 3), you may need to
OPTIMIZE TABLEmultiple times, checking theclustering_information()function after each run to monitor progress. - Continue this process until the clustering depth falls below your target threshold, or until successive runs show minimal improvement.
Configure Compaction
To optimize physical file management and compaction, OPTIMIZE TABLE combines small files with their closest cluster and rewrites large files into appropriate sizes while maintaining proper data organization according to the clustering keys.
Partitioned Data Tables
When working with partitioned data, the REWRITE DATA clause accepts several parameters. Since file rewriting uses only the bin_pack strategy, the USING BIN_PACK clause is optional.
Configure Compaction
The options available with REWRITE DATA USING BIN_PACK:
| Property Name | Default Value | Description |
|---|---|---|
TARGET_FILE_SIZE_MB | 512 MB | Controls the target size of the files that are generated. |
MIN_FILE_SIZE_MB | 0.24x TARGET_FILE_SIZE_MB | Files smaller than MIN_FILE_SIZE_MB qualify for compaction. |
MAX_FILE_SIZE_MB | 1.8x TARGET_FILE_SIZE_MB | Files larger than MAX_FILE_SIZE_MB qualify for compaction. |
MIN_INPUT_FILES | 5 | The minimum number of qualified files for a compaction operation. When a table is partitioned, MIN_INPUT_FILES specifies the minimum number of qualified files that must be present per partition. |
The command analyzes file groups or partitions in parallel across available engine executors to identify optimization candidates and then performs optimization when at least MIN_INPUT_FILES candidates are present.
Partition Filtering
The optional clause FOR PARTITIONS <predicate> selects specific partitions for optimizing data files when using REWRITE DATA. Consider using this filter when:
- Selected partitions are queried more often or are of higher importance
- Selected partitions are more active and are constantly being updated.
- You want to avoid optimizing the entire table.
The predicate can be any combination of logical and mathematical conditions.
column_1=1: Optimize all data files in the partition wherecolumn_1equals1.column_2 like 'a%': Optimize all data files in the partition wherecolumn_2values start with the lettera.date_column>= '2024-01-01': Optimize thedata_columnpartition with dates on or after January 1, 2024.
If the specified column is not partitioned, Dremio will display an error message. If a relevant partition cannot be found, Dremio will not optimize any files. When no partition filters are specified, all data files will be candidates for optimization. The command does not include sorting within partitions.
Optimize Iceberg Manifests
The optional clause REWRITE MANIFESTS optimizes the size of Iceberg manifest files. When this clause is expressed, the command must not contain the REWRITE DATA clause.
Dremio's manifest file sizing:
- Default target size:
commit.manifest.target-size-bytesor 8 MB if the table property is not present. - Optimal range: 0.75x to 1.8x the target size
- Files smaller than the minimum range: Will be compacted together
- Files larger than the maximum range: Will be split into smaller files
Examples
Rewrite data files and manifests to Dremio's optimal sizeOPTIMIZE TABLE demo.example_table
OPTIMIZE TABLE demo.example_table
REWRITE DATA USING BIN_PACK (TARGET_FILE_SIZE_MB=512, MIN_INPUT_FILES=10)
OPTIMIZE TABLE demo.example_table
REWRITE DATA USING BIN_PACK (MIN_INPUT_FILES=10)
FOR PARTITIONS state='NEBRASKA'
OPTIMIZE TABLE demo.example_table
REWRITE MANIFESTS