Skip to main content

OPTIMIZE TABLE

Rewrites data and manifest files to provide peak performance.

For more information, see Optimization.

To run OPTIMIZE TABLE, you must meet one of the following requirements:

  • Membership in the ADMIN role
  • Grants of SELECT and WRITE privileges on the table
  • Table ownership
Syntax
OPTIMIZE TABLE <table_name>
[ FOR PARTITIONS <predicate> ]
note

FOR PARTITIONS is available but planned for deprecation. Use automatic table maintenance for ongoing optimization instead.

Parameters

<table_name> String

The path of the Iceberg table that you want to optimize.


FOR PARTITIONS <predicate> Optional

Selects specific partitions for optimization. Consider using this filter when:

  • Selected partitions are queried more often or are of higher importance.
  • Selected partitions are more active and are constantly being updated.
  • You want to avoid optimizing the entire table.

The <predicate> can be any combination of logical and mathematical conditions. However, the supported predicate syntax differs depending on whether the table uses identity or non-identity partitions.

Identity Partitions

For tables using identity partitions (partitioned directly on a column, such as PARTITION BY name), you can use equality conditions and expressions:

  • column_1=1: Optimizes all data files in the partition where column_1 equals 1.
  • column_2 LIKE 'a%': Optimizes all data files in the partition where column_2 values start with the letter "a".
  • date_column >= '2024-01-01': Optimizes the date_column partition with dates on or after January 1, 2024.

Non-Identity Partitions

For tables using non-identity partitions (partitioned using a transformation on a column, such as PARTITION BY (MONTH(ts))), only equality conditions on the source column are supported. Expressions that reference the partition transformation directly are not allowed and will result in an error.

Supported equality conditions:

Partition DefinitionSupported PredicateDescription
PARTITION BY (MONTH(ts))ts >= TIMESTAMP '2025-06-01'Filters based on the source timestamp column
PARTITION BY (MONTH(ts))ts = TIMESTAMP '2025-06-15'Exact match on source column
PARTITION BY (TRUNCATE(1, name))name = 'Alice'Equality on the source column
PARTITION BY (YEAR(date_col))date_col > '2024-01-01'Range condition on source column

Unsupported expressions (will error):

Partition DefinitionUnsupported PredicateWhy It Fails
PARTITION BY (MONTH(ts))MONTH(ts) = 6Cannot use transformation function in predicate
PARTITION BY (TRUNCATE(1, name))name LIKE 'A%'LIKE expressions not supported for non-identity partitions
PARTITION BY (YEAR(date_col))YEAR(date_col) = 2020Cannot reference partition transformation directly
PARTITION BY (TRUNCATE(10, amount))amount / 10 = 5Arithmetic expressions not supported

If the specified column is not partitioned, Dremio will display an error message. If a relevant partition cannot be found, Dremio will not optimize any files. When no partition filters are specified, all data files will be candidates for optimization.


OPTIMIZE Output

Each OPTIMIZE run returns a result set describing what was accomplished:

Output FieldDescription
Files rewrittenNumber of data files compacted in this run
Bytes processedTotal data volume rewritten
Table optimization stateCurrent health/state of the table
Additional runs neededIndicates whether more OPTIMIZE runs are required

When Additional runs needed is true, re-run OPTIMIZE TABLE <table_name> until the output confirms the table is fully optimized. For more information on incremental optimization behavior, see Optimization in the Apache Iceberg reference.

Optimize Clustered Data Tables

For clustered tables, OPTIMIZE TABLE incrementally reorders data to achieve the optimal data layout and manages file sizes. When working with clustered data tables, you may need to run multiple OPTIMIZE TABLE operations to achieve optimal data layout.

  • Start with OPTIMIZE TABLE to initiate reclustering of all records.
  • Use SELECT * FROM TABLE(clustering_information('table_name')) to check the resulting clustering depth.
  • Since Dremio optimization works incrementally to reduce the current clustering depth toward the target (default: 3), you may need to run OPTIMIZE TABLE multiple times, checking the clustering_information() function after each run to monitor progress. This mechanism may take longer to run on newly loaded or unsorted tables.
  • Continue this process until the clustering depth falls below your target threshold or until successive runs show minimal improvement.

Configuring Compaction

To optimize physical file management and compaction, OPTIMIZE TABLE combines small files with their closest cluster and rewrites large files into appropriate sizes while maintaining proper data organization according to the clustering keys.

Limitations

  • You can run only one OPTIMIZE TABLE query at a time per table partition.
  • Sort ordering within partitions is not supported.

Examples

Optimize a table
OPTIMIZE TABLE demo.example_table
Optimize a specific partition
OPTIMIZE TABLE demo.example_table
FOR PARTITIONS state='NEBRASKA'