Skip to main content

OPTIMIZE TABLE

Rewrite data and manifest files to an optimal size, combining small files or splitting large files. For more information about optimizing tables, see Optimizing Tables. Users who have been assigned the ADMIN role, the table's owner, and users with SELECT and UPDATE privileges on the table can use the OPTIMIZE command.

Syntax
OPTIMIZE TABLE <table_name>
[ REWRITE MANIFESTS ]
[ REWRITE DATA [ USING BIN_PACK ]
[ FOR PARTITIONS <predicate> ]
[ ( { TARGET_FILE_SIZE_MB | MIN_FILE_SIZE_MB | MAX_FILE_SIZE_MB | MIN_INPUT_FILES } = <value> [, ... ] ) ]
]

Parameters

<table_name> String

The path of the Iceberg table that you want to optimize.


REWRITE MANIFESTS Optional

Rewrites the Iceberg manifest based on a size criteria. The default target size for manifest files is 8 MB.


REWRITE DATA [ USING BIN_PACK ] Optional

Either compacts smaller files or splits larger ones to the optimal file size using the bin_pack strategy. This parameter is used by default to optimize a table, but you can optionally include this parameter in your query to specify additional options to check the file sizes in your selected table. Additionally, since optimization uses only the bin_pack strategy, including the USING BIN_PACK clause is optional.

note

Dremio also optimizes Iceberg v2 tables containing position delete files.


FOR PARTITIONS <predicate> Optional

The filter condition for partitions. This clause can only be used with data files. The predicate can be any combination of logical and mathematical conditions. For example column_1=1 selects all files where column_1 contains the number 1. Another example, column_2 like 'a%' selects all files where column_2 starts with the letter a.

note

If the specified column is not partitioned, the query will not run and an error message will display.


[ ( { TARGET_FILE_SIZE_MB | MIN_FILE_SIZE_MB | MAX_FILE_SIZE_MB | MIN_INPUT_FILES } = <value> [, ... ] ) ] Number   Optional

The options available for the REWRITE DATA USING BIN_PACK clause. See the following table for the options that are supported.

Option NameDefault ValueDescription
TARGET_FILE_SIZE_MB256 MBControls the target size of the files that are generated.
MIN_FILE_SIZE_MB0.75 times the value of TARGET_FILE_SIZE_MBFiles that are smaller in size than min_file_size_mb qualifies for compaction.
MAX_FILE_SIZE_MB1.8 times the value of TARGET_FILE_SIZE_MBFiles that are larger in size than max_file_size_mb qualifies for compaction.
MIN_INPUT_FILES5The minimum number of qualified files needed to be considered for compaction.

Examples

Rewriting the data files then the manifest files for in the specified table to Dremio's optimal supported file size
OPTIMIZE TABLE demo.example_table
Rewriting the data files in the specified table using the default bin-packing rewrite algorithm to Dremio's optimal supported file size
OPTIMIZE TABLE demo.example_table
REWRITE DATA
Rewriting the data files in the specified table to Dremio's optimal supported file size
OPTIMIZE TABLE demo.example_table
REWRITE DATA (TARGET_FILE_SIZE_MB=256)
Rewriting the data files in the specified table using the specified minimum and maximum file sizes to Dremio's optimal supported file size
OPTIMIZE TABLE demo.example_table
REWRITE DATA (MIN_FILE_SIZE_MB=100, MAX_FILE_SIZE_MB=1000)
Rewriting the data files in the specified table for the minimum number of specified files to Dremio's optimal supported file size
OPTIMIZE TABLE demo.example_table
REWRITE DATA (MIN_INPUT_FILES=10)
Rewriting the data files in the specified table using the specified minimum and maximum file sizes to the specified target file size
OPTIMIZE TABLE demo.example_table
REWRITE DATA (MIN_FILE_SIZE_MB=100, MAX_FILE_SIZE_MB=1000, TARGET_FILE_SIZE_MB=512)
Rewriting the specified partition where column 'sales_year' contains '2023'
OPTIMIZE TABLE demo.example_table
FOR PARTITIONS sales_year=2023
Optimizing only Q4 data for the last 2 years
OPTIMIZE TABLE demo.example_table
FOR PARTITIONS sales_year IN (2021, 2022) AND sales_month IN ('OCT', 'NOV', 'DEC')
Rewriting manifest files only
OPTIMIZE TABLE demo.example_table
REWRITE MANIFESTS