OPTIMIZE
Rewrite data files to an optimal size, combining small files or splitting large files. For more information about optimizing tables, see Optimizing Tables. Users who have been assigned the ADMIN
role, the table's owner, and users with SELECT
and UPDATE
privileges on the table can use the OPTIMIZE
command.
OPTIMIZE TABLE <table_name>
[ REWRITE DATA USING BIN_PACK
[ ( { TARGET_FILE_SIZE_MB | MIN_FILE_SIZE_MB | MAX_FILE_SIZE_MB | MIN_INPUT_FILES } = <value> [, ... ] ) ]
]
Parameters
<table_name> String
The path of the Iceberg table that you want to optimize.
REWRITE DATA [ USING BIN_PACK ] Optional
Either compacts smaller files or splits larger ones to the optimal file size using the bin_pack strategy. This parameter is used by default to optimize a table, but you can optionally include this parameter in your query to specify additional options to check the file sizes in your selected table. Additionally, since optimization uses only the bin_pack strategy, including the USING BIN_PACK
clause is optional.
Dremio also optimizes Iceberg v2 tables containing position delete files.
[ ( { TARGET_FILE_SIZE_MB | MIN_FILE_SIZE_MB | MAX_FILE_SIZE_MB | MIN_INPUT_FILES } = <value> [, ... ] ) ] Number Optional
The options available for the REWRITE DATA USING BIN_PACK clause.
See the following table for the options that are supported.
Option Name | Default Value | Description |
---|---|---|
TARGET_FILE_SIZE_MB | 256 MB | Controls the target size of the files that are generated. |
MIN_FILE_SIZE_MB | 0.75 times the value of TARGET_FILE_SIZE_MB | Files that are smaller in size than min_file_size_mb qualifies for compaction. |
MAX_FILE_SIZE_MB | 1.8 times the value of TARGET_FILE_SIZE_MB | Files that are larger in size than max_file_size_mb qualifies for compaction. |
MIN_INPUT_FILES | 5 | The minimum number of qualified files needed to be considered for compaction. |
FOR PARTITIONS <predicate> Optional
The filter condition for partitions. This clause can only be used with data files. The predicate
can be any combination of logical and mathematical conditions. For example column_1=1
selects all files where column_1
contains the number 1
. Another example, column_2 like 'a%'
selects all files where column_2
starts with the letter a
.
If the specified column is not partitioned, the query will not run and an error message will display.
REWRITE MANIFESTS Optional
Rewrites the Iceberg manifest based on a size criteria. The default target size for manifest files is 8 MB.
Examples
Rewriting the data files in the specified table to Dremio's optimal supported file sizeOPTIMIZE TABLE demo.example_table
OPTIMIZE TABLE demo.example_table
REWRITE DATA USING BIN_PACK
OPTIMIZE TABLE demo.example_table
REWRITE DATA USING BIN_PACK (TARGET_FILE_SIZE_MB=256)
OPTIMIZE TABLE demo.example_table
REWRITE DATA USING BIN_PACK (MIN_FILE_SIZE_MB=100, MAX_FILE_SIZE_MB=1000)
OPTIMIZE TABLE demo.example_table
REWRITE DATA USING BIN_PACK (MIN_INPUT_FILES=10)
OPTIMIZE TABLE demo.example_table
REWRITE DATA USING BIN_PACK (MIN_FILE_SIZE_MB=100, MAX_FILE_SIZE_MB=1000, TARGET_FILE_SIZE_MB=512)