OPTIMIZE TABLE
Rewrite data and manifest files to an optimal size, combining small files or splitting large files. For more information about optimizing tables, see Optimizing Tables. Users who have been assigned the ADMIN
role, the table's owner, and users with SELECT
and UPDATE
privileges on the table can use the OPTIMIZE
command.
OPTIMIZE TABLE <table_name>
[ REWRITE MANIFESTS ]
[ REWRITE DATA [ USING BIN_PACK ]
[ FOR PARTITIONS <predicate> ]
[ ( { TARGET_FILE_SIZE_MB | MIN_FILE_SIZE_MB | MAX_FILE_SIZE_MB | MIN_INPUT_FILES } = <value> [, ... ] ) ]
]
Parameters
<table_name> String
The path of the Iceberg table that you want to optimize.
REWRITE MANIFESTS Optional
Rewrites the Iceberg manifest based on a size criteria. The default target size for manifest files is 8 MB.
REWRITE DATA [ USING BIN_PACK ] Optional
Either compacts smaller files or splits larger ones to the optimal file size using the bin_pack strategy. This parameter is used by default to optimize a table, but you can optionally include this parameter in your query to specify additional options to check the file sizes in your selected table. Additionally, since optimization uses only the bin_pack strategy, including the USING BIN_PACK clause is optional.
Dremio also optimizes Iceberg v2 tables containing position delete files.
FOR PARTITIONS <predicate> Optional
The filter condition for partitions. This clause can only be used with data files. The predicate can be any combination of logical and mathematical conditions. For example column_1=1 selects all files where column_1 contains the number 1. Another example, column_2 like 'a%' selects all files where column_2 starts with the letter a.
If the specified column is not partitioned, the query will not run and an error message will display.
[ ( { TARGET_FILE_SIZE_MB | MIN_FILE_SIZE_MB | MAX_FILE_SIZE_MB | MIN_INPUT_FILES } = <value> [, ... ] ) ] Number Optional
The options available for the REWRITE DATA USING BIN_PACK clause. See the following table for the options that are supported.
Option Name | Default Value | Description |
---|---|---|
TARGET_FILE_SIZE_MB | 256 MB | Controls the target size of the files that are generated. |
MIN_FILE_SIZE_MB | 0.75 times the value of TARGET_FILE_SIZE_MB | Files that are smaller in size than min_file_size_mb qualifies for compaction. |
MAX_FILE_SIZE_MB | 1.8 times the value of TARGET_FILE_SIZE_MB | Files that are larger in size than max_file_size_mb qualifies for compaction. |
MIN_INPUT_FILES | 5 | The minimum number of qualified files needed to be considered for compaction. |
Examples
Rewriting the data files then the manifest files for in the specified table to Dremio's optimal supported file sizeOPTIMIZE TABLE demo.example_table
OPTIMIZE TABLE demo.example_table
REWRITE DATA
OPTIMIZE TABLE demo.example_table
REWRITE DATA (TARGET_FILE_SIZE_MB=256)
OPTIMIZE TABLE demo.example_table
REWRITE DATA (MIN_FILE_SIZE_MB=100, MAX_FILE_SIZE_MB=1000)
OPTIMIZE TABLE demo.example_table
REWRITE DATA (MIN_INPUT_FILES=10)
OPTIMIZE TABLE demo.example_table
REWRITE DATA (MIN_FILE_SIZE_MB=100, MAX_FILE_SIZE_MB=1000, TARGET_FILE_SIZE_MB=512)
OPTIMIZE TABLE demo.example_table
FOR PARTITIONS sales_year=2023
OPTIMIZE TABLE demo.example_table
FOR PARTITIONS sales_year IN (2021, 2022) AND sales_month IN ('OCT', 'NOV', 'DEC')
OPTIMIZE TABLE demo.example_table
REWRITE MANIFESTS